**[holger--rebol--com]**

# Question about checksum/secure

### [1/8] from: brett::codeconscious::com at: 19-Nov-2001 16:06

I hope this isn't a silly question.
Can the result of checksum/secure be used like a GUID, is it guaranteed to
be unique?
Brett.

### [2/8] from: lmecir:mbox:vol:cz at: 19-Nov-2001 9:26

Hi Brett,
I hope this isn't a silly question.
Can the result of checksum/secure be used like a GUID, is it guaranteed to
be unique?
Brett.
For the checksum to be secure the probability of collisions must be very
low, but it is not equal to zero, of course.

### [3/8] from: holger:rebol at: 19-Nov-2001 6:20

On Mon, Nov 19, 2001 at 04:06:06PM +1100, Brett Handley wrote:

> I hope this isn't a silly question.
>
> Can the result of checksum/secure be used like a GUID, is it guaranteed to
> be unique?

No, hashing is a many-to-one operation, mapping from an infinite space
(arbitrarily long string series) to a finite space (string series of
length 20), so there always has to be the chance of collisions.
The chance of a collision is very small though, especially if you
deal with a small number of values, say, less than a billion billion
billion :-). For any two arbitrary strings to have the same checksum
the chance is 1/(2^160). For at least one collision among n strings
the chance is n/(2^160) (for n <= 2^160, obviously).
--
Holger Kruse

### [4/8] from: brett:codeconscious at: 20-Nov-2001 1:49

Thanks Holger and Ladislav,

> No, hashing is a many-to-one operation, mapping from an infinite space
> (arbitrarily long string series) to a finite space (string series of

<<quoted lines omitted: 4>>

> the chance is 1/(2^160). For at least one collision among n strings
> the chance is n/(2^160) (for n <= 2^160, obviously).
Hm a billion billion billion, guess it will be a while before my hardward
threatens that, however I suppose some rarely used collision detection is in
order.
Next (and hopefully last) question then is, is the checksum/secure method
something that could change?
I was thinking of using it for example to generate a filename by doing
checksum/secure on the content of a text file, but I think the question is
useful for understanding encryption keys as well.
Many thanks.
Brett.

### [5/8] from: lmecir:mbox:vol:cz at: 19-Nov-2001 17:38

Hi Holger,
<<Brett>>

> I hope this isn't a silly question.
>
> Can the result of checksum/secure be used like a GUID, is it guaranteed to
> be unique?

<</Brett>>
<<Holger>>
No, hashing is a many-to-one operation, mapping from an infinite space
(arbitrarily long string series) to a finite space (string series of
length 20), so there always has to be the chance of collisions.
The chance of a collision is very small though, especially if you
deal with a small number of values, say, less than a billion billion
billion :-). For any two arbitrary strings to have the same checksum
the chance is 1/(2^160). For at least one collision among n strings
the chance is n/(2^160) (for n <= 2^160, obviously).
--
Holger Kruse

**[holger--rebol--com]**<</Holger>> Your second formula is incorrect for any n except for (n = 2 ** 160). Especially it is incorrect for (n = 2), where it yields twice as much as it should. OTOH for higher n, (2 < n) and (n < (2 ** 160)), the probability of a collision may be much higher than your formula yields. The correct formula should be: m: 2 ** 160 q: 1 * (1 - (1 / m)) * (1 - (2 / m)) * ... * (1 - (n - 1 / m)) p: 1 - q , where Q is the probability that no collision occurs.### [6/8] from: holger:rebol at: 19-Nov-2001 10:58

On Mon, Nov 19, 2001 at 05:38:15PM +0100, Ladislav Mecir wrote:

> Your second formula is incorrect for any n except for (n = 2 ** 160).
> Especially it is incorrect for (n = 2), where it yields twice as much as it

<<quoted lines omitted: 5>>

> p: 1 - q
> , where Q is the probability that no collision occurs.
Thanks, you are right. I was thinking about having m-i in the denominator, so
most parts would conveniently cancel out, but it is really m all the time...
So the chance for a collision among n elements seems to be
1 - (m! / ((m-n)! * m^n))
--
Holger Kruse

**[holger--rebol--com]**### [7/8] from: lmecir:mbox:vol:cz at: 19-Nov-2001 22:35

Hi,
(...) The correct

> formula should be:
>
> m: 2 ** 160
> q: 1 * (1 - (1 / m)) * (1 - (2 / m)) * ... * (1 - (n - 1 / m))
> p: 1 - q
>
> , where Q is the probability that no collision occurs.

(...)
So the chance for a collision among n elements seems to be
1 - (m! / ((m-n)! * m^n))
--
Holger Kruse

**[holger--rebol--com]**If I use the formula correctly, the probability of a collision in a set of 1 M samples is somewhere around 2 ** -120.### [8/8] from: brett:codeconscious at: 20-Nov-2001 13:44

Thanks heaps for the previous responses.
Is the checksum/secure method something that could change?
I was thinking of using it for example to generate a filename by doing
checksum/secure on the content of a text file, but I think the question is
useful for understanding encryption keys as well.
Many thanks.
Brett.

Notes

- Quoted lines have been omitted from some messages.

View the message alone to see the lines that have been omitted