# Birthday Attack Calculations

## A Ciphers By Ritter Page

How can we relate the number of elements in a population and the number of random samples needed before we expect to find a duplicate or match?

### Contents

```
Subject: Birthday Attack calculations.
Date: Wed, 30 Dec 1998 23:16:02 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <368ab32b.57279906@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 7

How does one calculate the exact number of documents that need to be
hashed to ensure a collision?  I know that it is approximately
1.2*2^(M/2), but what is the exact formula or procedure for
calculating the number?

Thanks
Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: 30 Dec 1998 23:59:16 GMT
From: djohn37050@aol.com (DJohn37050)
Message-ID: <19981230185916.28531.00004724@ng39.aol.com>
References: <368ab32b.57279906@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 3

To ensure a collision, the exact number is n+1.  THe birthday attack in
probabilistic.  See the HAC.

Subject: Re: Birthday Attack calculations.
Date: Thu, 31 Dec 1998 03:46:01 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <368af32e.73668929@news.intergate.bc.ca>
References: <19981230185916.28531.00004724@ng39.aol.com>
Newsgroups: sci.crypt
Lines: 10

djohn37050@aol.com (DJohn37050) wrote:

>To ensure a collision, the exact number is n+1.  THe birthday attack in
>probabilistic.  See the HAC.

I should have stated for 50% probibality, the formula given is for
that value.  My aploogies.

Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: 31 Dec 98 04:55:33 +0000
Message-ID: <563.669T1659T2953896@mistral.co.uk>
References: <368af32e.73668929@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 28

On 31-Dec-98 03:46:01, Fred Van Andel said:

>>To ensure a collision, the exact number is n+1.  THe birthday attack in
>>probabilistic.  See the HAC.

>I should have stated for 50% probibality, the formula given is for
>that value.  My aploogies.

well, doing it "by hand" in perl would look something like:

\$n=43949268;
#\$n=365; #gives 23, which is right

\$p=1;

for(\$i=1;\$p>0.5;\$i++) {
\$p*=(\$n-\$i)/\$n;
}
print \$i."\n";

--
Verbing weirds language. (Calvin)

Subject: Re: Birthday Attack calculations.
Date: Sat, 2 Jan 1999 14:41:56 +0000
From: David Broughton <David@ddina.demon.co.uk>
Message-ID: <\$7zENAA0Ajj2EwNz@ddina.demon.co.uk>
References: <368ab32b.57279906@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 36

In article <368ab32b.57279906@news.intergate.bc.ca>, Fred Van Andel
<fred_vanandel@bigfoot.com> writes

> How does one calculate the exact number of documents that need to
> be hashed to ensure a collision?  I know that it is approximately
> 1.2*2^(M/2), but what is the exact formula or procedure for
> calculating the number?

Try this formula:

w = n^g + 0.29 - e

where:
w = the number of documents to get 50% probability of a collision
n = number of different hash codes, all equiprobable
g = 0.5 + 1/(6.13 * ln(n))
ln() is the natural logarithm function
e is a small error that can be ignored in practice, usually < +- 1.

This is an empirical formula that I worked out many years ago and
filed away in a notebook.

The exact formula is the value of w in this equation:

( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5

but this is not a practical calculation for large n.

As you can see, w is a bit larger than the square root of n.  For
n = 10^6, for example, w = 1177.68 (e = -0.197).

If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n =
10^6 which is getting there.

--
David Broughton

Subject: Re: Birthday Attack calculations.
Date: Sat, 2 Jan 1999 19:11:25 GMT
From: ppearson@netcom.com (Peter Pearson)
Message-ID: <ppearsonF4y5B1.Jyv@netcom.com>
References: <\$7zENAA0Ajj2EwNz@ddina.demon.co.uk>
Newsgroups: sci.crypt
Lines: 38

In article <\$7zENAA0Ajj2EwNz@ddina.demon.co.uk>,
David Broughton  <David@ddina.demon.co.uk> wrote:
>In article <368ab32b.57279906@news.intergate.bc.ca>, Fred Van Andel
><fred_vanandel@bigfoot.com> writes
>
>> How does one calculate the exact number of documents that need to
>> be hashed to ensure a collision?  I know that it is approximately
>> 1.2*2^(M/2), but what is the exact formula or procedure for
>> calculating the number?
>
> [snip]
>
>The exact formula is the value of w in this equation:
>
>   ( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5
>
>but this is not a practical calculation for large n.

To derive your own approximation formula from the above, exact
formula, rewrite it as follows:

n (n-1) (n-2) ... (n+1-w)      n!
product = -------------------------- = ----------
n^w                (n-w)! n^w

Then, use Stirling's approximation to make the factorials more
manageable. Stirling's approximation (see, for example, Knuth,
Art of Computer Programming, Volume 1) is:

ln( n! ) = (n+1/2) ln(n) - n + ln( 2*pi )/2 + 1/(12*n) - ...

You'll have to experiment with the number of terms required to
get meaningful results. Overimplifying to n*ln(n)-n gives conspicuously
nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2*pi )/2
is a good level of approximation, and one needs the approximation
ln(1+x) = x (for small x) as well.

- Peter

Subject: Re: Birthday Attack calculations.
Date: Sat, 02 Jan 1999 22:46:05 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <368ea19e.8344667@news.io.com>
References: <368ab32b.57279906@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 40

On Wed, 30 Dec 1998 23:16:02 GMT, in
<368ab32b.57279906@news.intergate.bc.ca>, in sci.crypt
fred_vanandel@bigfoot.com (Fred Van Andel) wrote:

>How does one calculate the exact number of documents that need to be
>hashed to ensure a collision?  I know that it is approximately
>1.2*2^(M/2), but what is the exact formula or procedure for
>calculating the number?

>[I should have stated for 50% probibality, the formula given is for
>that value.]

I like:

s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2

where s is the expected number of samples needed, N the size of the
population being sampled, and p the given probability.

For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of
the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 *
SQRT(N) = 1200.

For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 =
1228.8 for the stated approximation.

The formula basically comes out of my article on population
estimation:

http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM

It might be interesting to know of an application where this sort of
precision is useful.

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM

Subject: Re: Birthday Attack calculations.
Date: Wed, 06 Jan 1999 06:01:50 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <3693fa55.17551686@news.intergate.bc.ca>
References: <368ea19e.8344667@news.io.com>
Newsgroups: sci.crypt
Lines: 17

I would like to thank those who spent time in answering my question.
It has proven to be very helpful.

Terry Ritter wrote:
>It might be interesting to know of an application where this sort of
>precision is useful.

The reason for the needed precision is that I am designing a  variable
length hash function and I want to test it for resistance to
collision.  Since I don't have the resources to do a full test  for a
256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes
and search for trends. If the algorithm is resistant to collision in
the smaller sizes and there is no trend away from the "proper" value
then due to the nature of the algorithm I can be quite confidant that
the larger hashes are also resistant to collisions.

Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: Wed, 06 Jan 1999 12:54:24 GMT
From: dscott@networkusa.net
Message-ID: <76vme0\$avl\$1@nnrp1.dejanews.com>
References: <3693fa55.17551686@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 36

In article <3693fa55.17551686@news.intergate.bc.ca>,
fred_vanandel@bigfoot.com wrote:
> I would like to thank those who spent time in answering my question.
> It has proven to be very helpful.
>
>  Terry Ritter wrote:
> >It might be interesting to know of an application where this sort of
> >precision is useful.
>
> The reason for the needed precision is that I am designing a  variable
> length hash function and I want to test it for resistance to
> collision.  Since I don't have the resources to do a full test  for a
> 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes
> and search for trends. If the algorithm is resistant to collision in
> the smaller sizes and there is no trend away from the "proper" value
> then due to the nature of the algorithm I can be quite confidant that
> the larger hashes are also resistant to collisions.
>
> Fred Van Andel
>

This is a very good idea. Since only if hash is any good will it
have the distribution predicticed by the bithday collision method.
However funny things do happen and though it is a very god idea
to check at the small lengths where more statisitics can be made.
You should always run tests on the final lenght your going to use
or you might get surprised.

david scott

--
http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip
http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==----------

Subject: Re: Birthday Attack calculations.
Date: Thu, 07 Jan 1999 03:53:13 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <36942dfe.1363360@news.intergate.bc.ca>
References: <76vme0\$avl\$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 32

>  fred_vanandel@bigfoot.com wrote:

> The reason for the needed precision is that I am designing a  variable
> length hash function and I want to test it for resistance to
> collision.  Since I don't have the resources to do a full test  for a
> 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes
> and search for trends. If the algorithm is resistant to collision in
> the smaller sizes and there is no trend away from the "proper" value
> then due to the nature of the algorithm I can be quite confidant that
> the larger hashes are also resistant to collisions.

dscott@networkusa.net replied:

> This is a very good idea. Since only if hash is any good will it
>have the distribution predicticed by the bithday collision method.
>However funny things do happen and though it is a very god idea
>to check at the small lengths where more statisitics can be made.
>You should always run tests on the final lenght your going to use
>or you might get surprised.

The whole point of testing on small hashes and extrapolating is that
is computationally impossible to do a birthday attack on a large hash.
For a 256 bit hash you will need to create more than  2^128 hashes
before the odds of a collision reach 50%.

Do You know how long that will take on my 486-66. Or even a planet
full of computers for that matter. The indirect evidence is the ONLY
indication of collision resistance.

Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: Thu, 07 Jan 1999 12:28:17 GMT
From: dscott@networkusa.net
Message-ID: <772990\$kl5\$1@nnrp1.dejanews.com>
References: <36942dfe.1363360@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 59

In article <36942dfe.1363360@news.intergate.bc.ca>,
fred_vanandel@bigfoot.com wrote:
> >  fred_vanandel@bigfoot.com wrote:
>
> > The reason for the needed precision is that I am designing a  variable
> > length hash function and I want to test it for resistance to
> > collision.  Since I don't have the resources to do a full test  for a
> > 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes
> > and search for trends. If the algorithm is resistant to collision in
> > the smaller sizes and there is no trend away from the "proper" value
> > then due to the nature of the algorithm I can be quite confidant that
> > the larger hashes are also resistant to collisions.
>
> dscott@networkusa.net replied:
>
> > This is a very good idea. Since only if hash is any good will it
> >have the distribution predicticed by the bithday collision method.
> >However funny things do happen and though it is a very god idea
> >to check at the small lengths where more statisitics can be made.
> >You should always run tests on the final lenght your going to use
> >or you might get surprised.
>
> The whole point of testing on small hashes and extrapolating is that
> is computationally impossible to do a birthday attack on a large hash.
> For a 256 bit hash you will need to create more than  2^128 hashes
> before the odds of a collision reach 50%.
>
> Do You know how long that will take on my 486-66. Or even a planet
> full of computers for that matter. The indirect evidence is the ONLY
> indication of collision resistance.
>
> Fred Van Andel
>
>

Again you should run tests on the final length. I did not say
to run the number necessiary to get a 50% collision. It is more
than obvious you can't run that number of tests at your full
length. However it would be stupid not to run some tests in hopes
that no collision occur at long length. Since if some appear at
all then something is wrong. So if it passes the short lengths
test still are needed for finail length.
The indeirect evidence is NOT the ONLY indication you should
unless to irigant try the long case in hopes of no errors.

David Scott

P.S. Example I test scott16u very much. But I still had to
do a few tests on scott19u. And the upscaling produced a few
bugs that I would not have found and fixed if I smuggly
ignored testing all together on the longer version.

--
http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip
http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==----------

Subject: Re: Birthday Attack calculations.
Date: Sat, 09 Jan 1999 00:14:38 GMT
From: dscott@networkusa.net
Message-ID: <77671c\$3d2\$1@nnrp1.dejanews.com>
References: <775qop\$cbb@qualcomm.com>
<36942dfe.1363360@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 60

In article <775qop\$cbb@qualcomm.com>,
ggr@qualcomm.com (Gregory G Rose) wrote:
> In article <36942dfe.1363360@news.intergate.bc.ca>,
> Fred Van Andel <fred_vanandel@bigfoot.com> wrote:
> <dscott@networkusa.net replied:
> <> This is a very good idea. Since only if hash is any good will it
> <>have the distribution predicticed by the bithday collision method.
> <>However funny things do happen and though it is a very god idea
> <>to check at the small lengths where more statisitics can be made.
> <>You should always run tests on the final lenght your going to use
> <>or you might get surprised.
> <
> <The whole point of testing on small hashes and extrapolating is that
> <is computationally impossible to do a birthday attack on a large hash.
> <For a 256 bit hash you will need to create more than  2^128 hashes
> <before the odds of a collision reach 50%.
> <
> <Do You know how long that will take on my 486-66. Or even a planet
> <full of computers for that matter. The indirect evidence is the ONLY
> <indication of collision resistance.
>
> Now, let's not get testy. When David has a good
> idea, we should encourage it. The way I
> interpreted his comment was that, instead of
> testing the cut-down algorithms, cut down the
> output of the real algorithm. Generate your
> 256-bit hashes, split them into 32- or 64- bit
> chunks, and test *those* as if they had come from
> a smaller generator. Any collision problems in the
> larger hash, or any implementation glitch in
> scaling it up, should show up.
>
> regards,
> Greg.
> --

NO it is not even close to my thoughts

I thought it was a very good reply on my part sorry
you where not capable of understanding it. So here
it is in different words. It was previously stated
that a cut down version of method tested. Apparently
short enough to be able get meaning ful data based
on probability of collisions. But the thing was
worded that he should never test the FULL LENGTH
case since the collision probabilty was virtually
ZERO for any real number of tries one could make
with the FULL LENGTH caae. But if the FULL LENGTH
case is what one is programming the stupid hash
function for. It would be FUCKING stupid not to run
a few cases just to be DAM SURE non occurred. People
can fuck up code you know.

David Scott

http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip
http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==----------

Subject: Re: Birthday Attack calculations.
Date: Sat, 09 Jan 1999 05:44:15 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <3698df07.97942674@news.intergate.bc.ca>
References: <77671c\$3d2\$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 61

dscott@networkusa.net wrote:
>  ggr@qualcomm.com (Gregory G Rose) wrote:

>> Now, let's not get testy. When David has a good
>> idea, we should encourage it. The way I
>> interpreted his comment was that, instead of
>> testing the cut-down algorithms, cut down the
>> output of the real algorithm. Generate your
>> 256-bit hashes, split them into 32- or 64- bit
>> chunks, and test *those* as if they had come from
>> a smaller generator. Any collision problems in the
>> larger hash, or any implementation glitch in
>> scaling it up, should show up.
>>
>> regards,
>> Greg.
>> --
>
>NO it is not even close to my thoughts
>
>  I thought it was a very good reply on my part sorry
>you where not capable of understanding it. So here
>it is in different words. It was previously stated
>that a cut down version of method tested. Apparently
>short enough to be able get meaning ful data based
>on probability of collisions. But the thing was
>worded that he should never test the FULL LENGTH
>case since the collision probabilty was virtually
>ZERO for any real number of tries one could make
>with the FULL LENGTH caae. But if the FULL LENGTH
>case is what one is programming the stupid hash
>function for. It would be FUCKING stupid not to run
>a few cases just to be DAM SURE non occurred. People
>can fuck up code you know.
>
>David Scott
>
Unfortunately running one test of a 256 bit hash for even a tiny part
of its range will require far more disk space than I have access to
and will have virtually zero chance of finding a collision.  Even
testing a mere 2^32 hashes compared with the > 2^64 required for a 50%
collision of a 256 bit hash would require 128 GigaBytes of disk space
to store all of the calculated hashes. I only wish that I had that
much.

I will carry the testing into as far as I can go and still generate
statistically significant numbers. There is no point expending all my
computing power trying to force a collision of one relatively large
hash because the result would be a matter of dumb luck and completely
meaningless.  But by testing millions of tiny hashes and tens of
thousands of 40 or 48 bit hashes I can generate some meaningful
numbers.

If the hash is so fatally flawed that I could routinely find a
collision at much lower than the normal values then I would have found
this out at the smaller hash sizes.  The algorithm is scalable, there
is no real difference between a small hash and a large hash except for
the obvious.  Therefore I expect that the properties of the smaller
sizes to carry into the larger sizes.

Fred van Andel

Subject: Re: Birthday Attack calculations.
Date: Sat, 09 Jan 1999 13:40:58 GMT
From: dscott@networkusa.net
Message-ID: <777m9a\$985\$1@nnrp1.dejanews.com>
References: <3698df07.97942674@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 81

In article <3698df07.97942674@news.intergate.bc.ca>,
fred_vanandel@bigfoot.com wrote:
> dscott@networkusa.net wrote:

snip...

> >NO it is not even close to my thoughts
> >
> >  I thought it was a very good reply on my part sorry
> >you where not capable of understanding it. So here
> >it is in different words. It was previously stated
> >that a cut down version of method tested. Apparently
> >short enough to be able get meaning ful data based
> >on probability of collisions. But the thing was
> >worded that he should never test the FULL LENGTH
> >case since the collision probabilty was virtually
> >ZERO for any real number of tries one could make
> >with the FULL LENGTH caae. But if the FULL LENGTH
> >case is what one is programming the stupid hash
> >function for. It would be FUCKING stupid not to run
> >a few cases just to be DAM SURE non occurred. People
> >can fuck up code you know.
> >
> >David Scott
> >
> Unfortunately running one test of a 256 bit hash for even a tiny part
> of its range will require far more disk space than I have access to
> and will have virtually zero chance of finding a collision.  Even
> testing a mere 2^32 hashes compared with the > 2^64 required for a 50%
> collision of a 256 bit hash would require 128 GigaBytes of disk space
> to store all of the calculated hashes. I only wish that I had that
> much.
>
testing. I you can't run 100 cases your method sucks.
if you get any collisions at all in the 100 cases then you
FUCKED up the CODE. But you MUST check the code for some
cases that you actually suspect the code to run on.
It reminds me of work when we bought code that was to run
on a DEC machine using VT100 terminals. THE CRAP  didn't work
we talked to the company on phone several times. They claimed
it was developed on DEC machines using VT100 terminals after
months when one of there representatives came out to our site
after several visits they brougth one of there VT100 termainals
the CRAP ran. But they had used a VT100 clone there code
would not work on a real VT100 DEC terminal. I think the
company died I sure hope so. But you have to run REAL TESTS
on the real one you expect the code to be used. Even if your
a pompous asshole and irrigantly expect no problem.

> I will carry the testing into as far as I can go and still generate
> statistically significant numbers. There is no point expending all my
> computing power trying to force a collision of one relatively large
> hash because the result would be a matter of dumb luck and completely
> meaningless.  But by testing millions of tiny hashes and tens of
> thousands of 40 or 48 bit hashes I can generate some meaningful
> numbers.
>
> If the hash is so fatally flawed that I could routinely find a
> collision at much lower than the normal values then I would have found
> this out at the smaller hash sizes.  The algorithm is scalable, there
> is no real difference between a small hash and a large hash except for
> the obvious.  Therefore I expect that the properties of the smaller
> sizes to carry into the larger sizes.
>

You can expect all you want. Yor should more like a manager than
a real programmer with much common sense.

> Fred van Andel
>

David Scott

http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip
http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==----------

Subject: Re: Birthday Attack calculations.
Date: Sun, 10 Jan 1999 11:49:10 GMT
From: dscott@networkusa.net
Message-ID: <77a43m\$7vu\$1@nnrp1.dejanews.com>
References: <369913dc.21716751@news.intergate.bc.ca>
<777m9a\$985\$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 59

In article <369913dc.21716751@news.intergate.bc.ca>,
fred_vanandel@bigfoot.com wrote:
> dscott@networkusa.net wrote:
>
> >In article <3698df07.97942674@news.intergate.bc.ca>,
> >  fred_vanandel@bigfoot.com wrote:
> >> dscott@networkusa.net wrote:
> >
> > snip...
> >
> >
> >    Then run 100 cases after your done with your low level
> >testing. I you can't run 100 cases your method sucks.
> >if you get any collisions at all in the 100 cases then you
> >FUCKED up the CODE. But you MUST check the code for some
> >cases that you actually suspect the code to run on.
> >  It reminds me of work when we bought code that was to run
> >on a DEC machine using VT100 terminals. THE CRAP  didn't work
> >we talked to the company on phone several times. They claimed
> >it was developed on DEC machines using VT100 terminals after
> >months when one of there representatives came out to our site
> >after several visits they brougth one of there VT100 termainals
> >the CRAP ran. But they had used a VT100 clone there code
> >would not work on a real VT100 DEC terminal. I think the
> >company died I sure hope so. But you have to run REAL TESTS
> >on the real one you expect the code to be used. Even if your
> >a pompous asshole and irrigantly expect no problem.
> >
> There is not enough computing power on the planet  to do a full
> collision test on a 256 bit hash.  Do the math yourself.
> >
> >  You can expect all you want. Yor should more like a manager than
> >a real programmer with much common sense.
>
> I think there is an insult in there somewhere, I guess that means I
> have arrived.
>

Actual I am disapointed that some one else caught your mathematical
erros while I was just wasting my time telling you over and over
that a FULL collision test on your 256 bit hash was not needed.
But you do need to try a few cases just to verify that ZERO occur
this makes you think you have arrived then go ahead your arrived
just like mr H.

> Fred Van Andel
>

David Scott
P.S. I never did trust the stability of a Van compared to
a Car.

http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip
http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==----------

Subject: Re: Birthday Attack calculations.
Date: Sat, 09 Jan 1999 04:46:16 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <3696dcdf.97390149@news.intergate.bc.ca>
References: <775qop\$cbb@qualcomm.com>
<36942dfe.1363360@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 27

ggr@qualcomm.com (Gregory G Rose) wrote:

>Now, let's not get testy. When David has a good
>idea, we should encourage it. The way I
>interpreted his comment was that, instead of
>testing the cut-down algorithms, cut down the
>output of the real algorithm. Generate your
>256-bit hashes, split them into 32- or 64- bit
>chunks, and test *those* as if they had come from
>a smaller generator. Any collision problems in the
>larger hash, or any implementation glitch in
>scaling it up, should show up.

The smaller hash sizes are not from a 'cut-down' algorithm. The nature
of this algorithm is that is can create hashes of any desired size
(actually in 8 bit multiples). Any given size is no more 'natural'
than any other size. If a smaller hash is collision resistant then I
would expect that a larger hash is also collision resistant. The only
real difference is that the small hashes are very easy to test, the
larger one are more difficult.

weekends for testing purposes,  this will allow me to crunch quite a
few samples in search of collisions.

Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: Sat, 09 Jan 1999 15:20:47 -0500
From: "Trevor Jackson, III" <fullmoon@aspi.net>
Message-ID: <3697BA1F.9FE69B16@aspi.net>
References: <36942dfe.1363360@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 31

> The whole point of testing on small hashes and extrapolating is that
> is computationally impossible to do a birthday attack on a large hash.
> For a 256 bit hash you will need to create more than  2^128 hashes
> before the odds of a collision reach 50%.
>
> Fred Van Andel

Something is wrong with this paragraph.  Either the attack is not a birthday
attack, or the odds of a collision are much higher.  An exhaustive search of
a 2^256 element state space has a 50% chance of a match agains the target of
the search.  This is not a birthday attack.

A birthday attack searches for any matching pair rather than a match against
a target.  Thus a birthday attack searching N states implies generating O(N)
states, but comparisons among N states means O(N^2) comparisons (assuming
the states are not generated in an order that allows comparison against
neighbors only).  Thus for N small the generation cost may dominate the
overall cost of the search but for even a 40 bit key the comparison cost
will dominate due to the 2^79 comparisons required.

The odds of finding a matching pair among N of M states is the product over
0 to N-1 of the expression (M-i)/M.

odds = 1;
for i=0 to N-1
odds  = odds * (M-i)/M

Thus a birthday attack can find a matching pair of elements much faster than
an exhaustive attack can find a match for a chosen target.

Subject: Re: Birthday Attack calculations.
Date: Sun, 10 Jan 1999 08:31:51 -0500
From: "Trevor Jackson, III" <fullmoon@aspi.net>
Message-ID: <3698ABC7.B01FDDD9@aspi.net>
References: <3697c1c9.703028@news.intergate.bc.ca>
<3697BA1F.9FE69B16@aspi.net>
Newsgroups: sci.crypt
Lines: 80

Fred Van Andel wrote:

> "Trevor Jackson, III" <fullmoon@aspi.net> wrote:
>
> >> The whole point of testing on small hashes and extrapolating is that
> >> is computationally impossible to do a birthday attack on a large hash.
> >> For a 256 bit hash you will need to create more than  2^128 hashes
> >> before the odds of a collision reach 50%.
> >>
> >> Fred Van Andel
> >
> >Something is wrong with this paragraph.  Either the attack is not a birthday
> >attack, or the odds of a collision are much higher.  An exhaustive search of
> >a 2^256 element state space has a 50% chance of a match agains the target of
> >the search.  This is not a birthday attack.
> >
> >A birthday attack searches for any matching pair rather than a match against
> >a target.  Thus a birthday attack searching N states implies generating O(N)
> >states, but comparisons among N states means O(N^2) comparisons (assuming
> >the states are not generated in an order that allows comparison against
> >neighbors only).  Thus for N small the generation cost may dominate the
> >overall cost of the search but for even a 40 bit key the comparison cost
> >will dominate due to the 2^79 comparisons required.
> >
> >The odds of finding a matching pair among N of M states is the product over
> >0 to N-1 of the expression (M-i)/M.
> >
> >    odds = 1;
> >    for i=0 to N-1
> >        odds  = odds * (M-i)/M
> >
> >Thus a birthday attack can find a matching pair of elements much faster than
> >an exhaustive attack can find a match for a chosen target.
>
> Lets see if I understand you correctly. First you generate a pair of
> hashes and compare them to each other.  If they don't match then you
> throw them away and generate a new pair. Repeat until you find a
> match.
>

Hardly.  Throwing away samples wastes the effort invested in creating them.  For
the purpose of the reply I assumed that each sample would be stored after
comparison with the previously stored samples.  Thus we need to store N samples,
and compare (N^2)/2 times.

> But by this method each pair only has one chance in N of being a
> collision and therefore would require N/2 total comparisons to reach
> the 50% level.  When we are talking large hashes this becomes a hugh
> number.
>
> Another way is to generate all the hashes up front, sort them and then
> compare.  For example with a 64 bit hash you could generate your 1.2 *
> 2^32 hashes and save them to a file.  Sorting the file will take
> roughly 32 * 2^32 compares for a total of approx 2^37 operations.

> This is much more feasible that the 2^63 operations required when

> testing pairs of data.  You could run approximately 2^26 of the
> sorting tests in the same time as it takes to run one of the pairs
> test.
>
> Bear in mind that this method does not give you the average but rather
> how often you are successful at the 50% level.  This is a slightly
> different number.  If you want the true average you must generate more
> numbers and then determine afterwards where the match occurred.
>
> You refer to 2^79 compares required for a 40 bit hash.  This the
> absolute worst case in which no collisions are generated even after
> all possible hashes have been created.  The odds of reaching the end
> of the hash without a collision are extremely remote. Even for a hash
> as small as 8 bits, the odds of generating all possible hashes without
> a collision is about 1 in 10^110.  For our 40 bit example most of the
> cases will cluster around 20 bits that corresponds to about 2^39
> compares. That is much more manageable.

You are still using the numbers for exhaustive search.  The birthday attack
numbers are much smaller.  As I don't have an arbitrary precision calculator at
hand I'll have to do some work to generate them accurately.  I'll try to generate
them and post here

Subject: Re: Birthday Attack calculations.
Date: Mon, 11 Jan 1999 03:43:01 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <36996c54.7217058@news.intergate.bc.ca>
References: <3698ABC7.B01FDDD9@aspi.net>
Newsgroups: sci.crypt
Lines: 26

"Trevor Jackson, III" <fullmoon@aspi.net> wrote:

>Hardly.  Throwing away samples wastes the effort invested in creating them.  For
>the purpose of the reply I assumed that each sample would be stored after
>comparison with the previously stored samples.

I obviously misunderstood what you were trying to say and I tailored
my comments around that misunderstanding.   My apologies for
attempting to put words in your mouth.

However I still stand by my original statement.  A birthday attack on
a 256 bit hash would require in excess of 2^128 hashes to be
calculated and stored before the odds of a collision reach 50%.

>You are still using the numbers for exhaustive search.  The birthday attack
>numbers are much smaller.  As I don't have an arbitrary precision calculator at
>hand I'll have to do some work to generate them accurately.  I'll try to generate
>them and post here.

A birthday attack would require > 2^128 calculation while an
exhaustive search would need 2^255.   There is a big difference. When
you are dealing with large hashes even a birthday attack becomes
impossible to do.  Its the birthday attach that dictates the size of
hash required, not the exhaustive search.

Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: Mon, 11 Jan 1999 22:45:19 -0500
From: "Trevor Jackson, III" <fullmoon@aspi.net>
References: <36996c54.7217058@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 12

Fred Van Andel wrote:

> A birthday attack would require > 2^128 calculation while an
> exhaustive search would need 2^255.   There is a big difference. When
> you are dealing with large hashes even a birthday attack becomes
> impossible to do.  Its the birthday attach that dictates the size of
> hash required, not the exhaustive search.

I agree with your last statement above.  But I an confused by your first statement
above.  Is there a closed-form equation for the figure 2^128 you quoted?  I am only
familiar with the probability series summation.

Subject: Re: Birthday Attack calculations.
Date: Tue, 12 Jan 1999 06:34:34 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <369ae5c4.103855861@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 97

"Trevor Jackson, III" <fullmoon@aspi.net> wrote:

>Fred Van Andel wrote:
>
>> A birthday attack would require > 2^128 calculation while an
>> exhaustive search would need 2^255.   There is a big difference. When
>> you are dealing with large hashes even a birthday attack becomes
>> impossible to do.  Its the birthday attach that dictates the size of
>> hash required, not the exhaustive search.
>
>I agree with your last statement above.  But I an confused by your first statement
>above.  Is there a closed-form equation for the figure 2^128 you quoted?  I am only
>familiar with the probability series summation.

You quoted the formula yourself in a earlier message

> odds = 1;
>    for i=0 to N-1
>        odds  = odds * (M-i)/M

For any gives value of M the location of the 50% mark will be roughly
the square root of M.  The square root of 2^256 is 2^128.

Unless I am misunderstanding you again.

Some closed forms of the equations were given in the earlier posts of

This method was posted by David Broughton

>    w = n^g + 0.29 - e
>
>where:
>w = the number of documents to get 50% probability of a collision
>n = number of different hash codes, all equiprobable
>g = 0.5 + 1/(6.13 * ln(n))
>ln() is the natural logarithm function
>e is a small error that can be ignored in practice, usually < +- 1.
>
>This is an empirical formula that I worked out many years ago and
>filed away in a notebook.
>
>The exact formula is the value of w in this equation:
>
>   ( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5
>
>but this is not a practical calculation for large n.
>
>As you can see, w is a bit larger than the square root of n.  For
>n = 10^6, for example, w = 1177.68 (e = -0.197).
>
>If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n =
>10^6 which is getting there.

Or this one by Peter Pearson

>To derive your own approximation formula from the above, exact
>formula, rewrite it as follows:
>
>           n (n-1) (n-2) ... (n+1-w)      n!
>product = -------------------------- = ----------
>                    n^w                (n-w)! n^w
>
>Then, use Stirling's approximation to make the factorials more
>manageable. Stirling's approximation (see, for example, Knuth,
>Art of Computer Programming, Volume 1) is:
>
> ln( n! ) = (n+1/2) ln(n) - n + ln( 2*pi )/2 + 1/(12*n) - ...
>
>You'll have to experiment with the number of terms required to
>get meaningful results. Overimplifying to n*ln(n)-n gives conspicuously
>nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2*pi )/2
>is a good level of approximation, and one needs the approximation
>ln(1+x) = x (for small x) as well.

Or this one by Terry Ritter

>   s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2
>
>where s is the expected number of samples needed, N the size of the
>population being sampled, and p the given probability.
>
>For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of
>the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 *
>SQRT(N) = 1200.
>
>For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 =
>1228.8 for the stated approximation.
>
>The formula basically comes out of my article on population
>estimation:
>
>  http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM
>

Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: Tue, 12 Jan 1999 08:34:48 -0500
From: "Trevor Jackson, III" <fullmoon@aspi.net>
Message-ID: <369B4F77.E076C774@aspi.net>
References: <369ae5c4.103855861@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 112

Thanks for the background exposition.  I think I am slightly confused over the odds of
finding the first collision and the odds of finding no or any  collisions.

For the purposes of the original issue we want the odds of finding the first collision.
I.e., the expected wait.  The series formula I described indicates the probability of
finding no collisions, which also gives (by 1-p) the probability of finding any number
of collisions.

Presumably even odds of finding a single collision should take less work than even odds
of finding all collisions.  Is this correct?

Fred Van Andel wrote:

> "Trevor Jackson, III" <fullmoon@aspi.net> wrote:
>
> >Fred Van Andel wrote:
> >
> >> A birthday attack would require > 2^128 calculation while an
> >> exhaustive search would need 2^255.   There is a big difference. When
> >> you are dealing with large hashes even a birthday attack becomes
> >> impossible to do.  Its the birthday attach that dictates the size of
> >> hash required, not the exhaustive search.
> >
> >I agree with your last statement above.  But I an confused by your first statement
> >above.  Is there a closed-form equation for the figure 2^128 you quoted?  I am only
> >familiar with the probability series summation.
>
> You quoted the formula yourself in a earlier message
>
> > odds = 1;
> >    for i=0 to N-1
> >        odds  = odds * (M-i)/M
>
> For any gives value of M the location of the 50% mark will be roughly
> the square root of M.  The square root of 2^256 is 2^128.
>
> Unless I am misunderstanding you again.
>
> Some closed forms of the equations were given in the earlier posts of
>
> This method was posted by David Broughton
>
> >    w = n^g + 0.29 - e
> >
> >where:
> >w = the number of documents to get 50% probability of a collision
> >n = number of different hash codes, all equiprobable
> >g = 0.5 + 1/(6.13 * ln(n))
> >ln() is the natural logarithm function
> >e is a small error that can be ignored in practice, usually < +- 1.
> >
> >This is an empirical formula that I worked out many years ago and
> >filed away in a notebook.
> >
> >The exact formula is the value of w in this equation:
> >
> >   ( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5
> >
> >but this is not a practical calculation for large n.
> >
> >As you can see, w is a bit larger than the square root of n.  For
> >n = 10^6, for example, w = 1177.68 (e = -0.197).
> >
> >If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n =
> >10^6 which is getting there.
>
> Or this one by Peter Pearson
>
> >To derive your own approximation formula from the above, exact
> >formula, rewrite it as follows:
> >
> >           n (n-1) (n-2) ... (n+1-w)      n!
> >product = -------------------------- = ----------
> >                    n^w                (n-w)! n^w
> >
> >Then, use Stirling's approximation to make the factorials more
> >manageable. Stirling's approximation (see, for example, Knuth,
> >Art of Computer Programming, Volume 1) is:
> >
> > ln( n! ) = (n+1/2) ln(n) - n + ln( 2*pi )/2 + 1/(12*n) - ...
> >
> >You'll have to experiment with the number of terms required to
> >get meaningful results. Overimplifying to n*ln(n)-n gives conspicuously
> >nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2*pi )/2
> >is a good level of approximation, and one needs the approximation
> >ln(1+x) = x (for small x) as well.
>
> Or this one by Terry Ritter
>
> >   s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2
> >
> >where s is the expected number of samples needed, N the size of the
> >population being sampled, and p the given probability.
> >
> >For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of
> >the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 *
> >SQRT(N) = 1200.
> >
> >For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 =
> >1228.8 for the stated approximation.
> >
> >The formula basically comes out of my article on population
> >estimation:
> >
> >  http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM
> >
>
> Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: Wed, 13 Jan 1999 06:50:51 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <369d3d9c.8943462@news.intergate.bc.ca>
References: <369B4F77.E076C774@aspi.net>
Newsgroups: sci.crypt
Lines: 27

"Trevor Jackson, III" <fullmoon@aspi.net> wrote:

>Thanks for the background exposition.  I think I am slightly confused over the odds of
>finding the first collision and the odds of finding no or any  collisions.
>
>For the purposes of the original issue we want the odds of finding the first collision.
>I.e., the expected wait.  The series formula I described indicates the probability of
>finding no collisions, which also gives (by 1-p) the probability of finding any number
>of collisions.
>
>Presumably even odds of finding a single collision should take less work than even odds
>of finding all collisions.  Is this correct?

Using the equasion below the AVERAGE number of hashes that need to be
tested is calculated by:
i = 1'
odds = 1;
M = Whatever;
while( odds > 0.5)
{
odds  = odds * (M-i)/M;
i++;
}

For a value of 1,000,000, i is 1178.

Fred Van Andel

Subject: Re: Birthday Attack calculations.
Date: Wed, 13 Jan 1999 07:52:27 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <369c50a1.3513345@news.io.com>
References: <369d3d9c.8943462@news.intergate.bc.ca>
Newsgroups: sci.crypt
Lines: 35

On Wed, 13 Jan 1999 06:50:51 GMT, in
<369d3d9c.8943462@news.intergate.bc.ca>, in sci.crypt
fred_vanandel@bigfoot.com (Fred Van Andel) wrote:

>[...]
>Using the equasion below the AVERAGE number of hashes that need to be
>tested is calculated by:
>   i = 1'
>   odds = 1;
>   M = Whatever;
>   while( odds > 0.5)
>      {
>         odds  = odds * (M-i)/M;
>         i++;
>      }
>
>For a value of 1,000,000, i is 1178.

Which seems to compare rather well to the value 1177.9 computed from
my formula in the earlier posting:

|   s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2
|
|where s is the expected number of samples needed, N the size of
|the population being sampled, and p the given probability.
|
|For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead
|of the usual handwave SQRT(N) = 1000 [...]

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM

Subject: Re: Birthday Attack calculations.
Date: Thu, 14 Jan 1999 04:37:24 GMT
From: fred_vanandel@bigfoot.com (Fred Van Andel)
Message-ID: <369d73bf.1141606@news.intergate.bc.ca>
References: <369c50a1.3513345@news.io.com>
Newsgroups: sci.crypt
Lines: 9

ritter@io.com (Terry Ritter) wrote:
>>For a value of 1,000,000, i is 1178.
>
>Which seems to compare rather well to the value 1177.9 computed from
>my formula in the earlier posting:

Why do you think I chose the number 1,000,000.  Its a good idea to
check your answers before making a fool of yourself in front of the
multitude.

```

Terry Ritter, his current address, and his top page.

Last updated: 1999-02-20