Birthday Attack Calculations


A Ciphers By Ritter Page


How can we relate the number of elements in a population and the number of random samples needed before we expect to find a duplicate or match?


Contents


Subject: Birthday Attack calculations. Date: Wed, 30 Dec 1998 23:16:02 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <368ab32b.57279906@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 7 How does one calculate the exact number of documents that need to be hashed to ensure a collision? I know that it is approximately 1.2*2^(M/2), but what is the exact formula or procedure for calculating the number? Thanks Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: 30 Dec 1998 23:59:16 GMT From: djohn37050@aol.com (DJohn37050) Message-ID: <19981230185916.28531.00004724@ng39.aol.com> References: <368ab32b.57279906@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 3 To ensure a collision, the exact number is n+1. THe birthday attack in probabilistic. See the HAC.
Subject: Re: Birthday Attack calculations. Date: Thu, 31 Dec 1998 03:46:01 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <368af32e.73668929@news.intergate.bc.ca> References: <19981230185916.28531.00004724@ng39.aol.com> Newsgroups: sci.crypt Lines: 10 djohn37050@aol.com (DJohn37050) wrote: >To ensure a collision, the exact number is n+1. THe birthday attack in >probabilistic. See the HAC. I should have stated for 50% probibality, the formula given is for that value. My aploogies. Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: 31 Dec 98 04:55:33 +0000 From: "Adam Atkinson" <ghira@mistral.co.uk> Message-ID: <563.669T1659T2953896@mistral.co.uk> References: <368af32e.73668929@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 28 On 31-Dec-98 03:46:01, Fred Van Andel said: >>To ensure a collision, the exact number is n+1. THe birthday attack in >>probabilistic. See the HAC. >I should have stated for 50% probibality, the formula given is for >that value. My aploogies. well, doing it "by hand" in perl would look something like: $n=43949268; #$n=365; #gives 23, which is right $p=1; for($i=1;$p>0.5;$i++) { $p*=($n-$i)/$n; } print $i."\n"; -- Adam Atkinson (ghira@mistral.co.uk) Verbing weirds language. (Calvin)
Subject: Re: Birthday Attack calculations. Date: Sat, 2 Jan 1999 14:41:56 +0000 From: David Broughton <David@ddina.demon.co.uk> Message-ID: <$7zENAA0Ajj2EwNz@ddina.demon.co.uk> References: <368ab32b.57279906@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 36 In article <368ab32b.57279906@news.intergate.bc.ca>, Fred Van Andel <fred_vanandel@bigfoot.com> writes > How does one calculate the exact number of documents that need to > be hashed to ensure a collision? I know that it is approximately > 1.2*2^(M/2), but what is the exact formula or procedure for > calculating the number? Try this formula: w = n^g + 0.29 - e where: w = the number of documents to get 50% probability of a collision n = number of different hash codes, all equiprobable g = 0.5 + 1/(6.13 * ln(n)) ln() is the natural logarithm function e is a small error that can be ignored in practice, usually < +- 1. This is an empirical formula that I worked out many years ago and filed away in a notebook. The exact formula is the value of w in this equation: ( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5 but this is not a practical calculation for large n. As you can see, w is a bit larger than the square root of n. For n = 10^6, for example, w = 1177.68 (e = -0.197). If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n = 10^6 which is getting there. -- David Broughton
Subject: Re: Birthday Attack calculations. Date: Sat, 2 Jan 1999 19:11:25 GMT From: ppearson@netcom.com (Peter Pearson) Message-ID: <ppearsonF4y5B1.Jyv@netcom.com> References: <$7zENAA0Ajj2EwNz@ddina.demon.co.uk> Newsgroups: sci.crypt Lines: 38 In article <$7zENAA0Ajj2EwNz@ddina.demon.co.uk>, David Broughton <David@ddina.demon.co.uk> wrote: >In article <368ab32b.57279906@news.intergate.bc.ca>, Fred Van Andel ><fred_vanandel@bigfoot.com> writes > >> How does one calculate the exact number of documents that need to >> be hashed to ensure a collision? I know that it is approximately >> 1.2*2^(M/2), but what is the exact formula or procedure for >> calculating the number? > > [snip] > >The exact formula is the value of w in this equation: > > ( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5 > >but this is not a practical calculation for large n. To derive your own approximation formula from the above, exact formula, rewrite it as follows: n (n-1) (n-2) ... (n+1-w) n! product = -------------------------- = ---------- n^w (n-w)! n^w Then, use Stirling's approximation to make the factorials more manageable. Stirling's approximation (see, for example, Knuth, Art of Computer Programming, Volume 1) is: ln( n! ) = (n+1/2) ln(n) - n + ln( 2*pi )/2 + 1/(12*n) - ... You'll have to experiment with the number of terms required to get meaningful results. Overimplifying to n*ln(n)-n gives conspicuously nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2*pi )/2 is a good level of approximation, and one needs the approximation ln(1+x) = x (for small x) as well. - Peter
Subject: Re: Birthday Attack calculations. Date: Sat, 02 Jan 1999 22:46:05 GMT From: ritter@io.com (Terry Ritter) Message-ID: <368ea19e.8344667@news.io.com> References: <368ab32b.57279906@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 40 On Wed, 30 Dec 1998 23:16:02 GMT, in <368ab32b.57279906@news.intergate.bc.ca>, in sci.crypt fred_vanandel@bigfoot.com (Fred Van Andel) wrote: >How does one calculate the exact number of documents that need to be >hashed to ensure a collision? I know that it is approximately >1.2*2^(M/2), but what is the exact formula or procedure for >calculating the number? >[I should have stated for 50% probibality, the formula given is for >that value.] I like: s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2 where s is the expected number of samples needed, N the size of the population being sampled, and p the given probability. For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 * SQRT(N) = 1200. For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 = 1228.8 for the stated approximation. The formula basically comes out of my article on population estimation: http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM It might be interesting to know of an application where this sort of precision is useful. --- Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM
Subject: Re: Birthday Attack calculations. Date: Wed, 06 Jan 1999 06:01:50 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <3693fa55.17551686@news.intergate.bc.ca> References: <368ea19e.8344667@news.io.com> Newsgroups: sci.crypt Lines: 17 I would like to thank those who spent time in answering my question. It has proven to be very helpful. Terry Ritter wrote: >It might be interesting to know of an application where this sort of >precision is useful. The reason for the needed precision is that I am designing a variable length hash function and I want to test it for resistance to collision. Since I don't have the resources to do a full test for a 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes and search for trends. If the algorithm is resistant to collision in the smaller sizes and there is no trend away from the "proper" value then due to the nature of the algorithm I can be quite confidant that the larger hashes are also resistant to collisions. Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: Wed, 06 Jan 1999 12:54:24 GMT From: dscott@networkusa.net Message-ID: <76vme0$avl$1@nnrp1.dejanews.com> References: <3693fa55.17551686@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 36 In article <3693fa55.17551686@news.intergate.bc.ca>, fred_vanandel@bigfoot.com wrote: > I would like to thank those who spent time in answering my question. > It has proven to be very helpful. > > Terry Ritter wrote: > >It might be interesting to know of an application where this sort of > >precision is useful. > > The reason for the needed precision is that I am designing a variable > length hash function and I want to test it for resistance to > collision. Since I don't have the resources to do a full test for a > 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes > and search for trends. If the algorithm is resistant to collision in > the smaller sizes and there is no trend away from the "proper" value > then due to the nature of the algorithm I can be quite confidant that > the larger hashes are also resistant to collisions. > > Fred Van Andel > This is a very good idea. Since only if hash is any good will it have the distribution predicticed by the bithday collision method. However funny things do happen and though it is a very god idea to check at the small lengths where more statisitics can be made. You should always run tests on the final lenght your going to use or you might get surprised. david scott -- http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm -----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Subject: Re: Birthday Attack calculations. Date: Thu, 07 Jan 1999 03:53:13 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <36942dfe.1363360@news.intergate.bc.ca> References: <76vme0$avl$1@nnrp1.dejanews.com> Newsgroups: sci.crypt Lines: 32 > fred_vanandel@bigfoot.com wrote: > The reason for the needed precision is that I am designing a variable > length hash function and I want to test it for resistance to > collision. Since I don't have the resources to do a full test for a > 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes > and search for trends. If the algorithm is resistant to collision in > the smaller sizes and there is no trend away from the "proper" value > then due to the nature of the algorithm I can be quite confidant that > the larger hashes are also resistant to collisions. dscott@networkusa.net replied: > This is a very good idea. Since only if hash is any good will it >have the distribution predicticed by the bithday collision method. >However funny things do happen and though it is a very god idea >to check at the small lengths where more statisitics can be made. >You should always run tests on the final lenght your going to use >or you might get surprised. The whole point of testing on small hashes and extrapolating is that is computationally impossible to do a birthday attack on a large hash. For a 256 bit hash you will need to create more than 2^128 hashes before the odds of a collision reach 50%. Do You know how long that will take on my 486-66. Or even a planet full of computers for that matter. The indirect evidence is the ONLY indication of collision resistance. Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: Thu, 07 Jan 1999 12:28:17 GMT From: dscott@networkusa.net Message-ID: <772990$kl5$1@nnrp1.dejanews.com> References: <36942dfe.1363360@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 59 In article <36942dfe.1363360@news.intergate.bc.ca>, fred_vanandel@bigfoot.com wrote: > > fred_vanandel@bigfoot.com wrote: > > > The reason for the needed precision is that I am designing a variable > > length hash function and I want to test it for resistance to > > collision. Since I don't have the resources to do a full test for a > > 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes > > and search for trends. If the algorithm is resistant to collision in > > the smaller sizes and there is no trend away from the "proper" value > > then due to the nature of the algorithm I can be quite confidant that > > the larger hashes are also resistant to collisions. > > dscott@networkusa.net replied: > > > This is a very good idea. Since only if hash is any good will it > >have the distribution predicticed by the bithday collision method. > >However funny things do happen and though it is a very god idea > >to check at the small lengths where more statisitics can be made. > >You should always run tests on the final lenght your going to use > >or you might get surprised. > > The whole point of testing on small hashes and extrapolating is that > is computationally impossible to do a birthday attack on a large hash. > For a 256 bit hash you will need to create more than 2^128 hashes > before the odds of a collision reach 50%. > > Do You know how long that will take on my 486-66. Or even a planet > full of computers for that matter. The indirect evidence is the ONLY > indication of collision resistance. > > Fred Van Andel > > Again you should run tests on the final length. I did not say to run the number necessiary to get a 50% collision. It is more than obvious you can't run that number of tests at your full length. However it would be stupid not to run some tests in hopes that no collision occur at long length. Since if some appear at all then something is wrong. So if it passes the short lengths test still are needed for finail length. The indeirect evidence is NOT the ONLY indication you should unless to irigant try the long case in hopes of no errors. David Scott P.S. Example I test scott16u very much. But I still had to do a few tests on scott19u. And the upscaling produced a few bugs that I would not have found and fixed if I smuggly ignored testing all together on the longer version. -- http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm -----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 00:14:38 GMT From: dscott@networkusa.net Message-ID: <77671c$3d2$1@nnrp1.dejanews.com> References: <775qop$cbb@qualcomm.com> <36942dfe.1363360@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 60 In article <775qop$cbb@qualcomm.com>, ggr@qualcomm.com (Gregory G Rose) wrote: > In article <36942dfe.1363360@news.intergate.bc.ca>, > Fred Van Andel <fred_vanandel@bigfoot.com> wrote: > <dscott@networkusa.net replied: > <> This is a very good idea. Since only if hash is any good will it > <>have the distribution predicticed by the bithday collision method. > <>However funny things do happen and though it is a very god idea > <>to check at the small lengths where more statisitics can be made. > <>You should always run tests on the final lenght your going to use > <>or you might get surprised. > < > <The whole point of testing on small hashes and extrapolating is that > <is computationally impossible to do a birthday attack on a large hash. > <For a 256 bit hash you will need to create more than 2^128 hashes > <before the odds of a collision reach 50%. > < > <Do You know how long that will take on my 486-66. Or even a planet > <full of computers for that matter. The indirect evidence is the ONLY > <indication of collision resistance. > > Now, let's not get testy. When David has a good > idea, we should encourage it. The way I > interpreted his comment was that, instead of > testing the cut-down algorithms, cut down the > output of the real algorithm. Generate your > 256-bit hashes, split them into 32- or 64- bit > chunks, and test *those* as if they had come from > a smaller generator. Any collision problems in the > larger hash, or any implementation glitch in > scaling it up, should show up. > > regards, > Greg. > -- NO it is not even close to my thoughts I thought it was a very good reply on my part sorry you where not capable of understanding it. So here it is in different words. It was previously stated that a cut down version of method tested. Apparently short enough to be able get meaning ful data based on probability of collisions. But the thing was worded that he should never test the FULL LENGTH case since the collision probabilty was virtually ZERO for any real number of tries one could make with the FULL LENGTH caae. But if the FULL LENGTH case is what one is programming the stupid hash function for. It would be FUCKING stupid not to run a few cases just to be DAM SURE non occurred. People can fuck up code you know. David Scott http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm -----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 05:44:15 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <3698df07.97942674@news.intergate.bc.ca> References: <77671c$3d2$1@nnrp1.dejanews.com> Newsgroups: sci.crypt Lines: 61 dscott@networkusa.net wrote: > ggr@qualcomm.com (Gregory G Rose) wrote: >> Now, let's not get testy. When David has a good >> idea, we should encourage it. The way I >> interpreted his comment was that, instead of >> testing the cut-down algorithms, cut down the >> output of the real algorithm. Generate your >> 256-bit hashes, split them into 32- or 64- bit >> chunks, and test *those* as if they had come from >> a smaller generator. Any collision problems in the >> larger hash, or any implementation glitch in >> scaling it up, should show up. >> >> regards, >> Greg. >> -- > >NO it is not even close to my thoughts > > I thought it was a very good reply on my part sorry >you where not capable of understanding it. So here >it is in different words. It was previously stated >that a cut down version of method tested. Apparently >short enough to be able get meaning ful data based >on probability of collisions. But the thing was >worded that he should never test the FULL LENGTH >case since the collision probabilty was virtually >ZERO for any real number of tries one could make >with the FULL LENGTH caae. But if the FULL LENGTH >case is what one is programming the stupid hash >function for. It would be FUCKING stupid not to run >a few cases just to be DAM SURE non occurred. People >can fuck up code you know. > >David Scott > Unfortunately running one test of a 256 bit hash for even a tiny part of its range will require far more disk space than I have access to and will have virtually zero chance of finding a collision. Even testing a mere 2^32 hashes compared with the > 2^64 required for a 50% collision of a 256 bit hash would require 128 GigaBytes of disk space to store all of the calculated hashes. I only wish that I had that much. I will carry the testing into as far as I can go and still generate statistically significant numbers. There is no point expending all my computing power trying to force a collision of one relatively large hash because the result would be a matter of dumb luck and completely meaningless. But by testing millions of tiny hashes and tens of thousands of 40 or 48 bit hashes I can generate some meaningful numbers. If the hash is so fatally flawed that I could routinely find a collision at much lower than the normal values then I would have found this out at the smaller hash sizes. The algorithm is scalable, there is no real difference between a small hash and a large hash except for the obvious. Therefore I expect that the properties of the smaller sizes to carry into the larger sizes. Fred van Andel
Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 13:40:58 GMT From: dscott@networkusa.net Message-ID: <777m9a$985$1@nnrp1.dejanews.com> References: <3698df07.97942674@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 81 In article <3698df07.97942674@news.intergate.bc.ca>, fred_vanandel@bigfoot.com wrote: > dscott@networkusa.net wrote: snip... > >NO it is not even close to my thoughts > > > > I thought it was a very good reply on my part sorry > >you where not capable of understanding it. So here > >it is in different words. It was previously stated > >that a cut down version of method tested. Apparently > >short enough to be able get meaning ful data based > >on probability of collisions. But the thing was > >worded that he should never test the FULL LENGTH > >case since the collision probabilty was virtually > >ZERO for any real number of tries one could make > >with the FULL LENGTH caae. But if the FULL LENGTH > >case is what one is programming the stupid hash > >function for. It would be FUCKING stupid not to run > >a few cases just to be DAM SURE non occurred. People > >can fuck up code you know. > > > >David Scott > > > Unfortunately running one test of a 256 bit hash for even a tiny part > of its range will require far more disk space than I have access to > and will have virtually zero chance of finding a collision. Even > testing a mere 2^32 hashes compared with the > 2^64 required for a 50% > collision of a 256 bit hash would require 128 GigaBytes of disk space > to store all of the calculated hashes. I only wish that I had that > much. > Then run 100 cases after your done with your low level testing. I you can't run 100 cases your method sucks. if you get any collisions at all in the 100 cases then you FUCKED up the CODE. But you MUST check the code for some cases that you actually suspect the code to run on. It reminds me of work when we bought code that was to run on a DEC machine using VT100 terminals. THE CRAP didn't work we talked to the company on phone several times. They claimed it was developed on DEC machines using VT100 terminals after months when one of there representatives came out to our site after several visits they brougth one of there VT100 termainals the CRAP ran. But they had used a VT100 clone there code would not work on a real VT100 DEC terminal. I think the company died I sure hope so. But you have to run REAL TESTS on the real one you expect the code to be used. Even if your a pompous asshole and irrigantly expect no problem. > I will carry the testing into as far as I can go and still generate > statistically significant numbers. There is no point expending all my > computing power trying to force a collision of one relatively large > hash because the result would be a matter of dumb luck and completely > meaningless. But by testing millions of tiny hashes and tens of > thousands of 40 or 48 bit hashes I can generate some meaningful > numbers. > > If the hash is so fatally flawed that I could routinely find a > collision at much lower than the normal values then I would have found > this out at the smaller hash sizes. The algorithm is scalable, there > is no real difference between a small hash and a large hash except for > the obvious. Therefore I expect that the properties of the smaller > sizes to carry into the larger sizes. > You can expect all you want. Yor should more like a manager than a real programmer with much common sense. > Fred van Andel > David Scott http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm -----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Subject: Re: Birthday Attack calculations. Date: Sun, 10 Jan 1999 11:49:10 GMT From: dscott@networkusa.net Message-ID: <77a43m$7vu$1@nnrp1.dejanews.com> References: <369913dc.21716751@news.intergate.bc.ca> <777m9a$985$1@nnrp1.dejanews.com> Newsgroups: sci.crypt Lines: 59 In article <369913dc.21716751@news.intergate.bc.ca>, fred_vanandel@bigfoot.com wrote: > dscott@networkusa.net wrote: > > >In article <3698df07.97942674@news.intergate.bc.ca>, > > fred_vanandel@bigfoot.com wrote: > >> dscott@networkusa.net wrote: > > > > snip... > > > > > > Then run 100 cases after your done with your low level > >testing. I you can't run 100 cases your method sucks. > >if you get any collisions at all in the 100 cases then you > >FUCKED up the CODE. But you MUST check the code for some > >cases that you actually suspect the code to run on. > > It reminds me of work when we bought code that was to run > >on a DEC machine using VT100 terminals. THE CRAP didn't work > >we talked to the company on phone several times. They claimed > >it was developed on DEC machines using VT100 terminals after > >months when one of there representatives came out to our site > >after several visits they brougth one of there VT100 termainals > >the CRAP ran. But they had used a VT100 clone there code > >would not work on a real VT100 DEC terminal. I think the > >company died I sure hope so. But you have to run REAL TESTS > >on the real one you expect the code to be used. Even if your > >a pompous asshole and irrigantly expect no problem. > > > There is not enough computing power on the planet to do a full > collision test on a 256 bit hash. Do the math yourself. > > > > You can expect all you want. Yor should more like a manager than > >a real programmer with much common sense. > > I think there is an insult in there somewhere, I guess that means I > have arrived. > Actual I am disapointed that some one else caught your mathematical erros while I was just wasting my time telling you over and over that a FULL collision test on your 256 bit hash was not needed. But you do need to try a few cases just to verify that ZERO occur but your obviously to pigheaded to understand this point. If this makes you think you have arrived then go ahead your arrived just like mr H. > Fred Van Andel > David Scott P.S. I never did trust the stability of a Van compared to a Car. http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm -----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 04:46:16 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <3696dcdf.97390149@news.intergate.bc.ca> References: <775qop$cbb@qualcomm.com> <36942dfe.1363360@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 27 ggr@qualcomm.com (Gregory G Rose) wrote: >Now, let's not get testy. When David has a good >idea, we should encourage it. The way I >interpreted his comment was that, instead of >testing the cut-down algorithms, cut down the >output of the real algorithm. Generate your >256-bit hashes, split them into 32- or 64- bit >chunks, and test *those* as if they had come from >a smaller generator. Any collision problems in the >larger hash, or any implementation glitch in >scaling it up, should show up. The smaller hash sizes are not from a 'cut-down' algorithm. The nature of this algorithm is that is can create hashes of any desired size (actually in 8 bit multiples). Any given size is no more 'natural' than any other size. If a smaller hash is collision resistant then I would expect that a larger hash is also collision resistant. The only real difference is that the small hashes are very easy to test, the larger one are more difficult. I hope to have access to about 12 Pentium machines during evenings and weekends for testing purposes, this will allow me to crunch quite a few samples in search of collisions. Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 15:20:47 -0500 From: "Trevor Jackson, III" <fullmoon@aspi.net> Message-ID: <3697BA1F.9FE69B16@aspi.net> References: <36942dfe.1363360@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 31 > The whole point of testing on small hashes and extrapolating is that > is computationally impossible to do a birthday attack on a large hash. > For a 256 bit hash you will need to create more than 2^128 hashes > before the odds of a collision reach 50%. > > Fred Van Andel Something is wrong with this paragraph. Either the attack is not a birthday attack, or the odds of a collision are much higher. An exhaustive search of a 2^256 element state space has a 50% chance of a match agains the target of the search. This is not a birthday attack. A birthday attack searches for any matching pair rather than a match against a target. Thus a birthday attack searching N states implies generating O(N) states, but comparisons among N states means O(N^2) comparisons (assuming the states are not generated in an order that allows comparison against neighbors only). Thus for N small the generation cost may dominate the overall cost of the search but for even a 40 bit key the comparison cost will dominate due to the 2^79 comparisons required. The odds of finding a matching pair among N of M states is the product over 0 to N-1 of the expression (M-i)/M. odds = 1; for i=0 to N-1 odds = odds * (M-i)/M Thus a birthday attack can find a matching pair of elements much faster than an exhaustive attack can find a match for a chosen target.
Subject: Re: Birthday Attack calculations. Date: Sun, 10 Jan 1999 08:31:51 -0500 From: "Trevor Jackson, III" <fullmoon@aspi.net> Message-ID: <3698ABC7.B01FDDD9@aspi.net> References: <3697c1c9.703028@news.intergate.bc.ca> <3697BA1F.9FE69B16@aspi.net> Newsgroups: sci.crypt Lines: 80 Fred Van Andel wrote: > "Trevor Jackson, III" <fullmoon@aspi.net> wrote: > > >> The whole point of testing on small hashes and extrapolating is that > >> is computationally impossible to do a birthday attack on a large hash. > >> For a 256 bit hash you will need to create more than 2^128 hashes > >> before the odds of a collision reach 50%. > >> > >> Fred Van Andel > > > >Something is wrong with this paragraph. Either the attack is not a birthday > >attack, or the odds of a collision are much higher. An exhaustive search of > >a 2^256 element state space has a 50% chance of a match agains the target of > >the search. This is not a birthday attack. > > > >A birthday attack searches for any matching pair rather than a match against > >a target. Thus a birthday attack searching N states implies generating O(N) > >states, but comparisons among N states means O(N^2) comparisons (assuming > >the states are not generated in an order that allows comparison against > >neighbors only). Thus for N small the generation cost may dominate the > >overall cost of the search but for even a 40 bit key the comparison cost > >will dominate due to the 2^79 comparisons required. > > > >The odds of finding a matching pair among N of M states is the product over > >0 to N-1 of the expression (M-i)/M. > > > > odds = 1; > > for i=0 to N-1 > > odds = odds * (M-i)/M > > > >Thus a birthday attack can find a matching pair of elements much faster than > >an exhaustive attack can find a match for a chosen target. > > Lets see if I understand you correctly. First you generate a pair of > hashes and compare them to each other. If they don't match then you > throw them away and generate a new pair. Repeat until you find a > match. > Hardly. Throwing away samples wastes the effort invested in creating them. For the purpose of the reply I assumed that each sample would be stored after comparison with the previously stored samples. Thus we need to store N samples, and compare (N^2)/2 times. > But by this method each pair only has one chance in N of being a > collision and therefore would require N/2 total comparisons to reach > the 50% level. When we are talking large hashes this becomes a hugh > number. > > Another way is to generate all the hashes up front, sort them and then > compare. For example with a 64 bit hash you could generate your 1.2 * > 2^32 hashes and save them to a file. Sorting the file will take > roughly 32 * 2^32 compares for a total of approx 2^37 operations. > This is much more feasible that the 2^63 operations required when > testing pairs of data. You could run approximately 2^26 of the > sorting tests in the same time as it takes to run one of the pairs > test. > > Bear in mind that this method does not give you the average but rather > how often you are successful at the 50% level. This is a slightly > different number. If you want the true average you must generate more > numbers and then determine afterwards where the match occurred. > > You refer to 2^79 compares required for a 40 bit hash. This the > absolute worst case in which no collisions are generated even after > all possible hashes have been created. The odds of reaching the end > of the hash without a collision are extremely remote. Even for a hash > as small as 8 bits, the odds of generating all possible hashes without > a collision is about 1 in 10^110. For our 40 bit example most of the > cases will cluster around 20 bits that corresponds to about 2^39 > compares. That is much more manageable. You are still using the numbers for exhaustive search. The birthday attack numbers are much smaller. As I don't have an arbitrary precision calculator at hand I'll have to do some work to generate them accurately. I'll try to generate them and post here
Subject: Re: Birthday Attack calculations. Date: Mon, 11 Jan 1999 03:43:01 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <36996c54.7217058@news.intergate.bc.ca> References: <3698ABC7.B01FDDD9@aspi.net> Newsgroups: sci.crypt Lines: 26 "Trevor Jackson, III" <fullmoon@aspi.net> wrote: >Hardly. Throwing away samples wastes the effort invested in creating them. For >the purpose of the reply I assumed that each sample would be stored after >comparison with the previously stored samples. I obviously misunderstood what you were trying to say and I tailored my comments around that misunderstanding. My apologies for attempting to put words in your mouth. However I still stand by my original statement. A birthday attack on a 256 bit hash would require in excess of 2^128 hashes to be calculated and stored before the odds of a collision reach 50%. >You are still using the numbers for exhaustive search. The birthday attack >numbers are much smaller. As I don't have an arbitrary precision calculator at >hand I'll have to do some work to generate them accurately. I'll try to generate >them and post here. A birthday attack would require > 2^128 calculation while an exhaustive search would need 2^255. There is a big difference. When you are dealing with large hashes even a birthday attack becomes impossible to do. Its the birthday attach that dictates the size of hash required, not the exhaustive search. Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: Mon, 11 Jan 1999 22:45:19 -0500 From: "Trevor Jackson, III" <fullmoon@aspi.net> Message-ID: <369AC54F.CE51AD60@aspi.net> References: <36996c54.7217058@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 12 Fred Van Andel wrote: > A birthday attack would require > 2^128 calculation while an > exhaustive search would need 2^255. There is a big difference. When > you are dealing with large hashes even a birthday attack becomes > impossible to do. Its the birthday attach that dictates the size of > hash required, not the exhaustive search. I agree with your last statement above. But I an confused by your first statement above. Is there a closed-form equation for the figure 2^128 you quoted? I am only familiar with the probability series summation.
Subject: Re: Birthday Attack calculations. Date: Tue, 12 Jan 1999 06:34:34 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <369ae5c4.103855861@news.intergate.bc.ca> References: <369AC54F.CE51AD60@aspi.net> Newsgroups: sci.crypt Lines: 97 "Trevor Jackson, III" <fullmoon@aspi.net> wrote: >Fred Van Andel wrote: > >> A birthday attack would require > 2^128 calculation while an >> exhaustive search would need 2^255. There is a big difference. When >> you are dealing with large hashes even a birthday attack becomes >> impossible to do. Its the birthday attach that dictates the size of >> hash required, not the exhaustive search. > >I agree with your last statement above. But I an confused by your first statement >above. Is there a closed-form equation for the figure 2^128 you quoted? I am only >familiar with the probability series summation. You quoted the formula yourself in a earlier message > odds = 1; > for i=0 to N-1 > odds = odds * (M-i)/M For any gives value of M the location of the 50% mark will be roughly the square root of M. The square root of 2^256 is 2^128. Unless I am misunderstanding you again. Some closed forms of the equations were given in the earlier posts of this thread. This method was posted by David Broughton > w = n^g + 0.29 - e > >where: >w = the number of documents to get 50% probability of a collision >n = number of different hash codes, all equiprobable >g = 0.5 + 1/(6.13 * ln(n)) >ln() is the natural logarithm function >e is a small error that can be ignored in practice, usually < +- 1. > >This is an empirical formula that I worked out many years ago and >filed away in a notebook. > >The exact formula is the value of w in this equation: > > ( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5 > >but this is not a practical calculation for large n. > >As you can see, w is a bit larger than the square root of n. For >n = 10^6, for example, w = 1177.68 (e = -0.197). > >If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n = >10^6 which is getting there. Or this one by Peter Pearson >To derive your own approximation formula from the above, exact >formula, rewrite it as follows: > > n (n-1) (n-2) ... (n+1-w) n! >product = -------------------------- = ---------- > n^w (n-w)! n^w > >Then, use Stirling's approximation to make the factorials more >manageable. Stirling's approximation (see, for example, Knuth, >Art of Computer Programming, Volume 1) is: > > ln( n! ) = (n+1/2) ln(n) - n + ln( 2*pi )/2 + 1/(12*n) - ... > >You'll have to experiment with the number of terms required to >get meaningful results. Overimplifying to n*ln(n)-n gives conspicuously >nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2*pi )/2 >is a good level of approximation, and one needs the approximation >ln(1+x) = x (for small x) as well. Or this one by Terry Ritter > s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2 > >where s is the expected number of samples needed, N the size of the >population being sampled, and p the given probability. > >For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of >the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 * >SQRT(N) = 1200. > >For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 = >1228.8 for the stated approximation. > >The formula basically comes out of my article on population >estimation: > > http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM > Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: Tue, 12 Jan 1999 08:34:48 -0500 From: "Trevor Jackson, III" <fullmoon@aspi.net> Message-ID: <369B4F77.E076C774@aspi.net> References: <369ae5c4.103855861@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 112 Thanks for the background exposition. I think I am slightly confused over the odds of finding the first collision and the odds of finding no or any collisions. For the purposes of the original issue we want the odds of finding the first collision. I.e., the expected wait. The series formula I described indicates the probability of finding no collisions, which also gives (by 1-p) the probability of finding any number of collisions. Presumably even odds of finding a single collision should take less work than even odds of finding all collisions. Is this correct? Fred Van Andel wrote: > "Trevor Jackson, III" <fullmoon@aspi.net> wrote: > > >Fred Van Andel wrote: > > > >> A birthday attack would require > 2^128 calculation while an > >> exhaustive search would need 2^255. There is a big difference. When > >> you are dealing with large hashes even a birthday attack becomes > >> impossible to do. Its the birthday attach that dictates the size of > >> hash required, not the exhaustive search. > > > >I agree with your last statement above. But I an confused by your first statement > >above. Is there a closed-form equation for the figure 2^128 you quoted? I am only > >familiar with the probability series summation. > > You quoted the formula yourself in a earlier message > > > odds = 1; > > for i=0 to N-1 > > odds = odds * (M-i)/M > > For any gives value of M the location of the 50% mark will be roughly > the square root of M. The square root of 2^256 is 2^128. > > Unless I am misunderstanding you again. > > Some closed forms of the equations were given in the earlier posts of > this thread. > > This method was posted by David Broughton > > > w = n^g + 0.29 - e > > > >where: > >w = the number of documents to get 50% probability of a collision > >n = number of different hash codes, all equiprobable > >g = 0.5 + 1/(6.13 * ln(n)) > >ln() is the natural logarithm function > >e is a small error that can be ignored in practice, usually < +- 1. > > > >This is an empirical formula that I worked out many years ago and > >filed away in a notebook. > > > >The exact formula is the value of w in this equation: > > > > ( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5 > > > >but this is not a practical calculation for large n. > > > >As you can see, w is a bit larger than the square root of n. For > >n = 10^6, for example, w = 1177.68 (e = -0.197). > > > >If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n = > >10^6 which is getting there. > > Or this one by Peter Pearson > > >To derive your own approximation formula from the above, exact > >formula, rewrite it as follows: > > > > n (n-1) (n-2) ... (n+1-w) n! > >product = -------------------------- = ---------- > > n^w (n-w)! n^w > > > >Then, use Stirling's approximation to make the factorials more > >manageable. Stirling's approximation (see, for example, Knuth, > >Art of Computer Programming, Volume 1) is: > > > > ln( n! ) = (n+1/2) ln(n) - n + ln( 2*pi )/2 + 1/(12*n) - ... > > > >You'll have to experiment with the number of terms required to > >get meaningful results. Overimplifying to n*ln(n)-n gives conspicuously > >nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2*pi )/2 > >is a good level of approximation, and one needs the approximation > >ln(1+x) = x (for small x) as well. > > Or this one by Terry Ritter > > > s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2 > > > >where s is the expected number of samples needed, N the size of the > >population being sampled, and p the given probability. > > > >For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of > >the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 * > >SQRT(N) = 1200. > > > >For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 = > >1228.8 for the stated approximation. > > > >The formula basically comes out of my article on population > >estimation: > > > > http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM > > > > Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: Wed, 13 Jan 1999 06:50:51 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <369d3d9c.8943462@news.intergate.bc.ca> References: <369B4F77.E076C774@aspi.net> Newsgroups: sci.crypt Lines: 27 "Trevor Jackson, III" <fullmoon@aspi.net> wrote: >Thanks for the background exposition. I think I am slightly confused over the odds of >finding the first collision and the odds of finding no or any collisions. > >For the purposes of the original issue we want the odds of finding the first collision. >I.e., the expected wait. The series formula I described indicates the probability of >finding no collisions, which also gives (by 1-p) the probability of finding any number >of collisions. > >Presumably even odds of finding a single collision should take less work than even odds >of finding all collisions. Is this correct? Using the equasion below the AVERAGE number of hashes that need to be tested is calculated by: i = 1' odds = 1; M = Whatever; while( odds > 0.5) { odds = odds * (M-i)/M; i++; } For a value of 1,000,000, i is 1178. Fred Van Andel
Subject: Re: Birthday Attack calculations. Date: Wed, 13 Jan 1999 07:52:27 GMT From: ritter@io.com (Terry Ritter) Message-ID: <369c50a1.3513345@news.io.com> References: <369d3d9c.8943462@news.intergate.bc.ca> Newsgroups: sci.crypt Lines: 35 On Wed, 13 Jan 1999 06:50:51 GMT, in <369d3d9c.8943462@news.intergate.bc.ca>, in sci.crypt fred_vanandel@bigfoot.com (Fred Van Andel) wrote: >[...] >Using the equasion below the AVERAGE number of hashes that need to be >tested is calculated by: > i = 1' > odds = 1; > M = Whatever; > while( odds > 0.5) > { > odds = odds * (M-i)/M; > i++; > } > >For a value of 1,000,000, i is 1178. Which seems to compare rather well to the value 1177.9 computed from my formula in the earlier posting: | s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2 | |where s is the expected number of samples needed, N the size of |the population being sampled, and p the given probability. | |For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead |of the usual handwave SQRT(N) = 1000 [...] --- Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM
Subject: Re: Birthday Attack calculations. Date: Thu, 14 Jan 1999 04:37:24 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: <369d73bf.1141606@news.intergate.bc.ca> References: <369c50a1.3513345@news.io.com> Newsgroups: sci.crypt Lines: 9 ritter@io.com (Terry Ritter) wrote: >>For a value of 1,000,000, i is 1178. > >Which seems to compare rather well to the value 1177.9 computed from >my formula in the earlier posting: Why do you think I chose the number 1,000,000. Its a good idea to check your answers before making a fool of yourself in front of the multitude.

Terry Ritter, his current address, and his top page.

Last updated: 1999-02-20