Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function, and thus is a measure of one kind of cipher strength. By measuring the nonlinearity of Feistel constructions with various numbers of rounds and tables, we hope to gain insight into how Feistel mixing compares to other alternatives.
From: ritter@io.com (Terry Ritter) Newsgroups: sci.crypt Subject: Measured Nonlinearity in Feistel Constructions (LONG!) Date: Thu, 26 Feb 1998 06:39:28 GMT Lines: 846 Message-ID: <34f50e19.599891@news.io.com> MEASURED NONLINEARITY IN FEISTEL CONSTRUCTIONS Terry Ritter Ritter Software Engineering http://www.io.com/~ritter/ 1998-02-25 Abstract Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function, and thus is a measure of one kind of cipher strength. By measuring the nonlinearity of Feistel constructions with various numbers of rounds and tables, we hope to gain insight into how Feistel mixing compares to other alternatives. Introduction The ideal block cipher is a keyed simple substitution table of sufficient size. Unfortunately, with 128-bit blocks, there would be 2**128 entries in that table, which is completely out of the question. So the modern block cipher is a *construction* intended to *simulate* a keyed substitution of the desired size. At issue is the effectiveness of the construction technology. One way to investigate this is by using Boolean function theory, since a substitution table -- or cipher -- can be considered a set of independent Boolean functions, one for each output bit. A Boolean function produces a single-bit result for each possible combination of values from perhaps many Boolean variables. The *nonlinearity* of a Boolean function is the Hamming distance to the closest affine function [e.g., PIE88, PIE89, PIE89B]. (Also see: http://www.io.com/~ritter/ARTS/MEASNONL.HTM ). That is, nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. If "linearity" is considered a significant cryptographic weakness, nonlinearity is an explicit measure of the *lack* of that weakness. So nonlinearity measures one form of cipher "strength." For cryptographic purposes, it is desired to take the nonlinearity of a substitution table to be the *minimum* of the nonlinearity values for each output bit in that table. Nonlinearity is measured by forming the (one-bit-wide) truth table for a particular output bit, then performing a Fast Walsh-Hadamard Transform (FWT) on that array. Each result value is essentially a correlation count to a particular affine function, and the minimum distance (the maximum correlation) is found by scanning the transform results. In measuring nonlinearity, it is generally necessary to record the function result for each possible combination of input variables. If we have an "8-bit" table, we must record and then transform 256 elements, and if we have a "16-bit" table, we must record and transform 64K elements. Thus, measuring large functions rapidly becomes impossible. So, although we cannot hope to measure the nonlinearity of a real 64-bit or 128-bit block cipher, we *can* measure nonlinearity in substitution tables and small block constructions. Here we deal with 5-bit tables and the resulting 10-bit blocks. Ideal Nonlinearity Distributions By properly shuffling tables using a large-state cryptographic RNG (in particular: http://www.io.com/~ritter/KEYSHUF.HTM ) we can sample or construct a presumably uniform distribution of different table arrangements. We can measure the nonlinearity of each table, and accumulate the results. For 5-bit tables, we get a nonlinearity distribution something like that shown in Figure 1. Nonlinearity Distribution in 5-Bit Tables Fig. 1 0.7 | * 0.6 | * 0.5 | * 0.4 | * 0.3 | * 0.2 | * * 0.1 | * * * 0.0 | * * * Prob +--+--+--+--+--+--+-- 0 2 4 6 8 10 Nonlinearity Our block cipher construction problem essentially involves using small random tables (here 5-bit tables with 32 entries and a mean nonlinearity of 7.8) to somehow emulate a table which is twice as wide (here a 10-bit table with 1024 entries and a mean nonlinearity of 447.7). For 10-bit tables, we get a nonlinearity distribution something like that in Figure 2. Nonlinearity Distribution in 10-Bit Tables Fig. 2 0.2 | 0.175 | * * 0.15 | * * * 0.125 | * * * * 0.1 | * * * * * 0.075 | * * * * * * 0.05 | * * * * * * * * 0.025 | * * * * * * * * * * 0.00 | * * * * * * * * * * * Prob +--+--+--+--+--+--+--+--+--+--+--+--+-- 436 440 444 448 452 456 Nonlinearity Feistel Mixing The classic Feistel construction of Figure 3 is a way of mixing two small blocks into a large block. Here, + is the exclusive-OR function. As used in the U.S. Data Encryption Standard (DES), L and R are each 32 bits; here we have only 5 bits in each. Feistel Construction Fig. 3 L R | | |--> F --> + round 1 | | + <-- F <--| round 2 | | v v L' R' In DES, ideal properties for F are much-discussed topics. The Feistel construction itself supports the ability to encipher and decipher using a random function F, not necessarily a permutation. But the 8 different "S-boxes" used in DES *are* permutations, if we see each as a set of 4 permutations per box, so using random permutations in Feistel mixing is not at all unreasonable. And by using permutation tables as components we have a single compatible measure both for the component tables and the simulated larger table, and thus can draw conclusions about the construction. Invertible tables also support a direct comparison to earlier work on nonlinearity with respect to Mixing and VSBC constructions. Unlike the earlier work, the decision to use random invertible substitutions for function F still leaves us with a couple of remaining parameters: The number of rounds, and the number of tables. Here we investigate the mixing obtained from the Feistel construction with two variations: The use of a single table for various numbers of rounds, and the use of a new random table on every other round, for various numbers of rounds. Feistel Mixing and DES Unfortunately, the Feistel construction is *not* the only form of mixing found in DES. Indeed, the practicality of the Feistel construction depends upon having (that is, *constructing*) a wide function F. And while F *might* be realized as a recursive sequence of smaller Feistel constructions, any such structure would be impractically slow. So the very existence of practical Feistel ciphers depends upon the construction of a random function F which itself mixes and diffuses information across its width. In each DES "S-box," the outside input bits (b0 and b5) -- which are set by expansion "E" -- just select which of the 4 boxes to use. This is a data-dependent dynamic table selection which is, in itself, a form of mixing. Of course, the simple combination of expansion "E" and the array of "S-boxes" by themselves probably would not produce good mixing: A single bit-change at one end of the input block could propagate no faster than one table per round, *if* all data were kept aligned. But this is not the case, since "Permutation P" transposes the data bits. So, other than the dynamic table selection (the two extra inputs for each S-box), function F is itself a classic substitution-permutation design. This means that the overall mixing in DES is a complex combination of the Feistel construction *plus* the expansion, table-selection, table lookup and permutation in each round. This complexity makes it difficult to independently extract the importance and consequences of each feature, as we surely must if we are to deeply understand the design. This confluence of multiple features also makes it impossible to scale DES to a tiny testable size. Thus, we seem to be limited to testing a tiny Feistel construction -- which *is* scalable -- instead of a tiny DES. Feistel Mixing with Random Tables A common assumption about Feistel mixing is that some fixed small set of optimized tables will be used repeatedly. This concept presumably comes from DES. But in the special context of DES: 1. DES S-boxes each contain 4 separate *permutations*, and are *not* random constructions; and 2. The DES permutations are only 4 bits wide, and 4-bit-wide permutations are just about the weakest possible tables that have any nonlinear strength at all. Accordingly, it should come as no surprise that replacing DES S-boxes with random tables is not a good idea. And random 4-bit permutations are often weak, so this is *also* not a good idea. In marked contrast, random 8-permutations are almost *never* that weak (the chance of finding even one linear output is 1 in 2**239). It thus appears that the assumption that random tables are necessarily bad is based on the special context of DES, and is not valid in general. Comparisons to Mixing and VSBC Constructions Earlier nonlinearity measurement experiments on tiny Mixing ( http://www.io.com/~ritter/ARTS/MIXNONLI.HTM ) and VSBC ( http://www.io.com/~ritter/ARTS/VSBCNONL.HTM ) constructions also used random tables, thus potentially making it possible to see how Feistel mixing compares. Unfortunately, the comparison is not exact, because both the Mixing and VSBC constructions do in some sense mix multiple tables in a single mixing level. With Feistel mixing, we would normally think of multiple "rounds" of mixing, rather than having several tables in each round. DES does of course manage this, but does not give us a scalable rule which we could use to build a tiny version with multiple tables. And even in the best possible situation we would be looking at the experimental measurement of a 16-bit simulated table, for which it would be very difficult to collect an accurate ideal distribution. So, given the limitations of reality, the first thing we might want to look at is the nonlinearity measure from Feistel mixing with a single table, under a varying the number of rounds. Then we might look at using multiple tables, again for a varying number of rounds. The Experiments We construct 10,000 random 5-bit tables, which are then used in 10-bit Feistel constructions. Each of these constructions is measured for nonlinearity, and the result accumulated. While Figure 2 gives a general idea of the desired distribution, we eventually want to use a chi-square comparison, for which we need an ideal reference distribution. The ideal model in Figure 4 is thus constructed from three different 15-hour trials of 1,000,000 random 10-bit tables each, with the counts reduced by a factor of 100 and rounded to an integer. The Ideal 10-Bit Nonlinearity Distribution Fig. 4 NL Trial 1 Trial 2 Trial 3 Ideal 462 0 1 0 0 460 55 75 55 1 458 2145 2269 2234 22 456 20537 20938 20882 208 454 76259 76344 75980 763 452 148847 148135 148510 1485 450 188329 187802 187585 1878 448 177775 178208 178616 1782 446 140283 140219 140407 1403 444 97415 97317 97465 974 442 61999 62282 62213 622 440 37829 37685 37382 376 438 21675 21789 21753 218 436 12417 12320 12408 124 434 6826 6818 6912 68 432 3740 3693 3584 37 430 1912 2050 1884 19 428 979 1043 1073 10 426 490 537 526 5 424 249 241 265 2 422 119 133 143 1 420 64 55 69 0 418 26 25 30 0 416 12 14 12 0 414 9 5 6 0 412 3 2 4 0 410 0 0 1 0 In each experimental trial, we first initialize the substitutions from a random key. This produces a particular 10-bit cipher -- a simulated 10-bit simple substitution. We then traverse the complete transformation -- all 1024 possible data values -- and collect all 1024 10-bit ciphertext results. We then traverse all 10 ciphertext bit-columns, one-by-one, and perform a nonlinearity computation on each. The minimum nonlinearity value found across all columns is accumulated into the growing distribution. We see the results for 2 rounds of Feistel mixing in Figure 5. Feistel Nonlinearity Distribution One Table, Two Rounds Fig. 5 NL Expect Found 64 0 2 128 0 78 192 0 1494 256 0 7387 320 0 1039 Clearly, 2 Feistel mixing rounds do not approximate the nonlinearity distribution from random substitutions, so we move on to 4 rounds as in Figure 6. Feistel Nonlinearity Distribution One Table, Four Rounds Fig. 6 NL Expect Found 224 0 1 312 0 56 320 0 1 344 0 1 352 0 35 364 0 1 368 0 8 384 0 696 388 0 3 392 0 187 396 0 2 400 0 11 404 0 15 406 0 3 408 0 13 410 0 2 412 0 38 414 0 2 416 0 1709 418 0 4 420 0 66 422 1 11 424 2 126 426 5 13 428 10 256 430 19 20 432 37 780 434 68 51 436 124 660 438 218 109 440 376 3566 442 622 70 444 974 589 446 1403 91 448 1782 639 450 1878 43 452 1485 107 454 763 12 456 208 2 458 22 1 460 1 0 462 0 0 Again, the is pretty brutal: So even four Feistel mixing rounds do not approximate random substitutions. And neither do six (Figure 7), eight (Figure 8), ten (Figure 9) or twelve (Figure 10): Feistel Nonlinearity Distribution One Table, Six Rounds Fig. 7 NL Expect Found 296 0 1 316 0 1 324 0 1 332 0 1 336 0 2 352 0 1 372 0 1 384 0 2 388 0 3 392 0 1 394 0 1 396 0 2 398 0 1 400 0 3 402 0 1 404 0 10 406 0 0 408 0 23 410 0 2 412 0 22 414 0 6 416 0 22 418 0 14 420 0 46 422 1 25 424 2 85 426 5 21 428 10 114 430 19 70 432 37 192 434 68 138 436 124 386 438 218 269 440 376 641 442 622 662 444 974 1197 446 1403 1238 448 1782 1612 450 1878 1394 452 1485 1151 454 763 492 456 208 133 458 22 13 460 1 0 462 0 0 Feistel Nonlinearity Distribution One Table, Eight Rounds Fig. 8 NL Expect Found 346 0 1 370 0 1 372 0 1 402 0 2 404 0 3 406 0 3 408 0 6 410 0 1 412 0 6 414 0 10 416 0 17 418 0 18 420 0 30 422 1 36 424 2 38 426 5 53 428 10 74 430 19 95 432 37 137 434 68 194 436 124 261 438 218 406 440 376 519 442 622 808 444 974 1080 446 1403 1365 448 1782 1569 450 1878 1527 452 1485 1063 454 763 519 456 208 136 458 22 21 460 1 0 462 0 0 Feistel Nonlinearity Distribution One Table, Ten Rounds Fig. 9 NL Expect Found 362 0 1 396 0 1 398 0 2 400 0 1 402 0 1 404 0 0 406 0 4 408 0 3 410 0 10 412 0 10 414 0 11 416 0 28 418 0 16 420 0 25 422 1 39 424 2 63 426 5 63 428 10 78 430 19 116 432 37 155 434 68 247 436 124 314 438 218 380 440 376 584 442 622 811 444 974 1034 446 1403 1353 448 1782 1556 450 1878 1428 452 1485 1046 454 763 480 456 208 126 458 22 14 460 1 0 462 0 0 Feistel Nonlinearity Distribution One Table, Twelve Rounds Fig. 10 NL Expect Found 398 0 1 400 0 3 402 0 2 404 0 2 406 0 6 408 0 4 410 0 8 412 0 12 414 0 13 416 0 12 418 0 30 420 0 23 422 1 34 424 2 64 426 5 62 428 10 96 430 19 113 432 37 179 434 68 234 436 124 327 438 218 462 440 376 597 442 622 797 444 974 1054 446 1403 1347 448 1782 1467 450 1878 1450 452 1485 1011 454 763 450 456 208 128 458 22 12 460 1 0 462 0 0 The twelve rounds with a fixed table of Figure 10 is about as good as it gets. The distribution from a single table never gets close enough to the ideal distribution to make a chi-square computation reasonable. Feistel Mixing with Random Multiple Tables On the other hand, things improve if we have a new table every couple of rounds (Figure 11). (There would seem to be little advantage in having a unique table for each round, since only half of the block would be affected by that table.) Feistel Nonlinearity Distribution Two Tables, Four Rounds Fig. 11 NL Expect Found 320 0 3 352 0 39 360 0 2 364 0 1 368 0 6 382 0 1 384 0 156 386 0 1 388 0 3 390 0 0 392 0 225 394 0 1 396 0 4 398 0 1 400 0 6 402 0 1 404 0 13 406 0 3 408 0 22 410 0 4 412 0 42 414 0 2 416 0 1951 418 0 3 420 0 81 422 1 5 424 2 143 426 5 13 428 10 276 430 19 29 432 37 853 434 68 47 436 124 788 438 218 90 440 376 3175 442 622 98 444 974 805 446 1403 130 448 1782 833 450 1878 33 452 1485 95 454 763 10 456 208 5 458 22 1 460 1 0 462 0 0 While hardly ideal, we can see that the results in Figure 11 are somewhat better than the same number of rounds with a single table (Figure 6). But things perk up rather nicely with three tables and six rounds as shown in Figure 12. Feistel Nonlinearity Distribution Three Tables, Six Rounds Fig. 12 NL Expect Found Chi-Sq DF 420 0 0 } 422 1 3 } 424 2 3 } 426 5 3 } 428 10 10 } 0.056 0 430 19 13 1.950 1 432 37 40 2.194 2 434 68 59 3.385 3 436 124 123 3.393 4 438 218 200 4.879 5 440 376 381 4.946 6 442 622 633 5.140 7 444 974 990 5.403 8 446 1403 1390 5.523 9 448 1782 1780 5.526 10 450 1878 1952 8.441 11 452 1485 1441 9.745 12 454 763 754 9.851 13 456 208 208 9.851 14 458 22 17 } 460 1 0 } 462 0 0 } 11.417 15 With 15 "degrees of freedom," the chi-square value of 11.417 in Figure 11 is very believable. The critical chi-square values for DF = 15 are given in Figure 13. Chi-Square Critical Values, for DF = 15 Fig. 13 1% 5% 25% 50% 75% 95% 99% 5.2293 7.2609 11.0365 14.3389 18.2451 24.9958 30.5779 And another couple of rounds with yet another table continues the success story in Figure 14. Feistel Nonlinearity Distribution Four Tables, Eight Rounds Fig. 14 NL Expect Found Chi-Sq DF 410 0 1 } 412 0 0 } 414 0 0 } 416 0 0 } 418 0 1 } 420 0 2 } 422 1 2 } 424 2 1 } 426 5 4 } 428 10 10 } 0.500 0 430 19 19 0.500 1 432 37 30 1.824 2 434 68 65 1.957 3 436 124 119 2.158 4 438 218 222 2.232 5 440 376 375 2.234 6 442 622 634 2.466 7 444 974 982 2.532 8 446 1403 1365 3.561 9 448 1782 1750 4.135 10 450 1878 1928 5.467 11 452 1485 1504 5.710 12 454 763 750 5.931 13 456 208 212 6.008 14 458 22 23 } 460 1 1 } 462 0 0 } 6.052 15 While the chi-square value of 6.052 may seem small, this means the distribution is close to the ideal. When comparing distributions, we are rarely worried about a chi-square value that seems "too good," for the alternative is far *less* believable: Either we have had the good luck of sampling a close distribution so it looks better than it should, or we have had the absolutely unbelievable luck of sampling a fundamentally different distribution so it looks not just good, but actually *too* good. Thus, these tests are basically "one-tail" statistical tests. Conclusions Feistel mixing seeks to produce a block cipher which is double the size of a smaller "block cipher" component; in this respect, it is similar to the Mixing constructions measured earlier. When using a single fixed table, it appears that Feistel mixing is never good enough, no matter how many rounds are applied. But even 6 rounds of Feistel mixing with 3 *different* tables apparently does approximate the nonlinearity of a larger substitution table. These experiments show an ultimately successful result using random invertible tables. Thus, the experiments give no indication that Feistel mixing requires the ideal tables that so much of the literature seems directed at producing. In contrast to the unending search for the qualities of an ideal table, it appears that we can produce an effective Feistel cipher by choosing tables "at random" (that is, we can construct tables based on the cipher key). Note that Differential and Linear Cryptanalysis style attacks generally depend upon knowledge of the contents of the tables, knowledge which is unavailable when keyed tables are used. When compared to the Mixing and VSBC constructions investigated previously, the problem with Feistel mixing is that, by itself, it only doubles the size of the nonlinear component, and we assume that repeated doubling requires exponential effort: That is, if 6 Feistel rounds are required to take random 8-bit tables to a 16-bit emulated table, 6 rounds of the emulation (36 total rounds) should be required to get a 32-bit table, and 216 total rounds should be required for a 64-bit cipher. (It would be interesting to check these assumptions on modern fast equipment.) Now, we know that DES (an emulated 64-bit table) does *not* require exponential effort. But Feistel mixing is *not* the only form of mixing in DES. We avoid addressing the other structures, because they do not scale well, and because we wish to examine each contribution as independently as possible. Here we have measured the Feistel mixing alone, which we assume requires effort exponential with the block size. In contrast, Mixing constructions require only effort proportional to n log n, and VSBC constructions require only effort linear with block size. Thus, these alternative constructions offer the possibility of far more efficient cryptographic mixing. References and Bibliography [AY82] Ayoub, F. 1982. Probabilistic completeness of substitution-permutation encryption networks. IEE Proceedings, Part E. 129(5): 195-199. [DAE94] Daemen, J., R. Govaerts and J. Vandewalle. 1994. Correlation Matrices. Fast Software Encryption. 275-285. [FOR88] Forre, R. 1988. The Strict Avalanche Criterion: Spectral Properties of Boolean Functions and an Extended Definition. Advances in Cryptology -- CRYPTO '88. 450-468. [GOR82] Gordon, J. and H. Retkin. 1982. Are Big S-Boxes Best? Cryptography. Proceedings of the Workshop on Cryptography, Burg Feuerstein, Germany, March 29-April 2, 1982. 257-262. [HEY94] Heys, H. and S. Tavares. 1994. On the security of the CAST encryption algorithm. Canadian Conference on Electrical and Computer Engineering. Halifax, Nova Scotia, Canada, Sept. 1994. 332-335. [HEY95] Heys, H. and S. Tavares. 1995. Known plaintext cryptanalysis of tree-structured block ciphers. Electronics Letters. 31(10): 784-785. [MEI89] Meier, W. and O. Staffelbach. 1989. Nonlinearity Criteria for Cryptographic Functions. Advances in Cryptology -- Eurocrypt '89. 549-562. [MIR97] Mirza, F. 1997. Linear and S-Box Pairs Cryptanalysis of the Data Encryption Standard. [OC91] O'Connor, L. 1991. Enumerating nondegenerate permutations. Advances in Cryptology -- Eurocrypt '91. 368-377. [OC93] O'Connor, L. 1993. On the Distribution Characteristics in Bijective Mappings. Advances in Cryptology -- EUROCRYPT '93. 360-370. [PIE88] Pieprzyk, J. and G. Finkelstein. 1988. Towards effective nonlinear cryptosystem design. IEE Proceedings, Part E. 135(6): 325-335. [PIE89] Pieprzyk, J. and G. Finkelstein. 1989. Permutations that Maximize Non-Linearity and Their Cryptographic Significance. Computer Security in the Age of Information. 63-74. [PIE89B] Pieprzyk, J. 1989. Non-linearity of Exponent Permutations. Advances in Cryptology -- EUROCRYPT '89. 80-92. [PIE93] Pieprzyk, J., C. Charnes and J. Seberry. 1993. Linear Approximation Versus Nonlinearity. Proceedings of the Workshop on Selected Areas in Cryptography (SAC '94). 82-89. [PRE90] Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts and J. Vandewalle. 1990. Propagation Characteristics of Boolean Functions. Advances in Cryptology -- Eurocrypt '90. 161-173. [RUE86] Rueppel, R. 1986. Analysis and Design of Stream Ciphers. Springer-Verlag. [XIO88] Xiao, G-Z. and J. Massey. 1988. A Spectral Characterization of Correlation-Immune Combining Functions. IEEE Transactions on Information Theory. 34(3): 569-571. [YOU95] Youssef, A. and S. Tavares. 1995. Resistance of Balanced S-boxes to Linear and Differential Cryptanalysis. Information Processing Letters. 56: 249-252. [YOU95B] Youssef, A. and S. Tavares. 1995. Number of Nonlinear Regular S-boxes. Electronics Letters. 31(19): 1643-1644. [ZHA95] Zhang, X. and Y. Zheng. 1995. GAC -- the Criterion for Global Avalanche Characteristics of Cryptographic Functions. Journal for Universal Computer Science. 1(5): 316-333. --- Terry Ritter ritter@io.com http://www.io.com/~ritter/ The Crypto Glossary: http://www.io.com/~ritter/GLOSSARY.HTM
Terry Ritter, his current address, and his top page.
Last updated: 1998-02-26