Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. A basic background in affine Boolean functions and their relationship to Walsh-Hadamard functions is presented. It is shown how distances to linear functions can be computed by hand, and that these values are exactly the same as those produced by a Fast Walsh Transform (FWT), which also can be computed by hand. A Pascal routine for the FWT is presented as a basis for practical nonlinearity measurement.
From: ritter@io.com (Terry Ritter) Newsgroups: sci.crypt Subject: Measuring Nonlinearity By Walsh Transform (Long!) Date: Tue, 13 Jan 1998 05:10:08 GMT Lines: 522 Message-ID: <34baf5ee.192497@news.io.com> MEASURING NONLINEARITY BY WALSH TRANSFORM Terry Ritter ritter@io.com Ritter Software Engineering http://www.io.com/~ritter/ 1998-01-12 Abstract Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. A basic background in affine Boolean functions and their relationship to Walsh-Hadamard functions is presented. It is shown how distances to linear functions can be computed by hand, and that these values are exactly the same as those produced by a Fast Walsh Transform (FWT), which also can be computed by hand. A Pascal routine for the FWT is presented as a basis for practical nonlinearity measurement. Affine Boolean Functions A Boolean function produces a single-bit result for each possible combination of values from perhaps many Boolean variables. The Boolean field consists of the values {0,1}, with XOR as "addition" and AND as "multiplication." We call a function "constant" when it has only one fixed value, which we often denote "k": (1) f = k I call a function "linear" when it fits the plane geometry equation of a line (here the constant is "b"): (2) y = mx + b Note that in the Boolean field, a constant value of '1' acts to reverse the sense (invert) the output value, while a constant of '0' does not affect the result at all. I call a function "affine" when it fits the form: (3) f = a * x + a * x + ... + a * x + a n-1 n-1 n-2 n-2 1 1 0 (Others reserve "affine" for Equation 3 *without* a constant, and use the term "linear" for Equation 3 *with* a constant.) In Boolean functions, each of the coefficients a[i] acts only to *select* each different variable x[i], and if we consider all a[i] a number, we have a unique ordering for affine Boolean functions as shown in Figure 1. Affine Functions Fig. 1 f0 = 0 f1 = 1 f2 = x[1] f3 = x[1] + 1 f4 = x[2] f5 = x[2] + 1 f6 = x[2] + x[1] f7 = x[2] + x[1] + 1 . . . Thus, we can write 16 different forms for 3 variables. But it is convenient to pair the functions which are the same except for the value of the constant, and then we have exactly 8 affine Boolean functions of 3 variables (with and without constant). Of course, any possible Boolean function of 3 variables *also* can be expressed in 8 terms, one for every possible combination of variable value. This is the "truth table" for a function, or the function expressed as the sequence of its output values. We can now express every possible 3-variable affine Boolean function, in order, as shown in Figure 2. The 3-Variable Affine Boolean Functions Fig. 2 affine truth table 1 1 1 1 1 1 1 1 1 x0 0 1 0 1 0 1 0 1 x1 0 0 1 1 0 0 1 1 x1+x0 0 1 1 0 0 1 1 0 x2 0 0 0 0 1 1 1 1 x2+ x0 0 1 0 1 1 0 1 0 x2+x1 0 0 1 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1 Correlation One way to measure a sort of "correlation" between two Boolean functions is to compare their truth tables and count the number of bits which differ; this is their Hamming distance. Then, since we expect about half the bit positions to differ (on average), we can subtract the expected difference: this gives what I am calling -- for lack of a better term -- "unexpected distance" (UD). The magnitude of the UD relates to how unexpected the distance is, while the sign indicates the direction. Consider two functions and their difference as shown in Figure 3: Distance to an Affine Function Fig. 3 f 1 0 0 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1 diff 1 1 1 1 0 1 0 1 We have a Hamming distance of 6 between these two functions. This is an unexpected distance of 6 - 4 = +2, which means that 2 more bits differ than we would expect. With some work, we can now compare a Boolean function to each possible affine Boolean function, and develop a distance to each. The less distance to an affine function, the more "linear" the measured function must be (see Figure 4). Distance to Each 3-Variable Affine Boolean Function Fig. 4 affine truth table distance 1 1 1 1 1 1 1 1 1 4 x0 0 1 0 1 0 1 0 1 4 x1 0 0 1 1 0 0 1 1 6 x1+x0 0 1 1 0 0 1 1 0 6 x2 0 0 0 0 1 1 1 1 4 x2+ x0 0 1 0 1 1 0 1 0 4 x2+x1 0 0 1 1 1 1 0 0 2 x2+x1+x0 0 1 1 0 1 0 0 1 6 f 1 0 0 1 1 1 0 0 So the minimum distance from f to an affine function is 2, making the closest direct function x2+x1. *But*, if we complement those functions which have a distance of 6 (that is, add a constant '1' term in the affine equation) we get the values shown in Figure 5. Distance to Complement Affine Functions Fig. 5 complement table distance x1 1 1 0 0 1 1 0 0 2 x1+x0 1 0 0 1 1 0 0 1 2 x2+x1+x0 1 0 0 1 0 1 1 0 2 f 1 0 0 1 1 1 0 0 This *could* mean that we need to consider the distance from f to the closest affine *complement* as well as the non-complement. But this is unnecessary if we use the absolute value of the unexpected distance, as shown in Figure 6. Unexpected Distance to Affine Boolean Function Fig. 6 affine truth table distance ud 1 1 1 1 1 1 1 1 1 4 0 x0 0 1 0 1 0 1 0 1 4 0 x1 0 0 1 1 0 0 1 1 6 +2 x1+x0 0 1 1 0 0 1 1 0 6 +2 x2 0 0 0 0 1 1 1 1 4 0 x2+ x0 0 1 0 1 1 0 1 0 4 0 x2+x1 0 0 1 1 1 1 0 0 2 -2 x2+x1+x0 0 1 1 0 1 0 0 1 6 +2 f 1 0 0 1 1 1 0 0 When we have a balanced function f, the distances to affine functions are always *even*. This is because the affine functions themselves are balanced, and if we change one bit in a balanced function, we must change at least one other bit to keep it balanced. If the first change increased the distance by 1, the next bit change can either increase it again by 1, or return to the original value. Balanced functions always will be at even distances from each other. This issue will arise again in the nonlinearity of the balanced Boolean functions which occur in invertible substitutions. The Walsh-Hadamard Transform A Hadamard matrix H is an n x n matrix with all entries +1 or -1, such that all rows are orthogonal and all columns are orthogonal (see, for example, [HED78]). The usual development (see, for example [SCH87]) starts with a defined 2 x 2 Hadamard matrix H2 which is ((1,1),(1,-1)). Each step consists of multiplying each element in H2 by the previous matrix, thus negating all elements in the bottom-right entry, as shown in Figure 7. Hadamard Matrix Development Fig. 7 H2 = | 1 1 | H4 = H2 * H2 = | H2 H2 | | 1 -1 | | H2 -H2 | H4 = | | 1 1 | | 1 1 | | = | 1 1 1 1 | | | 1 -1 | | 1 -1 | | | 1 -1 1 -1 | | | | 1 1 -1 -1 | | | 1 1 | |-1 -1 | | | 1 -1 -1 1 | | | 1 -1 | |-1 1 | | H8 = | H4 H4 | = | 1 1 1 1 1 1 1 1 | | H4 -H4 | | 1 -1 1 -1 1 -1 1 -1 | | 1 1 -1 -1 1 1 -1 -1 | | 1 -1 -1 1 1 -1 -1 1 | | 1 1 1 1 -1 -1 -1 -1 | | 1 -1 1 -1 -1 1 -1 1 | | 1 1 -1 -1 -1 -1 1 1 | | 1 -1 -1 1 -1 1 1 -1 | Now compare H8 from this strange Hadamard development to the affine functions from Figure 2, shown again as Figure 8: The 3-Variable Affine Boolean Functions Fig. 8 c 0 0 0 0 0 0 0 0 x0 0 1 0 1 0 1 0 1 x1 0 0 1 1 0 0 1 1 x1+x0 0 1 1 0 0 1 1 0 x2 0 0 0 0 1 1 1 1 x2+ x0 0 1 0 1 1 0 1 0 x2+x1 0 0 1 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1 So if we map the values in the affine truth table: {0,1} -> {1,-1}, we find *the* *same* *functions* as in the Hadamard development. These are the Walsh functions, and here both developments produce the same order, which is called "natural" or "Hadamard." (There are two other canonical orderings for Walsh functions, which we need not get into here.) Walsh functions have fast transforms which reduce the cost of correlation computations from n*n to n log n, a very substantial reduction. A Fast Walsh Transform It is easy to do a Fast Walsh Transform by hand. (Well, I say "easy," then always struggle when I actually do it.) Let's do the FWT of function f: (1 0 0 1 1 1 0 0): First note that f has a binary power length, as required. Next, each pair of elements is modified by an "in-place butterfly"; that is, the values in each pair produce two results which replace the original pair, wherever they were originally located. The left result will be the two values added; the right will be the first less the second. That is, (4) (a',b') = (a+b, a-b) So for the values (1,0), we get (1+0, 1-0) which is just (1,1). We start out pairing adjacent elements, then every other element, then every 4th element, and so on until the correct pairing is impossible, as shown in Figure 8. An 8-Element Fast Walsh Transform (FWT) Fig. 8 original 1 0 0 1 1 1 0 0 ^---^ ^---^ ^---^ ^---^ first 1 1 1 -1 2 0 0 0 ^-------^ ^-------^ ^-------^ ^-------^ second 2 0 0 2 2 0 2 0 ^---------------^ ^---------------^ ^---------------^ ^---------------^ final 4 0 2 2 0 0 -2 2 Now compare these results to the UD values we found in Fig. 4: Unexpected Distance to the Affine Functions Fig. 9 affine ud 1 0 x0 0 x1 +2 x1+x0 +2 x2 0 x2+ x0 0 x2+x1 -2 x2+x1+x0 +2 Note that all FWT elements -- after the zeroth -- map the U.D. results *exactly* in both magnitude and sign, *and* in the exact same order. (This ordering means that the binary index of any result is also the recipe for expressing the affine function being compared in that position, as shown in Figure 2.) The zeroth element in the FWT (here 4) is a special case and is the number of 1-bits in the function when we use the real values {0,1} to represent the function. When measuring balanced functions of a particular size, the zeroth entry should be obvious. Thus, to find the distance from any balanced function to the closest affine function, compute the FWT of the given function expressed as the real values {0,1}, then find the *maximum* absolute value of all unexpected distance elements past the zeroth element. A Fast Walsh Transform Routine The Fast Walsh Transform by hand is automated in the Borland Pascal listing of Figure 10. A Fast Walsh Transform (FWT) in Pascal Fig. 10 TYPE Lwd = LongInt; LintArray = ARRAY[ 0..16380 ] of LONGINT; PROCEDURE LintHadFmSeqWalsh( VAR DatLintAr; lastel: Lwd ); { Hadamard Walsh from sequential data, in-place } VAR Dat: LintArray ABSOLUTE DatLintAr; a, b: LONGINT; stradwid, { distance between pair of elements } blockstart, block, pair, el1, el2: Lwd; BEGIN stradwid := 1; blockstart := lastel; REPEAT el1 := 0; blockstart := blockstart DIV 2; FOR block := blockstart DOWNTO 0 DO BEGIN el2 := el1 + stradwid; FOR pair := 0 TO PRED(stradwid) DO BEGIN a := Dat[ el1 ]; b := Dat[ el2 ]; Dat[ el1 ] := a + b; Dat[ el2 ] := a - b; Inc( el1 ); Inc( el2 ); END; el1 := el2; END; stradwid := (stradwid + stradwid) AND lastel; UNTIL (stradwid = 0); END; {LintHadFmSeqWalsh} LintHadFmSeqWalsh takes an array of 32-bit integers, and changes the array data into the Walsh-Hadamard transform of that data. For nonlinearity measures, the input data are {0,1} or {1,-1}; the results are potentially bipolar in either case. (The "lastel" parameter is the last index in the data array which starts at index 0; it is thus always 2**n - 1 for some n. The ABSOLUTE attribute forces Borland Pascal to treat the parameter as a LongInt array of arbitrary size.) Nonlinearity Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. With the FWT we compute the "unexpected distance," or distance away from the expected difference. By finding the largest absolute value, we can find the closest "linear" function. This is, in a sense, a *linearity* measure. But we want the *non* linearity, and this is half the number of bits in the function, less the absolute value of the unexpected distance. So we can scan the FWT results either for the maximum absolute value first, or calculate the nonlinearity for each, and take the minimum. Nonlinearity is always positive, and also even if we have a balanced function. It is common to consider a Boolean function as consisting of the real values {0,1}, but it is *also* common to use the transformation x (5) x' = -1 where x is {0,1}. This transforms {0,1} -> {1,-1}, although we can do the exact same thing with: (6) x' = 1 - 2x This transformation has some implications: Using real values {1,-1} doubles the magnitude and changes the sign of the FWT results, but can simplify nonlinearity for unbalanced functions, because the zeroth term need not be treated specially. But if the Boolean function is balanced, as it will be in the typical invertible substitution table, the zeroth element need not be used at all, so using real values {1,-1} seems to provide no particular benefit in this application. Summary The basic concepts of affine Boolean functions have been presented, and a correspondence to Walsh-Hadamard functions illustrated. A Fast Walsh Transform was introduced and explained. Hopefully this will help those seeking to measure nonlinearity, and so help make this measurement better understood and more widely used. References and Bibliography [AY82] Ayoub, F. 1982. Probabilistic completeness of substitution-permutation encryption networks. IEE Proceedings, Part E. 129(5): 195-199. [DAE94] Daemen, J., R. Govaerts and J. Vandewalle. 1994. Correlation Matrices. Fast Software Encryption. 275-285. [FOR88] Forre, R. 1988. The Strict Avalanche Criterion: Spectral Properties of Boolean Functions and an Extended Definition. Advances in Cryptology -- CRYPTO '88. 450-468. [GOR82] Gordon, J. and H. Retkin. 1982. Are Big S-Boxes Best? Cryptography. Proceedings of the Workshop on Cryptography, Burg Feuerstein, Germany, March 29-April 2, 1982. 257-262. [HED78] Hedayat, A. and W. Wallis. 1978. Hadamard Matrices and their Applications. The Annals of Statistics. 6(6): 1184-1238. [HEY94] Heys, H. and S. Tavares. 1994. On the security of the CAST encryption algorithm. Canadian Conference on Electrical and Computer Engineering. Halifax, Nova Scotia, Canada, Sept. 1994. 332-335. [HEY95] Heys, H. and S. Tavares. 1995. Known plaintext cryptanalysis of tree-structured block ciphers. Electronics Letters. 31(10): 784-785. [MEI89] Meier, W. and O. Staffelbach. 1989. Nonlinearity Criteria for Cryptographic Functions. Advances in Cryptology -- Eurocrypt '89. 549-562. [MIR97] Mirza, F. 1997. Linear and S-Box Pairs Cryptanalysis of the Data Encryption Standard. [OC91] O'Connor, L. 1991. Enumerating nondegenerate permutations. Advances in Cryptology -- Eurocrypt '91. 368-377. [OC93] O'Connor, L. 1993. On the Distribution Characteristics in Bijective Mappings. Advances in Cryptology -- EUROCRYPT '93. 360-370. [PIE88] Pieprzyk, J. and G. Finkelstein. 1988. Towards effective nonlinear cryptosystem design. IEE Proceedings, Part E. 135(6): 325-335. [PIE89] Pieprzyk, J. and G. Finkelstein. 1989. Permutations that Maximize Non-Linearity and Their Cryptographic Significance. Computer Security in the Age of Information. 63-74. [PIE89B] Pieprzyk, J. 1989. Non-linearity of Exponent Permutations. Advances in Cryptology -- EUROCRYPT '89. 80-92. [PIE93] Pieprzyk, J., C. Charnes and J. Seberry. 1993. Linear Approximation Versus Nonlinearity. Proceedings of the Workshop on Selected Areas in Cryptography (SAC '94). 82-89. [PRE90] Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts and J. Vandewalle. 1990. Propagation Characteristics of Boolean Functions. Advances in Cryptology -- Eurocrypt '90. 161-173. [RUE86] Rueppel, R. 1986. Analysis and Design of Stream Ciphers. Springer-Verlag. [SCH86] Schroeder, M. 1986. Number Theory in Science and Communications. Springer-Verlag. [SCH87] Schroeder, M. and N. Sloane. 1987. New Permutation Codes Using Hadamard Unscrambling. IEEE Transactions on Information Theory. IT-33(1): 144-145. [XIO88] Xiao, G-Z. and J. Massey. 1988. A Spectral Characterization of Correlation-Immune Combining Functions. IEEE Transactions on Information Theory. 34(3): 569-571. [YOU95] Youssef, A. and S. Tavares. 1995. Resistance of Balanced S-boxes to Linear and Differential Cryptanalysis. Information Processing Letters. 56: 249-252. [YOU95B] Youssef, A. and S. Tavares. 1995. Number of Nonlinear Regular S-boxes. Electronics Letters. 31(19): 1643-1644. [ZHA95] Zhang, X. and Y. Zheng. 1995. GAC -- the Criterion for Global Avalanche Characteristics of Cryptographic Functions. Journal for Universal Computer Science. 1(5): 316-333. Many Walsh-Hadamard references are available in my Walsh-Hadamard literature review: http://www.io.com/~ritter/RES/WALHAD.HTM --- Terry Ritter ritter@io.com http://www.io.com/~ritter/ The Crypto Glossary: http://www.io.com/~ritter/GLOSSARY.HTM
Last updated: 1998-01-15