A detailed discussion of cryptographic nonlinearity, what it means and how it is computed, with active JavaScript panels to perform the computation.
Nonlinearity is the number of bits bits which must change in the truth table of a Boolean function to reach the closest affine function. If we believe that cryptosystems based on linear or affine functions are inherently weak, the ability to measure nonlinearity is the ability to measure one form of strength.
Nonlinearity measurement is particularly useful to quantify the strength of invertible substitution tables. This is important when pre-defined tables are a part of a cipher definition. But nonlinearity measurement can be even more important in the context of scalable ciphers: When ciphers can be down to experimental size, it becomes possible to talk about the overall nonlinearity (for each key) of the cipher itself. This is far more information than we usually have on cipher designs.
A Boolean function produces a single-bit result for each possible combination of values from perhaps many Boolean variables. The Boolean field consists of the values {0,1}, with XOR as "addition" and AND as "multiplication."
An
affine Boolean function
has the form:
f = anxn + an-1xn-1 + ... + a1x1 + a0
In the Boolean field, a constant or a0 value of '1' inverts or reverses the result, while a constant of '0' has no effect. The coefficients ai simply enable or disable the associated variable xi. And if we consider the collected coefficients to be a counting binary value, we have a unique ordering for affine Boolean functions:
Affine Boolean Functions f0 = 0*x[2] + 0*x[1] + 0 = 0 f1 = 0*x[2] + 0*x[1] + 1 = 1 f2 = 0*x[2] + 1*x[1] + 0 = x[1] f3 = 0*x[2] + 1*x[1] + 1 = x[1] + 1 f4 = 1*x[2] + 0*x[1] + 0 = x[2] f5 = 1*x[2] + 0*x[1] + 1 = x[2] + 1 f6 = 1*x[2] + 1*x[1] + 0 = x[2] + x[1] f7 = 1*x[2] + 1*x[1] + 1 = x[2] + x[1] + 1 . . .
In this way, we can write 16 different forms for 3 variables. But it is convenient to pair the functions which are the same except for the value of the constant, and then we have exactly 8 affine Boolean functions of 3 variables. Each of these has a particular value for every possible combination of variable value, which we can show in a truth table:
The 3-Variable Affine Boolean Functions affine truth table 1 1 1 1 1 1 1 1 1 x0 0 1 0 1 0 1 0 1 x1 0 0 1 1 0 0 1 1 x1+x0 0 1 1 0 0 1 1 0 x2 0 0 0 0 1 1 1 1 x2+ x0 0 1 0 1 1 0 1 0 x2+x1 0 0 1 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1
One way to measure a sort of "correlation" between two Boolean functions is to compare their truth tables and count the number of bits which differ; this is their Hamming distance.
Since we expect about half the bit positions to differ (on average), we can subtract that expected distance and come up with what I am calling -- for lack of a better term -- the "unexpected distance" (UD). The magnitude of the UD relates to just how unexpected the distance is, while the sign indicates the direction. Consider two functions and their difference:
Distance to an Affine Function f 1 0 0 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1 diff 1 1 1 1 0 1 0 1
Here we have a Hamming distance of 6 between the two functions. This is an unexpected distance or UD of 6 - 4 = +2, which means that 2 more bits differ than we would expect.
Another way to compute Boolean correlation is to accumulate the bits of one function (as integers) by addition or subtraction as selected by the other function. For example:
Distance to an Affine Function f 1 0 0 1 1 1 0 0 x2+x1+x0 + - - + - + + - (operation select) accum +1 -0 -0 +1 -1 +1 +0 -0 = +2
This technique yields the UD value directly.
With some work, we can now compare a Boolean function to each possible affine Boolean function, and develop both a distance and an unexpected distance to each:
Unexpected Distance to Affine Boolean Function affine truth table distance ud c 0 0 0 0 0 0 0 0 4 0 x0 0 1 0 1 0 1 0 1 4 0 x1 0 0 1 1 0 0 1 1 6 +2 x1+x0 0 1 1 0 0 1 1 0 6 +2 x2 0 0 0 0 1 1 1 1 4 0 x2+ x0 0 1 0 1 1 0 1 0 4 0 x2+x1 0 0 1 1 1 1 0 0 2 -2 x2+x1+x0 0 1 1 0 1 0 0 1 6 +2 f 1 0 0 1 1 1 0 0
Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. But every affine Boolean function also has a complement affine function which has every truth table bit value reversed. This means that no function possibly can be more than half its length in bits away from both an affine Boolean function and its complement. So a zero UD value is not only what we expect, it is in fact the best we can possibly do.
A non-zero UD value is that much closer to some affine function, and that much less nonlinear. So the nonlinearity value is half the length of the function, less the maximum absolute value of the unexpected distance to each affine function.
The function f in the previous section has a length of 8 bits,
and an absolute value maximum unexpected distance of 2. This is a
nonlinearity of
A Hadamard matrix H is an n x n matrix with all entries +1 or -1, such that all rows are orthogonal and all columns are orthogonal (see, for example, [HED78]).
The usual development (see, for example [SCH87]) starts with a defined 2 x 2 Hadamard matrix H2 which is ((1,1),(1,-1)). Each step consists of multiplying each element in H2 by the previous matrix, thus negating all elements in the bottom-right entry:
Hadamard Matrix Development H2 = | 1 1 | H4 = H2 * H2 = | H2 H2 | | 1 -1 | | H2 -H2 | H4 = | | 1 1 | | 1 1 | | = | 1 1 1 1 | | | 1 -1 | | 1 -1 | | | 1 -1 1 -1 | | | | 1 1 -1 -1 | | | 1 1 | |-1 -1 | | | 1 -1 -1 1 | | | 1 -1 | |-1 1 | | H8 = | H4 H4 | = | 1 1 1 1 1 1 1 1 | | H4 -H4 | | 1 -1 1 -1 1 -1 1 -1 | | 1 1 -1 -1 1 1 -1 -1 | | 1 -1 -1 1 1 -1 -1 1 | | 1 1 1 1 -1 -1 -1 -1 | | 1 -1 1 -1 -1 1 -1 1 | | 1 1 -1 -1 -1 -1 1 1 | | 1 -1 -1 1 -1 1 1 -1 |
Now compare H8 from this strange Hadamard development to the affine functions:
The 3-Variable Affine Boolean Functions c 0 0 0 0 0 0 0 0 x0 0 1 0 1 0 1 0 1 x1 0 0 1 1 0 0 1 1 x1+x0 0 1 1 0 0 1 1 0 x2 0 0 0 0 1 1 1 1 x2+ x0 0 1 0 1 1 0 1 0 x2+x1 0 0 1 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1
So if we map the values in the affine truth table:
A Fast Walsh Transform (FWT) essentially computes the correlations which we have been calling the "unexpected distance" (UD). It does this by calculating the sum and difference of two elements at a time, in a sequence of particular pairings, each time replacing the elements with the calculated values.
It is easy to do a FWT by hand. (Well, I say "easy," then always struggle when I actually do it.) Let's do the FWT of function f: (1 0 0 1 1 1 0 0): First note that f has a binary power length, as required. Next, each pair of elements is modified by an "in-place butterfly"; that is, the values in each pair produce two results which replace the original pair, wherever they were originally located. The left result will be the two values added; the right result will be the first value less the second. That is,
(a',b') = (a+b, a-b)
So for the values (1,0), we get (1+0, 1-0) which is just (1,1). We start out pairing adjacent elements, then every other element, then every 4th element, and so on until the correct pairing is impossible:
An 8-Element Fast Walsh Transform (FWT) original 1 0 0 1 1 1 0 0 ^---^ ^---^ ^---^ ^---^ first 1 1 1 -1 2 0 0 0 ^-------^ ^-------^ ^-------^ ^-------^ second 2 0 0 2 2 0 2 0 ^---------------^ ^---------------^ ^---------------^ ^---------------^ final 4 0 2 2 0 0 -2 2
Now compare these results to the UD values we found earlier:
Unexpected Distance to the Affine Functions affine ud 1 0 x0 0 x1 +2 x1+x0 +2 x2 0 x2+ x0 0 x2+x1 -2 x2+x1+x0 +2
Note that all FWT elements -- after the zeroth -- map the U.D. results exactly in both magnitude and sign, and in the exact same order. (This ordering means that the binary index of any result is also the recipe for expressing the affine function being compared in that position.) The zeroth element in the FWT is the number of 1-bits in the function when we use the real values {0,1} to represent the function.
So to find the "unexpected distance" from any balanced function to every affine Boolean function, just compute the FWT. Clearly, the closest affine function has the absolute value maximum UD value of all the transformed elements past the zeroth. Just subtract this value from half the function length (which is the zeroth FWT value in a balanced function) to get the nonlinearity.
To understand how the FWT works, suppose we label each bit-value with a letter, and then perform a symbolic FWT:
An 8-Element Fast Walsh Transform (FWT) a b c d e f g h ^------^ ^------^ ^------^ ^------^ a+b a-b c+d c-d e+f e-f g+h g-h ^-------------^ ^-------------^ ^-------------^ ^-------------^ a+b a-b a+b a-b e+f e-f e+f e-f c+d c-d -c-d -c+d g+h g-h -g-h -g+h ^---------------------------^ ^---------------------------^ ^---------------------------^ ^---------------------------^ a+b a-b a+b a-b a+b a-b a+b a-b c+d c-d -c-d -c+d c+d c-d -c-d -c+d e+f e-f e+f e-f -e-f -e+f -e-f -e+f g+h g-h -g-h -g+h -g-h -g+h g+h g-h
Each of these columns is the symbolic description of one element in the FWT result. Since each uses the same input variables in the same order, we can represent the uniqueness of each result simply by the sign applied to each variable:
Symbolic FWT Results by Column a+b+c+d+e+f+g+h = + + + + + + + + a-b+c-d+e-f+g-h = + - + - + - + - a+b-c-d+e+f-g-h = + + - - + + - - a-b-c+d+e-f-g+h = + - - + + - - + a+b+c+d-e-f-g-h = + + + + - - - - a-b+c-d-e+f-g+h = + - + - - + - + a+b-c-d-e-f+g+h = + + - - - - + + a-b-c+d-e+f+g-h = + - - + - + + -
Which we can compare to:
The 3-Variable Affine Boolean Functions c 0 0 0 0 0 0 0 0 x0 0 1 0 1 0 1 0 1 x1 0 0 1 1 0 0 1 1 x1+x0 0 1 1 0 0 1 1 0 x2 0 0 0 0 1 1 1 1 x2+ x0 0 1 0 1 1 0 1 0 x2+x1 0 0 1 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1
So not only do we once again find the affine functions, we also find them implicit in a way appropriate for computing add / subtract correlations, thus producing UD values directly with high efficiency.
The fast transform by hand is automated in Borland Pascal:
TYPE Lwd = LongInt; LintArray = ARRAY[ 0..16380 ] of LONGINT; PROCEDURE LintHadFmSeqWalsh( VAR DatLintAr; lastel: Lwd ); { Hadamard Walsh from sequential data, in-place } VAR Dat: LintArray ABSOLUTE DatLintAr; a, b: LONGINT; stradwid, { distance between pair of elements } blockstart, block, pair, el1, el2: Lwd; BEGIN stradwid := 1; blockstart := lastel; REPEAT el1 := 0; blockstart := blockstart DIV 2; FOR block := blockstart DOWNTO 0 DO BEGIN el2 := el1 + stradwid; FOR pair := 0 TO PRED(stradwid) DO BEGIN a := Dat[ el1 ]; b := Dat[ el2 ]; Dat[ el1 ] := a + b; Dat[ el2 ] := a - b; Inc( el1 ); Inc( el2 ); END; el1 := el2; END; stradwid := (stradwid + stradwid) AND lastel; UNTIL (stradwid = 0); END; {LintHadFmSeqWalsh}
LintHadFmSeqWalsh takes an array of 32-bit integers, and changes the array data into the Walsh-Hadamard transform of that data. For nonlinearity measures, the input data are {0,1} or {1,-1}; the results are potentially bipolar in either case. (The "lastel" parameter is the last index in the data array which starts at index 0; it is thus always 2n - 1 for some n. The ABSOLUTE attribute forces Borland Pascal to treat the parameter as a LongInt array of arbitrary size.)
It is common to consider a Boolean function as consisting of the real values {0,1}, but it is also common to use the transformation
x' = (-1)x(negative 1 to the power x) where x is {0,1}. This transforms {0,1} into {1,-1}, and for some reason looks much cooler than doing the exact same thing with
x' = 1 - 2x
This transformation has some implications: Using real values {1,-1} doubles the magnitude and changes the sign of the FWT results, but can simplify nonlinearity for unbalanced functions, because the zeroth term need not be treated specially. But if the Boolean function is balanced, as it will be in the typical invertible substitution table, the zeroth element need not be used at all, so using real values {1,-1} seems to provide no particular benefit in this application.
An invertible
substitution table
is an array of values in which
any particular value can occur at most once. If the range of the
output values is the same as the input values, then every value
occurs in the table exactly once. Typically the table has a
power-of-2 number of elements, which is related to size in bits of
its input (and output) value. For example, an "8-bit" table has
Even these relatively small tables have remarkable
keying
potential. Each invertible table differs from every other only in
the arrangement of the values it holds, but there is typically an
incredible number of possible
permutations. A 2-bit
table with
Nonlinearity applies to Boolean functions, and so does not apply directly to substitution tables. But each output bit from such a table can be considered a Boolean function. So we can run through the table extracting all the bits in a given bit position, and then measure the nonlinearity of the function represented by those bits.
Clearly, if we measure a nonlinearity value for each output bit position, we do not have a single nonlinearity for the table. Several ways have been suggested to combine these values, including the sum or the average of all values. But for cryptographic use it may be more significant to collect the minimum nonlinearity over all the bit positions. This allows us to argue that no bit position in the table is weaker than the value we have. Since a table collects multiple Boolean functions, tables tend to be weaker than the average Boolean function of the same length. But the nonlinearity values for tables and sequences of the same length do tend to be similar and somewhat comparable.
There are no nonlinear 2-bit tables. We know this because there are exactly 6 balanced bit sequences of length 4, and each of those has a measured nonlinearity of zero. So there is no chance to build a nonlinear table by collecting those sequences.
Here are some coarse graphs of nonlinearity distributions at various table sizes:
Nonlinearity Distribution in 4-Bit Tables 0.6 | 0.5 | * * 0.4 | * * 0.3 | * * 0.2 | * * 0.1 | * * 0.0 | * * * Prob +--+--+--+-- 0 2 4 Nonlinearity
Nonlinearity Distribution in 5-Bit Tables 0.7 | * 0.6 | * 0.5 | * 0.4 | * 0.3 | * 0.2 | * * 0.1 | * * * 0.0 | * * * Prob +--+--+--+--+--+--+-- 0 2 4 6 8 10 Nonlinearity
Nonlinearity Distribution in 8-Bit Tables 0.35 | * 0.3 | * 0.25 | * * 0.2 | * * * 0.15 | * * * 0.1 | * * * * 0.05 | * * * * * 0.00 | * * * * * * * Prob +--+--+--+--+--+--+--+--+-- 92 96 100 104 Nonlinearity
Nonlinearity Distribution in 10-Bit Tables 0.2 | 0.175 | * * 0.15 | * * * 0.125 | * * * * 0.1 | * * * * * 0.075 | * * * * * * 0.05 | * * * * * * * * 0.025 | * * * * * * * * * * 0.00 | * * * * * * * * * * * Prob +--+--+--+--+--+--+--+--+--+--+--+--+-- 436 440 444 448 452 456 Nonlinearity
[AY82] Ayoub, F. 1982. Probabilistic completeness of substitution-permutation encryption networks. IEE Proceedings, Part E. 129(5): 195-199.
[DAE94] Daemen, J., R. Govaerts and J. Vandewalle. 1994. Correlation Matrices. Fast Software Encryption. 275-285.
[FOR88] Forre, R. 1988. The Strict Avalanche Criterion: Spectral Properties of Boolean Functions and an Extended Definition. Advances in Cryptology -- CRYPTO '88. 450-468.
[GOR82] Gordon, J. and H. Retkin. 1982. Are Big S-Boxes Best? Cryptography. Proceedings of the Workshop on Cryptography, Burg Feuerstein, Germany, March 29-April 2, 1982. 257-262.
[HED78] Hedayat, A. and W. Wallis. 1978. Hadamard Matrices and their Applications. The Annals of Statistics. 6(6): 1184-1238.
[HEY94] Heys, H. and S. Tavares. 1994. On the security of the CAST encryption algorithm. Canadian Conference on Electrical and Computer Engineering. Halifax, Nova Scotia, Canada, Sept. 1994. 332-335.
[HEY95] Heys, H. and S. Tavares. 1995. Known plaintext cryptanalysis of tree-structured block ciphers. Electronics Letters. 31(10): 784-785.
[MEI89] Meier, W. and O. Staffelbach. 1989. Nonlinearity Criteria for Cryptographic Functions. Advances in Cryptology -- Eurocrypt '89. 549-562.
[MIR97] Mirza, F. 1997. Linear and S-Box Pairs Cryptanalysis of the Data Encryption Standard.
[OC91] O'Connor, L. 1991. Enumerating nondegenerate permutations. Advances in Cryptology -- Eurocrypt '91. 368-377.
[OC93] O'Connor, L. 1993. On the Distribution Characteristics in Bijective Mappings. Advances in Cryptology -- EUROCRYPT '93. 360-370.
[PIE88] Pieprzyk, J. and G. Finkelstein. 1988. Towards effective nonlinear cryptosystem design. IEE Proceedings, Part E. 135(6): 325-335.
[PIE89] Pieprzyk, J. and G. Finkelstein. 1989. Permutations that Maximize Non-Linearity and Their Cryptographic Significance. Computer Security in the Age of Information. 63-74.
[PIE89B] Pieprzyk, J. 1989. Non-linearity of Exponent Permutations. Advances in Cryptology -- EUROCRYPT '89. 80-92.
[PIE93] Pieprzyk, J., C. Charnes and J. Seberry. 1993. Linear Approximation Versus Nonlinearity. Proceedings of the Workshop on Selected Areas in Cryptography (SAC '94). 82-89.
[PRE90] Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts and J. Vandewalle. 1990. Propagation Characteristics of Boolean Functions. Advances in Cryptology -- Eurocrypt '90. 161-173.
[RUE86] Rueppel, R. 1986. Analysis and Design of Stream Ciphers. Springer-Verlag.
[SCH86] Schroeder, M. 1986. Number Theory in Science and Communications. Springer-Verlag.
[SCH87] Schroeder, M. and N. Sloane. 1987. New Permutation Codes Using Hadamard Unscrambling. IEEE Transactions on Information Theory. IT-33(1): 144-145.
[XIO88] Xiao, G-Z. and J. Massey. 1988. A Spectral Characterization of Correlation-Immune Combining Functions. IEEE Transactions on Information Theory. 34(3): 569-571.
[YOU95] Youssef, A. and S. Tavares. 1995. Resistance of Balanced S-boxes to Linear and Differential Cryptanalysis. Information Processing Letters. 56: 249-252.
[YOU95B] Youssef, A. and S. Tavares. 1995. Number of Nonlinear Regular S-boxes. Electronics Letters. 31(19): 1643-1644.
[ZHA95] Zhang, X. and Y. Zheng. 1995. GAC -- the Criterion for Global Avalanche Characteristics of Cryptographic Functions. Journal for Universal Computer Science. 1(5): 316-333.
Other nonlinearity articles, often dealing with measurements on the new block ciphers I have been developing, are available in the Technical Articles section of my pages:
http://www.ciphersbyritter.com/index.html#TechnicalArticles
Also, many Walsh-Hadamard references are available in my Walsh-Hadamard literature review:
http://www.ciphersbyritter.com/RES/WALHAD.HTM
Last updated:1998-05-21