Active Boolean Function Nonlinearity Measurement in JavaScript

A detailed discussion of cryptographic nonlinearity, what it means and how it is computed, with active JavaScript panels to perform the computation.

A Ciphers By Ritter Page

Terry Ritter

Nonlinearity is the number of bits bits which must change in the truth table of a Boolean function to reach the closest affine function. If we believe that cryptosystems based on linear or affine functions are inherently weak, the ability to measure nonlinearity is the ability to measure one form of strength.

Nonlinearity measurement is particularly useful to quantify the strength of invertible substitution tables. This is important when pre-defined tables are a part of a cipher definition. But nonlinearity measurement can be even more important in the context of scalable ciphers: When ciphers can be down to experimental size, it becomes possible to talk about the overall nonlinearity (for each key) of the cipher itself. This is far more information than we usually have on cipher designs.

Affine Boolean Functions

A Boolean function produces a single-bit result for each possible combination of values from perhaps many Boolean variables. The Boolean field consists of the values {0,1}, with XOR as "addition" and AND as "multiplication."

An affine Boolean function has the form:

     f = a_nx_n + a_n-1x_n-1 + ... + a₁x₁ + a₀

In the Boolean field, a constant or a₀ value of '1' inverts or reverses the result, while a constant of '0' has no effect. The coefficients a_i simply enable or disable the associated variable x_i. And if we consider the collected coefficients to be a counting binary value, we have a unique ordering for affine Boolean functions:

   Affine Boolean Functions

      f0 =  0*x[2] + 0*x[1] + 0  =  0
      f1 =  0*x[2] + 0*x[1] + 1  =  1
      f2 =  0*x[2] + 1*x[1] + 0  =  x[1]
      f3 =  0*x[2] + 1*x[1] + 1  =  x[1] + 1
      f4 =  1*x[2] + 0*x[1] + 0  =  x[2]
      f5 =  1*x[2] + 0*x[1] + 1  =  x[2] + 1
      f6 =  1*x[2] + 1*x[1] + 0  =  x[2] + x[1]
      f7 =  1*x[2] + 1*x[1] + 1  =  x[2] + x[1] + 1
       . . .

In this way, we can write 16 different forms for 3 variables. But it is convenient to pair the functions which are the same except for the value of the constant, and then we have exactly 8 affine Boolean functions of 3 variables. Each of these has a particular value for every possible combination of variable value, which we can show in a truth table:

   The 3-Variable Affine Boolean Functions

     affine    truth table

          1    1  1  1  1  1  1  1  1
         x0    0  1  0  1  0  1  0  1
      x1       0  0  1  1  0  0  1  1
      x1+x0    0  1  1  0  0  1  1  0
   x2          0  0  0  0  1  1  1  1
   x2+   x0    0  1  0  1  1  0  1  0
   x2+x1       0  0  1  1  1  1  0  0
   x2+x1+x0    0  1  1  0  1  0  0  1

Unexpected Distance

One way to measure a sort of "correlation" between two Boolean functions is to compare their truth tables and count the number of bits which differ; this is their Hamming distance.

Since we expect about half the bit positions to differ (on average), we can subtract that expected distance and come up with what I am calling -- for lack of a better term -- the "unexpected distance" (UD). The magnitude of the UD relates to just how unexpected the distance is, while the sign indicates the direction. Consider two functions and their difference:

   Distance to an Affine Function

          f    1  0  0  1  1  1  0  0
   x2+x1+x0    0  1  1  0  1  0  0  1
       diff    1  1  1  1  0  1  0  1

Here we have a Hamming distance of 6 between the two functions. This is an unexpected distance or UD of 6 - 4 = +2, which means that 2 more bits differ than we would expect.

Another way to compute Boolean correlation is to accumulate the bits of one function (as integers) by addition or subtraction as selected by the other function. For example:

   Distance to an Affine Function

          f    1  0  0  1  1  1  0  0
   x2+x1+x0    +  -  -  +  -  +  +  -   (operation select)

      accum   +1 -0 -0 +1 -1 +1 +0 -0  =  +2

This technique yields the UD value directly.

With some work, we can now compare a Boolean function to each possible affine Boolean function, and develop both a distance and an unexpected distance to each:

   Unexpected Distance to Affine Boolean Function

     affine    truth table                 distance    ud

          c    0  0  0  0  0  0  0  0      4           0
         x0    0  1  0  1  0  1  0  1      4           0
      x1       0  0  1  1  0  0  1  1      6          +2
      x1+x0    0  1  1  0  0  1  1  0      6          +2
   x2          0  0  0  0  1  1  1  1      4           0
   x2+   x0    0  1  0  1  1  0  1  0      4           0
   x2+x1       0  0  1  1  1  1  0  0      2          -2
   x2+x1+x0    0  1  1  0  1  0  0  1      6          +2

          f    1  0  0  1  1  1  0  0

Nonlinearity

Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. But every affine Boolean function also has a complement affine function which has every truth table bit value reversed. This means that no function possibly can be more than half its length in bits away from both an affine Boolean function and its complement. So a zero UD value is not only what we expect, it is in fact the best we can possibly do.

A non-zero UD value is that much closer to some affine function, and that much less nonlinear. So the nonlinearity value is half the length of the function, less the maximum absolute value of the unexpected distance to each affine function.

The function f in the previous section has a length of 8 bits, and an absolute value maximum unexpected distance of 2. This is a nonlinearity of 4 - 2 = 2; so f has a nonlinearity of 2. Nonlinearity is always positive, and also even (divisible by 2) if we have a balanced function.

The Hadamard Matrix and Affine Functions

A Hadamard matrix H is an n x n matrix with all entries +1 or -1, such that all rows are orthogonal and all columns are orthogonal (see, for example, [HED78]).

The usual development (see, for example [SCH87]) starts with a defined 2 x 2 Hadamard matrix H2 which is ((1,1),(1,-1)). Each step consists of multiplying each element in H2 by the previous matrix, thus negating all elements in the bottom-right entry:

        Hadamard Matrix Development

   H2 = |  1  1 |    H4 = H2 * H2 = | H2  H2 |
        |  1 -1 |                   | H2 -H2 |

   H4 = | | 1  1 |  | 1  1 | | = | 1  1  1  1 |
        | | 1 -1 |  | 1 -1 | |   | 1 -1  1 -1 |
        |                    |   | 1  1 -1 -1 |
        | | 1  1 |  |-1 -1 | |   | 1 -1 -1  1 |
        | | 1 -1 |  |-1  1 | |

   H8 = | H4  H4 | = | 1  1  1  1  1  1  1  1 |
        | H4 -H4 |   | 1 -1  1 -1  1 -1  1 -1 |
                     | 1  1 -1 -1  1  1 -1 -1 |
                     | 1 -1 -1  1  1 -1 -1  1 |
                     | 1  1  1  1 -1 -1 -1 -1 |
                     | 1 -1  1 -1 -1  1 -1  1 |
                     | 1  1 -1 -1 -1 -1  1  1 |
                     | 1 -1 -1  1 -1  1  1 -1 |

Now compare H8 from this strange Hadamard development to the affine functions:

   The 3-Variable Affine Boolean Functions

                  c    0  0  0  0  0  0  0  0
                 x0    0  1  0  1  0  1  0  1
              x1       0  0  1  1  0  0  1  1
              x1+x0    0  1  1  0  0  1  1  0
           x2          0  0  0  0  1  1  1  1
           x2+   x0    0  1  0  1  1  0  1  0
           x2+x1       0  0  1  1  1  1  0  0
           x2+x1+x0    0  1  1  0  1  0  0  1

So if we map the values in the affine truth table: {0,1} -> {1,-1}, we find the same functions as in the Hadamard development. These are the Walsh functions, and here both developments produce the same order, which is called "natural" or "Hadamard." Walsh functions have fast transforms which reduce the cost of correlation computations from n*n to n log n, which can be a very substantial reduction.

The Fast Walsh-Hadamard Transform

A Fast Walsh Transform (FWT) essentially computes the correlations which we have been calling the "unexpected distance" (UD). It does this by calculating the sum and difference of two elements at a time, in a sequence of particular pairings, each time replacing the elements with the calculated values.

It is easy to do a FWT by hand. (Well, I say "easy," then always struggle when I actually do it.) Let's do the FWT of function f: (1 0 0 1 1 1 0 0): First note that f has a binary power length, as required. Next, each pair of elements is modified by an "in-place butterfly"; that is, the values in each pair produce two results which replace the original pair, wherever they were originally located. The left result will be the two values added; the right result will be the first value less the second. That is,

   (a',b') = (a+b, a-b)

So for the values (1,0), we get (1+0, 1-0) which is just (1,1). We start out pairing adjacent elements, then every other element, then every 4th element, and so on until the correct pairing is impossible:

   An 8-Element Fast Walsh Transform (FWT)

   original      1   0   0   1   1   1   0   0
                 ^---^   ^---^   ^---^   ^---^

      first      1   1   1  -1   2   0   0   0
                 ^-------^       ^-------^
                     ^-------^       ^-------^

     second      2   0   0   2   2   0   2   0
                 ^---------------^
                     ^---------------^
                         ^---------------^
                             ^---------------^

      final      4   0   2   2   0   0  -2   2

Now compare these results to the UD values we found earlier:

   Unexpected Distance to the Affine Functions

     affine     ud

          1     0
         x0     0
      x1       +2
      x1+x0    +2
   x2           0
   x2+   x0     0
   x2+x1       -2
   x2+x1+x0    +2

Note that all FWT elements -- after the zeroth -- map the U.D. results exactly in both magnitude and sign, and in the exact same order. (This ordering means that the binary index of any result is also the recipe for expressing the affine function being compared in that position.) The zeroth element in the FWT is the number of 1-bits in the function when we use the real values {0,1} to represent the function.

So to find the "unexpected distance" from any balanced function to every affine Boolean function, just compute the FWT. Clearly, the closest affine function has the absolute value maximum UD value of all the transformed elements past the zeroth. Just subtract this value from half the function length (which is the zeroth FWT value in a balanced function) to get the nonlinearity.

Understanding the FWT

To understand how the FWT works, suppose we label each bit-value with a letter, and then perform a symbolic FWT:

   An 8-Element Fast Walsh Transform (FWT)

    a      b      c      d      e      f      g      h
    ^------^      ^------^      ^------^      ^------^

   a+b    a-b    c+d    c-d    e+f    e-f    g+h    g-h

    ^-------------^             ^-------------^
           ^-------------^             ^-------------^

   a+b    a-b    a+b    a-b    e+f    e-f    e+f    e-f
   c+d    c-d   -c-d   -c+d    g+h    g-h   -g-h   -g+h

    ^---------------------------^
           ^---------------------------^
                  ^---------------------------^
                         ^---------------------------^

   a+b    a-b    a+b    a-b    a+b    a-b    a+b    a-b
   c+d    c-d   -c-d   -c+d    c+d    c-d   -c-d   -c+d
   e+f    e-f    e+f    e-f   -e-f   -e+f   -e-f   -e+f
   g+h    g-h   -g-h   -g+h   -g-h   -g+h    g+h    g-h

Each of these columns is the symbolic description of one element in the FWT result. Since each uses the same input variables in the same order, we can represent the uniqueness of each result simply by the sign applied to each variable:

   Symbolic FWT Results by Column

   a+b+c+d+e+f+g+h  =   +  +  +  +  +  +  +  +
   a-b+c-d+e-f+g-h  =   +  -  +  -  +  -  +  -
   a+b-c-d+e+f-g-h  =   +  +  -  -  +  +  -  -
   a-b-c+d+e-f-g+h  =   +  -  -  +  +  -  -  +
   a+b+c+d-e-f-g-h  =   +  +  +  +  -  -  -  -
   a-b+c-d-e+f-g+h  =   +  -  +  -  -  +  -  +
   a+b-c-d-e-f+g+h  =   +  +  -  -  -  -  +  +
   a-b-c+d-e+f+g-h  =   +  -  -  +  -  +  +  -

Which we can compare to:

   The 3-Variable Affine Boolean Functions

                   c    0  0  0  0  0  0  0  0
                  x0    0  1  0  1  0  1  0  1
               x1       0  0  1  1  0  0  1  1
               x1+x0    0  1  1  0  0  1  1  0
            x2          0  0  0  0  1  1  1  1
            x2+   x0    0  1  0  1  1  0  1  0
            x2+x1       0  0  1  1  1  1  0  0
            x2+x1+x0    0  1  1  0  1  0  0  1

So not only do we once again find the affine functions, we also find them implicit in a way appropriate for computing add / subtract correlations, thus producing UD values directly with high efficiency.

A Fast Walsh-Hadamard Transform Routine

The fast transform by hand is automated in Borland Pascal:

   TYPE
      Lwd = LongInt;
      LintArray = ARRAY[ 0..16380 ] of LONGINT;


   PROCEDURE LintHadFmSeqWalsh( VAR DatLintAr; lastel: Lwd );
      { Hadamard Walsh from sequential data, in-place }
      VAR
         Dat: LintArray ABSOLUTE DatLintAr;
         a, b: LONGINT;
         stradwid,  { distance between pair of elements }
         blockstart, block, pair, el1, el2: Lwd;
      BEGIN
      stradwid := 1;
      blockstart := lastel;
         REPEAT
         el1 := 0;
         blockstart := blockstart DIV 2;
         FOR block := blockstart DOWNTO 0 DO
            BEGIN
            el2 := el1 + stradwid;
            FOR pair := 0 TO PRED(stradwid) DO
               BEGIN
               a := Dat[ el1 ];
               b := Dat[ el2 ];
               Dat[ el1 ] := a + b;
               Dat[ el2 ] := a - b;
               Inc( el1 );  Inc( el2 );
               END;
            el1 := el2;
            END;
         stradwid := (stradwid + stradwid) AND lastel;
         UNTIL (stradwid = 0);
      END; {LintHadFmSeqWalsh}

LintHadFmSeqWalsh takes an array of 32-bit integers, and changes the array data into the Walsh-Hadamard transform of that data. For nonlinearity measures, the input data are {0,1} or {1,-1}; the results are potentially bipolar in either case. (The "lastel" parameter is the last index in the data array which starts at index 0; it is thus always 2^{n - 1} for some n. The ABSOLUTE attribute forces Borland Pascal to treat the parameter as a LongInt array of arbitrary size.)

Using {0,1} Versus {1,-1}

It is common to consider a Boolean function as consisting of the real values {0,1}, but it is also common to use the transformation

   x' = (-1)^x

(negative 1 to the power x) where x is {0,1}. This transforms {0,1} into {1,-1}, and for some reason looks much cooler than doing the exact same thing with

   x' = 1 - 2x

This transformation has some implications: Using real values {1,-1} doubles the magnitude and changes the sign of the FWT results, but can simplify nonlinearity for unbalanced functions, because the zeroth term need not be treated specially. But if the Boolean function is balanced, as it will be in the typical invertible substitution table, the zeroth element need not be used at all, so using real values {1,-1} seems to provide no particular benefit in this application.

Nonlinearity in Invertible Substitution Tables

An invertible substitution table is an array of values in which any particular value can occur at most once. If the range of the output values is the same as the input values, then every value occurs in the table exactly once. Typically the table has a power-of-2 number of elements, which is related to size in bits of its input (and output) value. For example, an "8-bit" table has 2⁸ = 256 elements, in which each value from 0 though 255 occurs exactly once.

Even these relatively small tables have remarkable keying potential. Each invertible table differs from every other only in the arrangement of the values it holds, but there is typically an incredible number of possible permutations. A 2-bit table with 2² = 4 elements is one of are 4! (4-factorial) or just 24 different tables. But a 4-bit table with 2⁴ = 16 elements is one of 16! or 2.09 x 10¹³ tables, a 44-bit number, and potentially a 44-bit keyspace. The usual 8-bit tables have a 1648-bit keyspace, per table. When a table is used alone as Simple Substitution, these entries are easily resolved. But as part of a more complex block cipher, the entries may be hidden so that the keying potential of the table can be realized.

Nonlinearity applies to Boolean functions, and so does not apply directly to substitution tables. But each output bit from such a table can be considered a Boolean function. So we can run through the table extracting all the bits in a given bit position, and then measure the nonlinearity of the function represented by those bits.

Clearly, if we measure a nonlinearity value for each output bit position, we do not have a single nonlinearity for the table. Several ways have been suggested to combine these values, including the sum or the average of all values. But for cryptographic use it may be more significant to collect the minimum nonlinearity over all the bit positions. This allows us to argue that no bit position in the table is weaker than the value we have. Since a table collects multiple Boolean functions, tables tend to be weaker than the average Boolean function of the same length. But the nonlinearity values for tables and sequences of the same length do tend to be similar and somewhat comparable.

Some Table Nonlinearity Distributions

There are no nonlinear 2-bit tables. We know this because there are exactly 6 balanced bit sequences of length 4, and each of those has a measured nonlinearity of zero. So there is no chance to build a nonlinear table by collecting those sequences.

Here are some coarse graphs of nonlinearity distributions at various table sizes:

  Nonlinearity Distribution in 4-Bit Tables

   0.6 |
   0.5 |     *  *
   0.4 |     *  *
   0.3 |     *  *
   0.2 |     *  *
   0.1 |     *  *
   0.0 |  *  *  *
  Prob +--+--+--+--
          0  2  4   Nonlinearity

  Nonlinearity Distribution in 5-Bit Tables

   0.7 |              *
   0.6 |              *
   0.5 |              *
   0.4 |              *
   0.3 |              *
   0.2 |           *  *
   0.1 |           *  *  *
   0.0 |           *  *  *
  Prob +--+--+--+--+--+--+--
          0  2  4  6  8 10   Nonlinearity

  Nonlinearity Distribution in 8-Bit Tables

  0.35  |                 *
  0.3   |                 *
  0.25  |              *  *
  0.2   |              *  *  *
  0.15  |              *  *  *
  0.1   |           *  *  *  *
  0.05  |        *  *  *  *  *
  0.00  |     *  *  *  *  *  *  *
  Prob  +--+--+--+--+--+--+--+--+--
             92    96    100   104   Nonlinearity

  Nonlinearity Distribution in 10-Bit Tables

  0.2   |
  0.175 |                    *  *
  0.15  |                    *  *  *
  0.125 |                 *  *  *  *
  0.1   |              *  *  *  *  *
  0.075 |              *  *  *  *  *  *
  0.05  |        *  *  *  *  *  *  *  *
  0.025 |     *  *  *  *  *  *  *  *  *  *
  0.00  |  *  *  *  *  *  *  *  *  *  *  *
  Prob  +--+--+--+--+--+--+--+--+--+--+--+--+--
          436   440   444   448   452   456   Nonlinearity

References and Bibliography

[AY82] Ayoub, F. 1982. Probabilistic completeness of substitution-permutation encryption networks. IEE Proceedings, Part E. 129(5): 195-199.

[DAE94] Daemen, J., R. Govaerts and J. Vandewalle. 1994. Correlation Matrices. Fast Software Encryption. 275-285.

[FOR88] Forre, R. 1988. The Strict Avalanche Criterion: Spectral Properties of Boolean Functions and an Extended Definition. Advances in Cryptology -- CRYPTO '88. 450-468.

[GOR82] Gordon, J. and H. Retkin. 1982. Are Big S-Boxes Best? Cryptography. Proceedings of the Workshop on Cryptography, Burg Feuerstein, Germany, March 29-April 2, 1982. 257-262.

[HED78] Hedayat, A. and W. Wallis. 1978. Hadamard Matrices and their Applications. The Annals of Statistics. 6(6): 1184-1238.

[HEY94] Heys, H. and S. Tavares. 1994. On the security of the CAST encryption algorithm. Canadian Conference on Electrical and Computer Engineering. Halifax, Nova Scotia, Canada, Sept. 1994. 332-335.

[HEY95] Heys, H. and S. Tavares. 1995. Known plaintext cryptanalysis of tree-structured block ciphers. Electronics Letters. 31(10): 784-785.

[MEI89] Meier, W. and O. Staffelbach. 1989. Nonlinearity Criteria for Cryptographic Functions. Advances in Cryptology -- Eurocrypt '89. 549-562.

[MIR97] Mirza, F. 1997. Linear and S-Box Pairs Cryptanalysis of the Data Encryption Standard.

[OC91] O'Connor, L. 1991. Enumerating nondegenerate permutations. Advances in Cryptology -- Eurocrypt '91. 368-377.

[OC93] O'Connor, L. 1993. On the Distribution Characteristics in Bijective Mappings. Advances in Cryptology -- EUROCRYPT '93. 360-370.

[PIE88] Pieprzyk, J. and G. Finkelstein. 1988. Towards effective nonlinear cryptosystem design. IEE Proceedings, Part E. 135(6): 325-335.

[PIE89] Pieprzyk, J. and G. Finkelstein. 1989. Permutations that Maximize Non-Linearity and Their Cryptographic Significance. Computer Security in the Age of Information. 63-74.

[PIE89B] Pieprzyk, J. 1989. Non-linearity of Exponent Permutations. Advances in Cryptology -- EUROCRYPT '89. 80-92.

[PIE93] Pieprzyk, J., C. Charnes and J. Seberry. 1993. Linear Approximation Versus Nonlinearity. Proceedings of the Workshop on Selected Areas in Cryptography (SAC '94). 82-89.

[PRE90] Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts and J. Vandewalle. 1990. Propagation Characteristics of Boolean Functions. Advances in Cryptology -- Eurocrypt '90. 161-173.

[RUE86] Rueppel, R. 1986. Analysis and Design of Stream Ciphers. Springer-Verlag.

[SCH86] Schroeder, M. 1986. Number Theory in Science and Communications. Springer-Verlag.

[SCH87] Schroeder, M. and N. Sloane. 1987. New Permutation Codes Using Hadamard Unscrambling. IEEE Transactions on Information Theory. IT-33(1): 144-145.

[XIO88] Xiao, G-Z. and J. Massey. 1988. A Spectral Characterization of Correlation-Immune Combining Functions. IEEE Transactions on Information Theory. 34(3): 569-571.

[YOU95] Youssef, A. and S. Tavares. 1995. Resistance of Balanced S-boxes to Linear and Differential Cryptanalysis. Information Processing Letters. 56: 249-252.

[YOU95B] Youssef, A. and S. Tavares. 1995. Number of Nonlinear Regular S-boxes. Electronics Letters. 31(19): 1643-1644.

[ZHA95] Zhang, X. and Y. Zheng. 1995. GAC -- the Criterion for Global Avalanche Characteristics of Cryptographic Functions. Journal for Universal Computer Science. 1(5): 316-333.

Other nonlinearity articles, often dealing with measurements on the new block ciphers I have been developing, are available in the Technical Articles section of my pages:

   http://www.ciphersbyritter.com/index.html#TechnicalArticles

Also, many Walsh-Hadamard references are available in my Walsh-Hadamard literature review:

   http://www.ciphersbyritter.com/RES/WALHAD.HTM

Nonlinearity Measurement

Bit Width: 2 3 4 5 6 7 8

Make Table is just a convenient way to create a random permutation and place it in the top panel. The buttons select the size of the table.

Enter Table:

The top panel wants to see a table permutation with a space or a comma between each element. An arbitrary table can be entered, but the number of elements must be some power of 2 (such as: 4, 8, 16, ...).

Bit Column:

Extract LS Bits will run down the list in the top panel and test the least-significant bits of each value to create a bit-sequence in the bottom panel. Extract Next Bits extracts the next most-significant bits.

First Combination creates a balanced bit-sequence of the same length as a table (bit width 4 or less) and puts it in the bottom panel. Next Combination steps the sequence.

The bottom panel normally holds a bit sequence, or the transformed result, with a space or a comma between each value. A general sequence of values can be entered and transformed, but the number of elements must be some power of 2.

Max UD: Nonlinearity:

Transform will run a fast Walsh-Hadamard transform (FWT) on the sequence in the bottom panel, and replace the sequence with the results.

Overall Minimum Nonlinearity: Status:

Overall NL will extract a bit-column and run a FWT for every bit-column of the table in the top panel. The result is the minimum nonlinearity value over all bit columns. Warning: With 8-bit tables this operation has taken almost a minute to complete, and also has crashed Windows 3.1 with a "stack overflow" message.

Terry Ritter, his current address, and his top page.

Last updated:1998-05-21