Measuring Boolean Function Nonlinearity by Walsh Transform

Terry Ritter

Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. A basic background in affine Boolean functions and their relationship to Walsh-Hadamard functions is presented. It is shown how distances to linear functions can be computed by hand, and that these values are exactly the same as those produced by a Fast Walsh Transform (FWT), which also can be computed by hand. A Pascal routine for the FWT is presented as a basis for practical nonlinearity measurement.

```From: ritter@io.com (Terry Ritter)
Newsgroups: sci.crypt
Subject: Measuring Nonlinearity By Walsh Transform (Long!)
Date: Tue, 13 Jan 1998 05:10:08 GMT
Lines: 522
Message-ID: <34baf5ee.192497@news.io.com>

MEASURING NONLINEARITY BY WALSH TRANSFORM

Terry Ritter   ritter@io.com
Ritter Software Engineering
http://www.io.com/~ritter/

1998-01-12

Abstract

Nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function.
A basic background in affine Boolean functions and their relationship
to Walsh-Hadamard functions is presented.  It is shown how distances
to linear functions can be computed by hand, and that these values
are exactly the same as those produced by a Fast Walsh Transform
(FWT), which also can be computed by hand.  A Pascal routine for the
FWT is presented as a basis for practical nonlinearity measurement.

Affine Boolean Functions

A Boolean function produces a single-bit result for each possible
combination of values from perhaps many Boolean variables.  The
Boolean field consists of the values {0,1}, with XOR as "addition"
and AND as "multiplication."

We call a function "constant" when it has only one fixed value,
which we often denote "k":

(1)  f = k

I call a function "linear" when it fits the plane geometry
equation of a line (here the constant is "b"):

(2)  y = mx + b

Note that in the Boolean field, a constant value of '1' acts to
reverse the sense (invert) the output value, while a constant of
'0' does not affect the result at all.

I call a function "affine" when it fits the form:

(3)  f = a   * x    + a   * x    + ... + a * x  + a
n-1   n-1    n-2   n-2          1   1    0

(Others reserve "affine" for Equation 3 *without* a constant, and
use the term "linear" for Equation 3 *with* a constant.)

In Boolean functions, each of the coefficients a[i] acts only to
*select* each different variable x[i], and if we consider all a[i]
a number, we have a unique ordering for affine Boolean functions
as shown in Figure 1.

Affine Functions   Fig. 1

f0 = 0
f1 = 1
f2 = x[1]
f3 = x[1] + 1
f4 = x[2]
f5 = x[2] + 1
f6 = x[2] + x[1]
f7 = x[2] + x[1] + 1
. . .

Thus, we can write 16 different forms for 3 variables.  But it is
convenient to pair the functions which are the same except for the
value of the constant, and then we have exactly 8 affine Boolean
functions of 3 variables (with and without constant).  Of course,
any possible Boolean function of 3 variables *also* can be expressed
in 8 terms, one for every possible combination of variable value.
This is the "truth table" for a function, or the function expressed
as the sequence of its output values.

We can now express every possible 3-variable affine Boolean function,
in order, as shown in Figure 2.

The 3-Variable Affine Boolean Functions   Fig. 2

affine    truth table

1    1  1  1  1  1  1  1  1
x0    0  1  0  1  0  1  0  1
x1       0  0  1  1  0  0  1  1
x1+x0    0  1  1  0  0  1  1  0
x2          0  0  0  0  1  1  1  1
x2+   x0    0  1  0  1  1  0  1  0
x2+x1       0  0  1  1  1  1  0  0
x2+x1+x0    0  1  1  0  1  0  0  1

Correlation

One way to measure a sort of "correlation" between two Boolean
functions is to compare their truth tables and count the number
of bits which differ; this is their Hamming distance.  Then, since
we expect about half the bit positions to differ (on average),
we can subtract the expected difference: this gives what I am
calling -- for lack of a better term -- "unexpected distance" (UD).
The magnitude of the UD relates to how unexpected the distance is,
their difference as shown in Figure 3:

Distance to an Affine Function   Fig. 3

f    1  0  0  1  1  1  0  0
x2+x1+x0    0  1  1  0  1  0  0  1
diff    1  1  1  1  0  1  0  1

We have a Hamming distance of 6 between these two functions.  This
is an unexpected distance of 6 - 4 = +2, which means that 2 more
bits differ than we would expect.

With some work, we can now compare a Boolean function to each
possible affine Boolean function, and develop a distance to each.
The less distance to an affine function, the more "linear" the
measured function must be (see Figure 4).

Distance to Each 3-Variable Affine Boolean Function   Fig. 4

affine    truth table                 distance

1    1  1  1  1  1  1  1  1      4
x0    0  1  0  1  0  1  0  1      4
x1       0  0  1  1  0  0  1  1      6
x1+x0    0  1  1  0  0  1  1  0      6
x2          0  0  0  0  1  1  1  1      4
x2+   x0    0  1  0  1  1  0  1  0      4
x2+x1       0  0  1  1  1  1  0  0      2
x2+x1+x0    0  1  1  0  1  0  0  1      6

f    1  0  0  1  1  1  0  0

So the minimum distance from f to an affine function is 2, making
the closest direct function x2+x1.  *But*, if we complement those
functions which have a distance of 6 (that is, add a constant '1'
term in the affine equation) we get the values shown in Figure 5.

Distance to Complement Affine Functions   Fig. 5

complement table            distance

x1       1  1  0  0  1  1  0  0      2
x1+x0    1  0  0  1  1  0  0  1      2
x2+x1+x0    1  0  0  1  0  1  1  0      2

f    1  0  0  1  1  1  0  0

This *could* mean that we need to consider the distance from f to
the closest affine *complement* as well as the non-complement.  But
this is unnecessary if we use the absolute value of the unexpected
distance, as shown in Figure 6.

Unexpected Distance to Affine Boolean Function   Fig. 6

affine    truth table                 distance    ud

1    1  1  1  1  1  1  1  1      4           0
x0    0  1  0  1  0  1  0  1      4           0
x1       0  0  1  1  0  0  1  1      6          +2
x1+x0    0  1  1  0  0  1  1  0      6          +2
x2          0  0  0  0  1  1  1  1      4           0
x2+   x0    0  1  0  1  1  0  1  0      4           0
x2+x1       0  0  1  1  1  1  0  0      2          -2
x2+x1+x0    0  1  1  0  1  0  0  1      6          +2

f    1  0  0  1  1  1  0  0

When we have a balanced function f, the distances to affine
functions are always *even*.  This is because the affine functions
themselves are balanced, and if we change one bit in a balanced
function, we must change at least one other bit to keep it balanced.
If the first change increased the distance by 1, the next bit change
can either increase it again by 1, or return to the original value.
Balanced functions always will be at even distances from each other.
This issue will arise again in the nonlinearity of the balanced
Boolean functions which occur in invertible substitutions.

A Hadamard matrix H is an n x n matrix with all entries +1 or -1,
such that all rows are orthogonal and all columns are orthogonal
(see, for example, [HED78]).

The usual development (see, for example [SCH87]) starts with a
defined 2 x 2 Hadamard matrix H2 which is ((1,1),(1,-1)).  Each step
consists of multiplying each element in H2 by the previous matrix,
thus negating all elements in the bottom-right entry, as shown in
Figure 7.

H2 = |  1  1 |    H4 = H2 * H2 = | H2  H2 |
|  1 -1 |                   | H2 -H2 |

H4 = | | 1  1 |  | 1  1 | | = | 1  1  1  1 |
| | 1 -1 |  | 1 -1 | |   | 1 -1  1 -1 |
|                    |   | 1  1 -1 -1 |
| | 1  1 |  |-1 -1 | |   | 1 -1 -1  1 |
| | 1 -1 |  |-1  1 | |

H8 = | H4  H4 | = | 1  1  1  1  1  1  1  1 |
| H4 -H4 |   | 1 -1  1 -1  1 -1  1 -1 |
| 1  1 -1 -1  1  1 -1 -1 |
| 1 -1 -1  1  1 -1 -1  1 |
| 1  1  1  1 -1 -1 -1 -1 |
| 1 -1  1 -1 -1  1 -1  1 |
| 1  1 -1 -1 -1 -1  1  1 |
| 1 -1 -1  1 -1  1  1 -1 |

Now compare H8 from this strange Hadamard development to the affine
functions from Figure 2, shown again as Figure 8:

The 3-Variable Affine Boolean Functions   Fig. 8

c    0  0  0  0  0  0  0  0
x0    0  1  0  1  0  1  0  1
x1       0  0  1  1  0  0  1  1
x1+x0    0  1  1  0  0  1  1  0
x2          0  0  0  0  1  1  1  1
x2+   x0    0  1  0  1  1  0  1  0
x2+x1       0  0  1  1  1  1  0  0
x2+x1+x0    0  1  1  0  1  0  0  1

So if we map the values in the affine truth table: {0,1} -> {1,-1},
we find *the* *same* *functions* as in the Hadamard development.
These are the Walsh functions, and here both developments produce
the same order, which is called "natural" or "Hadamard."  (There are
two other canonical orderings for Walsh functions, which we need not
get into here.)  Walsh functions have fast transforms which reduce
the cost of correlation computations from n*n to n log n, a very
substantial reduction.

A Fast Walsh Transform

It is easy to do a Fast Walsh Transform by hand.  (Well, I say
"easy," then always struggle when I actually do it.)  Let's do the
FWT of function f: (1 0 0 1 1 1 0 0):  First note that f has a binary
power length, as required.  Next, each pair of elements is modified
by an "in-place butterfly"; that is, the values in each pair produce
two results which replace the original pair, wherever they were
originally located.  The left result will be the two values added;
the right will be the first less the second.  That is,

(4)  (a',b') = (a+b, a-b)

So for the values (1,0), we get (1+0, 1-0) which is just (1,1).
We start out pairing adjacent elements, then every other element,
then every 4th element, and so on until the correct pairing is
impossible, as shown in Figure 8.

An 8-Element Fast Walsh Transform (FWT)   Fig. 8

original      1   0   0   1   1   1   0   0
^---^   ^---^   ^---^   ^---^

first      1   1   1  -1   2   0   0   0
^-------^       ^-------^
^-------^       ^-------^

second      2   0   0   2   2   0   2   0
^---------------^
^---------------^
^---------------^
^---------------^

final      4   0   2   2   0   0  -2   2

Now compare these results to the UD values we found in Fig. 4:

Unexpected Distance to the Affine Functions   Fig. 9

affine     ud

1     0
x0     0
x1       +2
x1+x0    +2
x2           0
x2+   x0     0
x2+x1       -2
x2+x1+x0    +2

Note that all FWT elements -- after the zeroth -- map the U.D.
results *exactly* in both magnitude and sign, *and* in the exact
same order.  (This ordering means that the binary index of any
result is also the recipe for expressing the affine function being
compared in that position, as shown in Figure 2.)  The zeroth
element in the FWT (here 4) is a special case and is the number
of 1-bits in the function when we use the real values {0,1} to
represent the function.  When measuring balanced functions of a
particular size, the zeroth entry should be obvious.

Thus, to find the distance from any balanced function to the closest
affine function, compute the FWT of the given function expressed
as the real values {0,1}, then find the *maximum* absolute value
of all unexpected distance elements past the zeroth element.

A Fast Walsh Transform Routine

The Fast Walsh Transform by hand is automated in the Borland Pascal
listing of Figure 10.

A Fast Walsh Transform (FWT) in Pascal   Fig. 10

TYPE
Lwd = LongInt;
LintArray = ARRAY[ 0..16380 ] of LONGINT;

PROCEDURE LintHadFmSeqWalsh( VAR DatLintAr; lastel: Lwd );
{ Hadamard Walsh from sequential data, in-place }
VAR
Dat: LintArray ABSOLUTE DatLintAr;
a, b: LONGINT;
stradwid,  { distance between pair of elements }
blockstart, block, pair, el1, el2: Lwd;
BEGIN
blockstart := lastel;
REPEAT
el1 := 0;
blockstart := blockstart DIV 2;
FOR block := blockstart DOWNTO 0 DO
BEGIN
FOR pair := 0 TO PRED(stradwid) DO
BEGIN
a := Dat[ el1 ];
b := Dat[ el2 ];
Dat[ el1 ] := a + b;
Dat[ el2 ] := a - b;
Inc( el1 );  Inc( el2 );
END;
el1 := el2;
END;

LintHadFmSeqWalsh takes an array of 32-bit integers, and changes the
array data into the Walsh-Hadamard transform of that data.  For
nonlinearity measures, the input data are {0,1} or {1,-1}; the
results are potentially bipolar in either case.  (The "lastel"
parameter is the last index in the data array which starts at
index 0; it is thus always 2**n - 1 for some n.  The ABSOLUTE
attribute forces Borland Pascal to treat the parameter as a LongInt
array of arbitrary size.)

Nonlinearity

Nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function.

With the FWT we compute the "unexpected distance," or distance away
from the expected difference.  By finding the largest absolute value,
we can find the closest "linear" function.  This is, in a sense,
a *linearity* measure.  But we want the *non* linearity, and this
is half the number of bits in the function, less the absolute value
of the unexpected distance.  So we can scan the FWT results either for
the maximum absolute value first, or calculate the nonlinearity for
each, and take the minimum.  Nonlinearity is always positive, and
also even if we have a balanced function.

It is common to consider a Boolean function as consisting of the
real values {0,1}, but it is *also* common to use the transformation

x
(5)  x' = -1

where x is {0,1}.  This transforms {0,1} -> {1,-1}, although we can
do the exact same thing with:

(6)  x' = 1 - 2x

This transformation has some implications:  Using real values
{1,-1} doubles the magnitude and changes the sign of the FWT
results, but can simplify nonlinearity for unbalanced functions,
because the zeroth term need not be treated specially.  But if
the Boolean function is balanced, as it will be in the typical
invertible substitution table, the zeroth element need not be used
at all, so using real values {1,-1} seems to provide no particular
benefit in this application.

Summary

The basic concepts of affine Boolean functions have been presented,
and a correspondence to Walsh-Hadamard functions illustrated.  A
Fast Walsh Transform was introduced and explained.  Hopefully this
will help those seeking to measure nonlinearity, and so help make
this measurement better understood and more widely used.

References and Bibliography

[AY82]  Ayoub, F.  1982.  Probabilistic completeness of
substitution-permutation encryption networks.  IEE Proceedings,
Part E.  129(5): 195-199.

[DAE94]  Daemen, J., R. Govaerts and J. Vandewalle.  1994.
Correlation Matrices.  Fast Software Encryption.  275-285.

[FOR88]  Forre, R.  1988.  The Strict Avalanche Criterion:
Spectral Properties of Boolean Functions and an Extended
Definition.  Advances in Cryptology -- CRYPTO '88.  450-468.

[GOR82]  Gordon, J. and H. Retkin.  1982.  Are Big S-Boxes Best?
Cryptography.  Proceedings of the Workshop on Cryptography,
Burg Feuerstein, Germany, March 29-April 2, 1982.  257-262.

[HED78]  Hedayat, A. and W. Wallis.  1978.  Hadamard Matrices and
their Applications.  The Annals of Statistics.  6(6): 1184-1238.

[HEY94]  Heys, H. and S. Tavares.  1994.  On the security of the
CAST encryption algorithm.  Canadian Conference on Electrical and
Computer Engineering.  Halifax, Nova Scotia, Canada, Sept. 1994.
332-335.

[HEY95]  Heys, H. and S. Tavares.  1995.  Known plaintext
cryptanalysis of tree-structured block ciphers.  Electronics
Letters.  31(10): 784-785.

[MEI89]  Meier, W. and O. Staffelbach.  1989.  Nonlinearity Criteria
for Cryptographic Functions.  Advances in Cryptology --
Eurocrypt '89.  549-562.

[MIR97]  Mirza, F.  1997.  Linear and S-Box Pairs Cryptanalysis
of the Data Encryption Standard.

[OC91]  O'Connor, L.  1991.  Enumerating nondegenerate permutations.
Advances in Cryptology -- Eurocrypt '91.  368-377.

[OC93]  O'Connor, L.  1993.  On the Distribution Characteristics in
Bijective Mappings.  Advances in Cryptology -- EUROCRYPT '93.
360-370.

[PIE88]  Pieprzyk, J. and G. Finkelstein. 1988.  Towards effective
nonlinear cryptosystem design.  IEE Proceedings, Part E.
135(6): 325-335.

[PIE89]  Pieprzyk, J. and G. Finkelstein.  1989.  Permutations
that Maximize Non-Linearity and Their Cryptographic Significance.
Computer Security in the Age of Information.  63-74.

[PIE89B]  Pieprzyk, J.  1989.  Non-linearity of Exponent Permutations.
Advances in Cryptology -- EUROCRYPT '89. 80-92.

[PIE93]  Pieprzyk, J., C. Charnes and J. Seberry.  1993.  Linear
Approximation Versus Nonlinearity.  Proceedings of the Workshop on
Selected Areas in Cryptography (SAC '94).  82-89.

[PRE90]  Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts
and J. Vandewalle.  1990.  Propagation Characteristics of Boolean
Functions.  Advances in Cryptology -- Eurocrypt '90.  161-173.

[RUE86]  Rueppel, R.  1986.  Analysis and Design of Stream Ciphers.
Springer-Verlag.

[SCH86]  Schroeder, M.  1986.  Number Theory in Science and
Communications.  Springer-Verlag.

[SCH87]  Schroeder, M. and N. Sloane.  1987.  New Permutation Codes
Using Hadamard Unscrambling.  IEEE Transactions on Information
Theory.  IT-33(1): 144-145.

[XIO88]  Xiao, G-Z. and J. Massey.  1988.  A Spectral Characterization
of Correlation-Immune Combining Functions.  IEEE Transactions on
Information Theory.  34(3): 569-571.

[YOU95]  Youssef, A. and S. Tavares.  1995.  Resistance of Balanced
S-boxes to Linear and Differential Cryptanalysis.  Information
Processing Letters.  56: 249-252.

[YOU95B]  Youssef, A. and S. Tavares.  1995.  Number of Nonlinear
Regular S-boxes.  Electronics Letters.  31(19): 1643-1644.

[ZHA95]  Zhang, X. and Y. Zheng.  1995.  GAC -- the Criterion for
Global Avalanche Characteristics of Cryptographic Functions.
Journal for Universal Computer Science.  1(5): 316-333.

literature review: