# Measured Boolean Function Nonlinearity in Mixing Cipher Constructions

## Terry Ritter

Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. Although we cannot measure the nonlinearity of a cipher of practical size, we can measure modest substitution tables and small block constructions. Since a random substitution table is the ideal model of a block cipher, we can measure many random tables and develop an ideal nonlinearity distribution for a small block size. We can then measure small block constructions and compare distributions. Experimental results show that Mixing constructions can produce nonlinearity distributions which are surprisingly close to the ideal. These experiments tend to contradict the claim that block ciphers which use linear mixing are inherently weak.

```From: ritter@io.com (Terry Ritter)
Newsgroups: sci.crypt
Subject: Measured Nonlinearity in Mixing Constructions (LONG!)
Date: Mon, 22 Dec 1997 08:18:14 GMT
Lines: 586
Message-ID: <349e21f4.2151248@news.io.com>

MEASURED NONLINEARITY IN MIXING CONSTRUCTIONS

Terry Ritter
Ritter Software Engineering
http://www.io.com/~ritter/

1997-12-21

Abstract

Nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function.
Although we cannot measure the nonlinearity of a cipher of practical
size, we can measure modest substitution tables and small block
constructions.  Since a random substitution table is the ideal model
of a block cipher, we can measure many random tables and develop
an ideal nonlinearity distribution for a small block size.  We can
then measure small block constructions and compare distributions.
Experimental results show that Mixing constructions can produce
nonlinearity distributions which are surprisingly close to the ideal.
These experiments tend to contradict the claim that block ciphers
which use linear mixing are inherently weak.

Introduction

The ideal block cipher is a keyed simple substitution table of
sufficient size.  Unfortunately, with 128-bit blocks, there would
be 2**128 entries in that table, which is completely out of the
question.  So the modern block cipher is a *construction* intended
to *simulate* a keyed substitution of the desired size.  At issue
is the effectiveness of the construction technology.  One way to
investigate this is by using Boolean function theory, since a
substitution table can be considered a set of independent Boolean
functions, one for each output bit.

A Boolean function produces a single-bit result for each possible
combination of values from perhaps many Boolean variables.  The
*nonlinearity* of a Boolean function is the Hamming distance to
the closest affine function [e.g., PIE88, PIE89, PIE89B].  That is,
nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function.
If "linearity" is considered a significant cryptographic weakness,
nonlinearity is an explicit measure of the *lack* of that weakness.
That is, nonlinearity measures one form of cipher "strength."

For cryptographic purposes, it is desired to take the nonlinearity
of a substitution table to be the *minimum* of the nonlinearity
values for each output bit in that table.  Nonlinearity is measured
by forming the (one-bit-wide) truth table for a particular output
bit, then performing a Fast Walsh-Hadamard Transform (FWT) on that
array.  Each result value is essentially a correlation count to a
particular affine function, and the minimum distance is found by
scanning the transform results.

In measuring nonlinearity, it is generally necessary to record the
function result for each possible combination of input variables.
If we have an 8-bit function, we must record and then transform
256 elements, and if we have a 16-bit function, we must record and
transform 64K elements.  Measuring large functions rapidly becomes
impossible.  Although we cannot hope to measure the nonlinearity of
a real 64-bit or 128-bit block cipher, we *can* measure nonlinearity
in substitution tables and small block constructions.  Of course,
this is only reasonable for *scalable* constructions which can build
both real ciphers and measurable tiny versions from the exact same
plans.

Ideal Nonlinearity Distributions

By properly shuffling tables using a large-state cryptographic RNG
(in particular:

http://www.io.com/~ritter/KEYSHUF.HTM

) we can sample a presumably uniform distribution of different table
arrangements.  We can measure the nonlinearity of each of these
tables, and accumulate the results.  For 4-bit tables, we get
something like this:

Nonlinearity Distribution in 4-Bit Tables
Fig. 1
0.6 |
0.5 |     *  *
0.4 |     *  *
0.3 |     *  *
0.2 |     *  *
0.1 |     *  *
0.0 |  *  *  *
Prob +--+--+--+--
0  2  4   Nonlinearity

First, note that nonlinearity always takes even values.  Next,
4-bit invertible substitutions are surprisingly weak:  Despite
having 16 bits of Boolean function description, no 4-bit table
was found to be more than 4 bits away from an affine function,
and almost half were only 2 bits away.  A few of these tables
(probability 0.01) even contain a true affine function.

The 8-bit tables have the opportunity to be far more complex,
since there are 256 bits in each 8-bit Boolean function:

Nonlinearity Distribution in 8-Bit Tables
Fig. 2
0.35  |                 *
0.3   |                 *
0.25  |              *  *
0.2   |              *  *  *
0.15  |              *  *  *
0.1   |           *  *  *  *
0.05  |        *  *  *  *  *
0.00  |     *  *  *  *  *  *  *
Prob  +--+--+--+--+--+--+--+--+--
92    96    100   104   Nonlinearity

Some of the literature asserts that *random* S-Boxes will have
various good qualities provided the tables are "large enough"
[AY82, GOR82, OC91, OC93, YOU95B].  The obvious question is:
"How large is 'large enough'?"  Experimental measurement of the
nonlinearity of 1,000,000 random 8-bit tables shows exactly one
table with a nonlinearity as low as 78; this is a 0.000001
probability (1 count in 1E6).  So not only are random 8-bit
tables with even a single affine coordinate (nonlinearity = 0)
*unlikely*, they are *so* unlikely as to be almost impossible to
find.  (Indeed, [GOR82] gives the probability as 10**-72.)  So
accidental "linearity" in random 8-bit tables seems to be a
non-issue.

Our block cipher construction problem essentially involves using
small random tables (here 4-bit tables with 16 entries each and a
mean nonlinearity of 3.0) to somehow emulate a table which is at
least twice as wide (here an 8-bit table of 256 entries and a mean
nonlinearity of 99.0).  Stated in this way, a solution seems quite
unlikely, for how can just 4, 6, or 8 tables of nonlinearity 3 ever
produce a nonlinearity of 99?  (We also examine using 5-bit tables
with 32 entries and a mean nonlinearity of 7.8 emulating a 10-bit
table with 1024 entries and a mean nonlinearity of 447.7, which
seems similarly improbable.)

Mixing Ciphers

A mixing cipher (see, for example:

http://www.io.com/~ritter/CRYPHTML.HTM#MixTech

especially:

http://www.io.com/~ritter/MIXCORE.HTM

and:

http://www.io.com/~ritter/EXTRMSPD.HTM

) is distinguished by its use of high quality mixing, as opposed
to repeated rounds of lesser mixing.  Mixing ciphers generally use
multiple substitution tables, each of which is shuffled for primary
keying.  The mixing itself is generally linear, but assures that
each and every input affects each and every output in a balanced
way.  Presumably, nonlinearity measurements will have something to
say about possible ill effects of this linear mixing.

With the particular mixing function which I call "Balanced Block
Mixing," (see, for example:

http://www.io.com/~ritter/CRYPHTML.HTM#BBMTech

and especially:

http://www.io.com/~ritter/BBM.HTM

), a change in even one input is *guaranteed* to change every output,
which is the basis for good overall diffusion and for reasoning
about such diffusion.  Mixing ciphers thus satisfy "the important
cryptographic property of completeness" [HEY95].

This mixing is linear, and, by itself, has no strength at all.  But
the purpose of the mixing is to combine two input values into two
output values, *each* of which depends upon *both* input values.
The issue here is whether this sort of structure can produce higher
nonlinearity values than the small substitution tables which are
the only nonlinear components.  Obviously, we would like to see the
result approach the ideal nonlinearity distribution for the
larger block.

Fig. 3 shows the basic mixing construction:  The input block enters
at the top, is split into two and each half substituted.  Those
results are mixed into two new block values, which are then
substituted again.  In this way the larger block is emulated by
operations on half-width blocks.  Each additional mixing means a
new layer of mixing and a new layer of substitution.

A Single Mixing    Fig. 3

|     |
SUB   SUB
\   /
MIX
/   \
SUB   SUB
|     |

With a single mixing (and four random 4-bit tables) we find
nonlinearity values only every 4 steps, instead of every 2:

Nonlinearity Distribution with One Mixing Level
Fig. 4
0.45  |                                *
0.4   |                                *
0.35  |                                *
0.3   |                                *
0.25  |                          *     *
0.2   |                          *     *
0.15  |                          *  *  *  *
0.1   |                          *  *  *  *
0.05  |                          *  *  *  *
0.00  |  *           *  *  *     *  *  *  *
Prob  +--+--+--+--+--+--+--+--+--+--+--+--+--+--
56    64    72    80    88    96    100  Nonlinearity

This is a poor approximation to the ideal 8-bit distribution.

But with two mixings (and six random 4-bit tables) we can do much
better (this represents one run of 10,000 tiny ciphers):

Nonlinearity Distribution with Two Mixing Levels
Fig. 5
0.35  |                >*
0.3   |                 *
0.25  |             >*  *
0.2   |              *  * >*
0.15  |           *  *  *  *
0.1   |          >*  *  *  *
0.05  |       >*  *  *  *  *
0.00  |  * >*  *  *  *  *  * >*
Prob  +->+--+--+--+--+--+--+--+--
92    96    100   104   Nonlinearity

Here the ">" symbols represent the ideal distribution.

The goal here seems to be not necessarily having the exact ideal
distribution, but instead to gain confidence that the construction
does produce reasonable *approximation* to the ideal.  Indeed, one
would expect that a random table must *inevitably* be qualitatively
different from any possible construction which has a smaller total
state.  But Fig. 5 does seem to be a reasonable approximation,
especially in view of Fig. 4.

Alas, chi-square comparisons are more exacting:

Chi-Square For Two 4-Bit Mixings   Fig. 6

NL     Expect     Found       Chi-Sq

106         2         1      }
104       249       242      }  0.255
102      2027      1847        16.984
100      3412      3256         7.132
98      2474      2520         0.855
96      1172      1306        15.321
94       450       530        14.222
92       150       182         6.827
90        46        65         7.848
88        13        29      } 60.500
86         4        11      }
84         1         7      }
82         0         3      }
80         0         1      }

This is obviously an unreasonable chi-square value.  But if we
add just one more mixing, for a total of 3 mixings and eight
random 4-bit tables, we can do much better:

Chi-Square For Three 4-Bit Mixings   Fig. 7

NL     Expect     Found       Chi-Sq

106         2         2      }
104       249       256      }  0.195
102      2027      1985         0.870
100      3412      3386         0.198
98      2474      2518         0.855
96      1172      1179         0.042
94       450       446         0.036
92       150       175         4.167
90        46        35         2.630
88        13        17      }  0.000
86         4         1      }
84         1         0      }
------
8.933

With 9 "degrees of freedom," this is a *reasonable* chi-square
value, making the trial an arguable statistical match to the ideal
distribution.  So if an accurate distribution is in fact needed at
this level, using the very weak 4-bit table components and only two
tables across, three mixing levels would be preferred.

But this brings up two further questions:  First, might two mixings
be enough with 4-bit tables if we have 4 tables across?  And,
might two mixings be enough if we use 5-bit tables?  We investigate
the last question first, because it involves far less computation,
and if the answer is positive, there is no practical need to
investigate the first:  Any serious cipher would use tables wider
than 5-bits anyway.

First, we have the typical nonlinearity of 5-bit tables:

Nonlinearity Distribution in 5-Bit Tables
Fig. 8
0.7 |              *
0.6 |              *
0.5 |              *
0.4 |              *
0.3 |              *
0.2 |           *  *
0.1 |           *  *  *
0.0 |           *  *  *
Prob +--+--+--+--+--+--+--
0  2  4  6  8 10   Nonlinearity

Then the approximate ideal distribution for 10-bit tables:

Nonlinearity Distribution in 10-Bit Tables
Fig. 9
0.2   |
0.175 |                    *  *
0.15  |                    *  *  *
0.125 |                 *  *  *  *
0.1   |              *  *  *  *  *
0.075 |              *  *  *  *  *  *
0.05  |        *  *  *  *  *  *  *  *
0.025 |     *  *  *  *  *  *  *  *  *  *
0.00  |  *  *  *  *  *  *  *  *  *  *  *
Prob  +--+--+--+--+--+--+--+--+--+--+--+--+--
436   440   444   448   452   456   Nonlinearity

While this gives a general idea of the distribution, we now prefer
to use the actual counts, scaled to the expectations of a smaller
trial.  The ideal model is thus constructed from three different
15-hour trials of 1,000,000 random 10-bit tables each, with the
counts reduced by a factor of 100 and rounded to an integer.

The Ideal 10-Bit Nonlinearity Distribution   Fig. 10

NL     Trial 1   Trial 2   Trial 3     Ideal

462           0         1         0         0
460          55        75        55         1
458        2145      2269      2234        22
456       20537     20938     20882       208
454       76259     76344     75980       763
452      148847    148135    148510      1485
450      188329    187802    187585      1878
448      177775    178208    178616      1782
446      140283    140219    140407      1403
444       97415     97317     97465       974
442       61999     62282     62213       622
440       37829     37685     37382       376
438       21675     21789     21753       218
436       12417     12320     12408       124
434        6826      6818      6912        68
432        3740      3693      3584        37
430        1912      2050      1884        19
428         979      1043      1073        10
426         490       537       526         5
424         249       241       265         2
422         119       133       143         1
420          64        55        69         0
418          26        25        30         0
416          12        14        12         0
414           9         5         6         0
412           3         2         4         0
410           0         0         1         0

Next, we have nonlinearity measurements for 1,000 different
10-bit ciphers:

Chi-Square For Two 5-Bit Mixings   Fig. 11

NL     Expect     Found       Chi-Sq

458         2         3      }
456        21        13      }  2.333
454        76        74         0.053
452       149       139         0.667
450       188       202         1.043
448       178       198         2.247
446       140       125         1.607
444        97        95         0.041
442        62        62         0.000
440        38        36         0.105
438        22        31         3.682
436        12        10         0.333
434         7         6      }  0.286
432         4         4      }
430         2         0      }
428         1         2      }
------
12.397

With 11 "degrees of freedom," this is a believable chi-square
value.  So we proceed with a 10,000-cipher trial:

Chi-Square For Two 5-Bit Mixings   Fig. 12

NL     Expect     Found       Chi-Sq

460         1         1      }
458        22        20      }  0.043
456       208       190         1.558
454       763       722         2.203
452      1485      1487         0.003
450      1878      1863         0.120
448      1782      1780         0.002
446      1403      1453         1.782
444       974       961         0.174
442       622       618         0.026
440       376       378         0.011
438       218       243         2.867
436       124       137         1.363
434        68        63         0.368
432        37        45         1.730
430        19        17         0.211
428        10         8      }  0.474
426         5        10      }
424         2         3      }
422         1         0      }
420         1         1      }
------
12.935

With 15 "degrees of freedom," this is also a reasonable value.
So 26 more 10,000-cipher trials were performed, with the following
chi-square results:

Chi-Square Values for 10,000-Cipher Trials    Fig. 13

6.768    7.984    8.054    9.007   11.247   12.225
12.464   12.583   12.901   14.293   15.170   15.876
16.458   16.848   17.842   19.691   19.973   20.375
20.571   21.956   23.108   24.753   24.773   25.709
28.069   33.837

The last value seems unreasonably high; either we got very unlucky,
or the data are telling us that the match is not perfect.  Perhaps
we are seeing imperfections in the 5-bit tables the way we earlier
saw problems in the 4-bit tables.  But most of the values are
very reasonable, and values like this do not repeatedly occur by
chance alone.

Conclusions

These initial experiments support an assertion that a two-level
Mixing cipher using tables of reasonable size can have a nonlinearity
distribution unexpectedly similar to the ideal block cipher.  The
match seems sufficiently precise as to imply the existence of an
underlying mathematical relationship which is currently unknown.

We conclude that Mixing constructions do in fact produce nonlinearity
levels and distributions similar to those of an ideal cipher, despite
using much smaller and much weaker components.  These experiments
thus support the Mixing approach to block cipher design, and do *not*
support the claim that ciphers with linear mixing are inherently weak.

Note that the ability to perform these experiments is based on the
"scalability" of the cipher design.  This is an important advantage
that very few designs offer:  Scalability supports far deeper
experimental analysis than is possible in a full-size cipher.

to have not been reported previously in the open literature.

References and Bibliography

[AY82]  Ayoub, F.  1982.  Probabilistic completeness of
substitution-permutation encryption networks.  IEE Proceedings,
Part E.  129(5): 195-199.

[DAE94]  Daemen, J., R. Govaerts and J. Vandewalle.  1994.
Correlation Matrices.  Fast Software Encryption.  275-285.

[FOR88]  Forre, R.  1988.  The Strict Avalanche Criterion:
Spectral Properties of Boolean Functions and an Extended
Definition.  Advances in Cryptology -- CRYPTO '88.  450-468.

[GOR82]  Gordon, J. and H. Retkin.  1982.  Are Big S-Boxes Best?
Cryptography.  Proceedings of the Workshop on Cryptography,
Burg Feuerstein, Germany, March 29-April 2, 1982.  257-262.

[HEY94]  Heys, H. and S. Tavares.  1994.  On the security of the
CAST encryption algorithm.  Canadian Conference on Electrical and
Computer Engineering.  Halifax, Nova Scotia, Canada, Sept. 1994.
332-335.

[HEY95]  Heys, H. and S. Tavares.  1995.  Known plaintext
cryptanalysis of tree-structured block ciphers.  Electronics
Letters.  31(10): 784-785.

[MEI89]  Meier, W. and O. Staffelbach.  1989.  Nonlinearity Criteria
for Cryptographic Functions.  Advances in Cryptology --
Eurocrypt '89.  549-562.

[MIR97]  Mirza, F.  1997.  Linear and S-Box Pairs Cryptanalysis
of the Data Encryption Standard.

[OC91]  O'Connor, L.  1991.  Enumerating nondegenerate permutations.
Advances in Cryptology -- Eurocrypt '91.  368-377.

[OC93]  O'Connor, L.  1993.  On the Distribution Characteristics in
Bijective Mappings.  Advances in Cryptology -- EUROCRYPT '93.
360-370.

[PIE88]  Pieprzyk, J. and G. Finkelstein. 1988.  Towards effective
nonlinear cryptosystem design.  IEE Proceedings, Part E.
135(6): 325-335.

[PIE89]  Pieprzyk, J. and G. Finkelstein.  1989.  Permutations
that Maximize Non-Linearity and Their Cryptographic Significance.
Computer Security in the Age of Information.  63-74.

[PIE89B]  Pieprzyk, J.  1989.  Non-linearity of Exponent Permutations.
Advances in Cryptology -- EUROCRYPT '89. 80-92.

[PIE93]  Pieprzyk, J., C. Charnes and J. Seberry.  1993.  Linear
Approximation Versus Nonlinearity.  Proceedings of the Workshop on
Selected Areas in Cryptography (SAC '94).  82-89.

[PRE90]  Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts
and J. Vandewalle.  1990.  Propagation Characteristics of Boolean
Functions.  Advances in Cryptology -- Eurocrypt '90.  161-173.

[RUE86]  Rueppel, R.  1986.  Analysis and Design of Stream Ciphers.
Springer-Verlag.

[XIO88]  Xiao, G-Z. and J. Massey.  1988.  A Spectral Characterization
of Correlation-Immune Combining Functions.  IEEE Transactions on
Information Theory.  34(3): 569-571.

[YOU95]  Youssef, A. and S. Tavares.  1995.  Resistance of Balanced
S-boxes to Linear and Differential Cryptanalysis.  Information
Processing Letters.  56: 249-252.

[YOU95B]  Youssef, A. and S. Tavares.  1995.  Number of Nonlinear
Regular S-boxes.  Electronics Letters.  31(19): 1643-1644.

[ZHA95]  Zhang, X. and Y. Zheng.  1995.  GAC -- the Criterion for
Global Avalanche Characteristics of Cryptographic Functions.
Journal for Universal Computer Science.  1(5): 316-333.

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
The Crypto Glossary:  http://www.io.com/~ritter/GLOSSARY.HTM

```

Also see: Measuring Nonlinearity by Walsh Transform (1998) (20K)

Terry Ritter, his current address, and his top page.

Last updated: 1997-12-28