Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. Although we cannot measure the nonlinearity of a cipher of practical size, we can measure modest substitution tables and small block constructions. Since a random substitution table is the ideal model of a block cipher, we can measure many random tables and develop an ideal nonlinearity distribution for a small block size. We can then measure small block constructions and compare distributions. Experimental results show that Mixing constructions can produce nonlinearity distributions which are surprisingly close to the ideal. These experiments tend to contradict the claim that block ciphers which use linear mixing are inherently weak.
From: ritter@io.com (Terry Ritter)
Newsgroups: sci.crypt
Subject: Measured Nonlinearity in Mixing Constructions (LONG!)
Date: Mon, 22 Dec 1997 08:18:14 GMT
Lines: 586
Message-ID: <349e21f4.2151248@news.io.com>
MEASURED NONLINEARITY IN MIXING CONSTRUCTIONS
Terry Ritter
Ritter Software Engineering
http://www.io.com/~ritter/
1997-12-21
Abstract
Nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function.
Although we cannot measure the nonlinearity of a cipher of practical
size, we can measure modest substitution tables and small block
constructions. Since a random substitution table is the ideal model
of a block cipher, we can measure many random tables and develop
an ideal nonlinearity distribution for a small block size. We can
then measure small block constructions and compare distributions.
Experimental results show that Mixing constructions can produce
nonlinearity distributions which are surprisingly close to the ideal.
These experiments tend to contradict the claim that block ciphers
which use linear mixing are inherently weak.
Introduction
The ideal block cipher is a keyed simple substitution table of
sufficient size. Unfortunately, with 128-bit blocks, there would
be 2**128 entries in that table, which is completely out of the
question. So the modern block cipher is a *construction* intended
to *simulate* a keyed substitution of the desired size. At issue
is the effectiveness of the construction technology. One way to
investigate this is by using Boolean function theory, since a
substitution table can be considered a set of independent Boolean
functions, one for each output bit.
A Boolean function produces a single-bit result for each possible
combination of values from perhaps many Boolean variables. The
*nonlinearity* of a Boolean function is the Hamming distance to
the closest affine function [e.g., PIE88, PIE89, PIE89B]. That is,
nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function.
If "linearity" is considered a significant cryptographic weakness,
nonlinearity is an explicit measure of the *lack* of that weakness.
That is, nonlinearity measures one form of cipher "strength."
For cryptographic purposes, it is desired to take the nonlinearity
of a substitution table to be the *minimum* of the nonlinearity
values for each output bit in that table. Nonlinearity is measured
by forming the (one-bit-wide) truth table for a particular output
bit, then performing a Fast Walsh-Hadamard Transform (FWT) on that
array. Each result value is essentially a correlation count to a
particular affine function, and the minimum distance is found by
scanning the transform results.
In measuring nonlinearity, it is generally necessary to record the
function result for each possible combination of input variables.
If we have an 8-bit function, we must record and then transform
256 elements, and if we have a 16-bit function, we must record and
transform 64K elements. Measuring large functions rapidly becomes
impossible. Although we cannot hope to measure the nonlinearity of
a real 64-bit or 128-bit block cipher, we *can* measure nonlinearity
in substitution tables and small block constructions. Of course,
this is only reasonable for *scalable* constructions which can build
both real ciphers and measurable tiny versions from the exact same
plans.
Ideal Nonlinearity Distributions
By properly shuffling tables using a large-state cryptographic RNG
(in particular:
http://www.io.com/~ritter/KEYSHUF.HTM
) we can sample a presumably uniform distribution of different table
arrangements. We can measure the nonlinearity of each of these
tables, and accumulate the results. For 4-bit tables, we get
something like this:
Nonlinearity Distribution in 4-Bit Tables
Fig. 1
0.6 |
0.5 | * *
0.4 | * *
0.3 | * *
0.2 | * *
0.1 | * *
0.0 | * * *
Prob +--+--+--+--
0 2 4 Nonlinearity
First, note that nonlinearity always takes even values. Next,
4-bit invertible substitutions are surprisingly weak: Despite
having 16 bits of Boolean function description, no 4-bit table
was found to be more than 4 bits away from an affine function,
and almost half were only 2 bits away. A few of these tables
(probability 0.01) even contain a true affine function.
The 8-bit tables have the opportunity to be far more complex,
since there are 256 bits in each 8-bit Boolean function:
Nonlinearity Distribution in 8-Bit Tables
Fig. 2
0.35 | *
0.3 | *
0.25 | * *
0.2 | * * *
0.15 | * * *
0.1 | * * * *
0.05 | * * * * *
0.00 | * * * * * * *
Prob +--+--+--+--+--+--+--+--+--
92 96 100 104 Nonlinearity
Some of the literature asserts that *random* S-Boxes will have
various good qualities provided the tables are "large enough"
[AY82, GOR82, OC91, OC93, YOU95B]. The obvious question is:
"How large is 'large enough'?" Experimental measurement of the
nonlinearity of 1,000,000 random 8-bit tables shows exactly one
table with a nonlinearity as low as 78; this is a 0.000001
probability (1 count in 1E6). So not only are random 8-bit
tables with even a single affine coordinate (nonlinearity = 0)
*unlikely*, they are *so* unlikely as to be almost impossible to
find. (Indeed, [GOR82] gives the probability as 10**-72.) So
accidental "linearity" in random 8-bit tables seems to be a
non-issue.
Our block cipher construction problem essentially involves using
small random tables (here 4-bit tables with 16 entries each and a
mean nonlinearity of 3.0) to somehow emulate a table which is at
least twice as wide (here an 8-bit table of 256 entries and a mean
nonlinearity of 99.0). Stated in this way, a solution seems quite
unlikely, for how can just 4, 6, or 8 tables of nonlinearity 3 ever
produce a nonlinearity of 99? (We also examine using 5-bit tables
with 32 entries and a mean nonlinearity of 7.8 emulating a 10-bit
table with 1024 entries and a mean nonlinearity of 447.7, which
seems similarly improbable.)
Mixing Ciphers
A mixing cipher (see, for example:
http://www.io.com/~ritter/CRYPHTML.HTM#MixTech
especially:
http://www.io.com/~ritter/MIXCORE.HTM
and:
http://www.io.com/~ritter/EXTRMSPD.HTM
) is distinguished by its use of high quality mixing, as opposed
to repeated rounds of lesser mixing. Mixing ciphers generally use
multiple substitution tables, each of which is shuffled for primary
keying. The mixing itself is generally linear, but assures that
each and every input affects each and every output in a balanced
way. Presumably, nonlinearity measurements will have something to
say about possible ill effects of this linear mixing.
With the particular mixing function which I call "Balanced Block
Mixing," (see, for example:
http://www.io.com/~ritter/CRYPHTML.HTM#BBMTech
and especially:
http://www.io.com/~ritter/BBM.HTM
), a change in even one input is *guaranteed* to change every output,
which is the basis for good overall diffusion and for reasoning
about such diffusion. Mixing ciphers thus satisfy "the important
cryptographic property of completeness" [HEY95].
This mixing is linear, and, by itself, has no strength at all. But
the purpose of the mixing is to combine two input values into two
output values, *each* of which depends upon *both* input values.
The issue here is whether this sort of structure can produce higher
nonlinearity values than the small substitution tables which are
the only nonlinear components. Obviously, we would like to see the
result approach the ideal nonlinearity distribution for the
larger block.
Fig. 3 shows the basic mixing construction: The input block enters
at the top, is split into two and each half substituted. Those
results are mixed into two new block values, which are then
substituted again. In this way the larger block is emulated by
operations on half-width blocks. Each additional mixing means a
new layer of mixing and a new layer of substitution.
A Single Mixing Fig. 3
| |
SUB SUB
\ /
MIX
/ \
SUB SUB
| |
With a single mixing (and four random 4-bit tables) we find
nonlinearity values only every 4 steps, instead of every 2:
Nonlinearity Distribution with One Mixing Level
Fig. 4
0.45 | *
0.4 | *
0.35 | *
0.3 | *
0.25 | * *
0.2 | * *
0.15 | * * * *
0.1 | * * * *
0.05 | * * * *
0.00 | * * * * * * * *
Prob +--+--+--+--+--+--+--+--+--+--+--+--+--+--
56 64 72 80 88 96 100 Nonlinearity
This is a poor approximation to the ideal 8-bit distribution.
But with two mixings (and six random 4-bit tables) we can do much
better (this represents one run of 10,000 tiny ciphers):
Nonlinearity Distribution with Two Mixing Levels
Fig. 5
0.35 | >*
0.3 | *
0.25 | >* *
0.2 | * * >*
0.15 | * * * *
0.1 | >* * * *
0.05 | >* * * * *
0.00 | * >* * * * * * >*
Prob +->+--+--+--+--+--+--+--+--
92 96 100 104 Nonlinearity
Here the ">" symbols represent the ideal distribution.
The goal here seems to be not necessarily having the exact ideal
distribution, but instead to gain confidence that the construction
does produce reasonable *approximation* to the ideal. Indeed, one
would expect that a random table must *inevitably* be qualitatively
different from any possible construction which has a smaller total
state. But Fig. 5 does seem to be a reasonable approximation,
especially in view of Fig. 4.
Alas, chi-square comparisons are more exacting:
Chi-Square For Two 4-Bit Mixings Fig. 6
NL Expect Found Chi-Sq
106 2 1 }
104 249 242 } 0.255
102 2027 1847 16.984
100 3412 3256 7.132
98 2474 2520 0.855
96 1172 1306 15.321
94 450 530 14.222
92 150 182 6.827
90 46 65 7.848
88 13 29 } 60.500
86 4 11 }
84 1 7 }
82 0 3 }
80 0 1 }
This is obviously an unreasonable chi-square value. But if we
add just one more mixing, for a total of 3 mixings and eight
random 4-bit tables, we can do much better:
Chi-Square For Three 4-Bit Mixings Fig. 7
NL Expect Found Chi-Sq
106 2 2 }
104 249 256 } 0.195
102 2027 1985 0.870
100 3412 3386 0.198
98 2474 2518 0.855
96 1172 1179 0.042
94 450 446 0.036
92 150 175 4.167
90 46 35 2.630
88 13 17 } 0.000
86 4 1 }
84 1 0 }
------
8.933
With 9 "degrees of freedom," this is a *reasonable* chi-square
value, making the trial an arguable statistical match to the ideal
distribution. So if an accurate distribution is in fact needed at
this level, using the very weak 4-bit table components and only two
tables across, three mixing levels would be preferred.
But this brings up two further questions: First, might two mixings
be enough with 4-bit tables if we have 4 tables across? And,
might two mixings be enough if we use 5-bit tables? We investigate
the last question first, because it involves far less computation,
and if the answer is positive, there is no practical need to
investigate the first: Any serious cipher would use tables wider
than 5-bits anyway.
First, we have the typical nonlinearity of 5-bit tables:
Nonlinearity Distribution in 5-Bit Tables
Fig. 8
0.7 | *
0.6 | *
0.5 | *
0.4 | *
0.3 | *
0.2 | * *
0.1 | * * *
0.0 | * * *
Prob +--+--+--+--+--+--+--
0 2 4 6 8 10 Nonlinearity
Then the approximate ideal distribution for 10-bit tables:
Nonlinearity Distribution in 10-Bit Tables
Fig. 9
0.2 |
0.175 | * *
0.15 | * * *
0.125 | * * * *
0.1 | * * * * *
0.075 | * * * * * *
0.05 | * * * * * * * *
0.025 | * * * * * * * * * *
0.00 | * * * * * * * * * * *
Prob +--+--+--+--+--+--+--+--+--+--+--+--+--
436 440 444 448 452 456 Nonlinearity
While this gives a general idea of the distribution, we now prefer
to use the actual counts, scaled to the expectations of a smaller
trial. The ideal model is thus constructed from three different
15-hour trials of 1,000,000 random 10-bit tables each, with the
counts reduced by a factor of 100 and rounded to an integer.
The Ideal 10-Bit Nonlinearity Distribution Fig. 10
NL Trial 1 Trial 2 Trial 3 Ideal
462 0 1 0 0
460 55 75 55 1
458 2145 2269 2234 22
456 20537 20938 20882 208
454 76259 76344 75980 763
452 148847 148135 148510 1485
450 188329 187802 187585 1878
448 177775 178208 178616 1782
446 140283 140219 140407 1403
444 97415 97317 97465 974
442 61999 62282 62213 622
440 37829 37685 37382 376
438 21675 21789 21753 218
436 12417 12320 12408 124
434 6826 6818 6912 68
432 3740 3693 3584 37
430 1912 2050 1884 19
428 979 1043 1073 10
426 490 537 526 5
424 249 241 265 2
422 119 133 143 1
420 64 55 69 0
418 26 25 30 0
416 12 14 12 0
414 9 5 6 0
412 3 2 4 0
410 0 0 1 0
Next, we have nonlinearity measurements for 1,000 different
10-bit ciphers:
Chi-Square For Two 5-Bit Mixings Fig. 11
NL Expect Found Chi-Sq
458 2 3 }
456 21 13 } 2.333
454 76 74 0.053
452 149 139 0.667
450 188 202 1.043
448 178 198 2.247
446 140 125 1.607
444 97 95 0.041
442 62 62 0.000
440 38 36 0.105
438 22 31 3.682
436 12 10 0.333
434 7 6 } 0.286
432 4 4 }
430 2 0 }
428 1 2 }
------
12.397
With 11 "degrees of freedom," this is a believable chi-square
value. So we proceed with a 10,000-cipher trial:
Chi-Square For Two 5-Bit Mixings Fig. 12
NL Expect Found Chi-Sq
460 1 1 }
458 22 20 } 0.043
456 208 190 1.558
454 763 722 2.203
452 1485 1487 0.003
450 1878 1863 0.120
448 1782 1780 0.002
446 1403 1453 1.782
444 974 961 0.174
442 622 618 0.026
440 376 378 0.011
438 218 243 2.867
436 124 137 1.363
434 68 63 0.368
432 37 45 1.730
430 19 17 0.211
428 10 8 } 0.474
426 5 10 }
424 2 3 }
422 1 0 }
420 1 1 }
------
12.935
With 15 "degrees of freedom," this is also a reasonable value.
So 26 more 10,000-cipher trials were performed, with the following
chi-square results:
Chi-Square Values for 10,000-Cipher Trials Fig. 13
6.768 7.984 8.054 9.007 11.247 12.225
12.464 12.583 12.901 14.293 15.170 15.876
16.458 16.848 17.842 19.691 19.973 20.375
20.571 21.956 23.108 24.753 24.773 25.709
28.069 33.837
The last value seems unreasonably high; either we got very unlucky,
or the data are telling us that the match is not perfect. Perhaps
we are seeing imperfections in the 5-bit tables the way we earlier
saw problems in the 4-bit tables. But most of the values are
very reasonable, and values like this do not repeatedly occur by
chance alone.
Conclusions
These initial experiments support an assertion that a two-level
Mixing cipher using tables of reasonable size can have a nonlinearity
distribution unexpectedly similar to the ideal block cipher. The
match seems sufficiently precise as to imply the existence of an
underlying mathematical relationship which is currently unknown.
We conclude that Mixing constructions do in fact produce nonlinearity
levels and distributions similar to those of an ideal cipher, despite
using much smaller and much weaker components. These experiments
thus support the Mixing approach to block cipher design, and do *not*
support the claim that ciphers with linear mixing are inherently weak.
Note that the ability to perform these experiments is based on the
"scalability" of the cipher design. This is an important advantage
that very few designs offer: Scalability supports far deeper
experimental analysis than is possible in a full-size cipher.
The approach to block cipher analysis taken in this article seems
to have not been reported previously in the open literature.
References and Bibliography
[AY82] Ayoub, F. 1982. Probabilistic completeness of
substitution-permutation encryption networks. IEE Proceedings,
Part E. 129(5): 195-199.
[DAE94] Daemen, J., R. Govaerts and J. Vandewalle. 1994.
Correlation Matrices. Fast Software Encryption. 275-285.
[FOR88] Forre, R. 1988. The Strict Avalanche Criterion:
Spectral Properties of Boolean Functions and an Extended
Definition. Advances in Cryptology -- CRYPTO '88. 450-468.
[GOR82] Gordon, J. and H. Retkin. 1982. Are Big S-Boxes Best?
Cryptography. Proceedings of the Workshop on Cryptography,
Burg Feuerstein, Germany, March 29-April 2, 1982. 257-262.
[HEY94] Heys, H. and S. Tavares. 1994. On the security of the
CAST encryption algorithm. Canadian Conference on Electrical and
Computer Engineering. Halifax, Nova Scotia, Canada, Sept. 1994.
332-335.
[HEY95] Heys, H. and S. Tavares. 1995. Known plaintext
cryptanalysis of tree-structured block ciphers. Electronics
Letters. 31(10): 784-785.
[MEI89] Meier, W. and O. Staffelbach. 1989. Nonlinearity Criteria
for Cryptographic Functions. Advances in Cryptology --
Eurocrypt '89. 549-562.
[MIR97] Mirza, F. 1997. Linear and S-Box Pairs Cryptanalysis
of the Data Encryption Standard.
[OC91] O'Connor, L. 1991. Enumerating nondegenerate permutations.
Advances in Cryptology -- Eurocrypt '91. 368-377.
[OC93] O'Connor, L. 1993. On the Distribution Characteristics in
Bijective Mappings. Advances in Cryptology -- EUROCRYPT '93.
360-370.
[PIE88] Pieprzyk, J. and G. Finkelstein. 1988. Towards effective
nonlinear cryptosystem design. IEE Proceedings, Part E.
135(6): 325-335.
[PIE89] Pieprzyk, J. and G. Finkelstein. 1989. Permutations
that Maximize Non-Linearity and Their Cryptographic Significance.
Computer Security in the Age of Information. 63-74.
[PIE89B] Pieprzyk, J. 1989. Non-linearity of Exponent Permutations.
Advances in Cryptology -- EUROCRYPT '89. 80-92.
[PIE93] Pieprzyk, J., C. Charnes and J. Seberry. 1993. Linear
Approximation Versus Nonlinearity. Proceedings of the Workshop on
Selected Areas in Cryptography (SAC '94). 82-89.
[PRE90] Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts
and J. Vandewalle. 1990. Propagation Characteristics of Boolean
Functions. Advances in Cryptology -- Eurocrypt '90. 161-173.
[RUE86] Rueppel, R. 1986. Analysis and Design of Stream Ciphers.
Springer-Verlag.
[XIO88] Xiao, G-Z. and J. Massey. 1988. A Spectral Characterization
of Correlation-Immune Combining Functions. IEEE Transactions on
Information Theory. 34(3): 569-571.
[YOU95] Youssef, A. and S. Tavares. 1995. Resistance of Balanced
S-boxes to Linear and Differential Cryptanalysis. Information
Processing Letters. 56: 249-252.
[YOU95B] Youssef, A. and S. Tavares. 1995. Number of Nonlinear
Regular S-boxes. Electronics Letters. 31(19): 1643-1644.
[ZHA95] Zhang, X. and Y. Zheng. 1995. GAC -- the Criterion for
Global Avalanche Characteristics of Cryptographic Functions.
Journal for Universal Computer Science. 1(5): 316-333.
---
Terry Ritter ritter@io.com http://www.io.com/~ritter/
The Crypto Glossary: http://www.io.com/~ritter/GLOSSARY.HTM
Terry Ritter, his current address, and his top page.
Last updated: 1997-12-28