Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function, and thus is a measure of one kind of cipher strength. By measuring the nonlinearity of Feistel constructions with various numbers of rounds and tables, we hope to gain insight into how Feistel mixing compares to other alternatives.
From: ritter@io.com (Terry Ritter)
Newsgroups: sci.crypt
Subject: Measured Nonlinearity in Feistel Constructions (LONG!)
Date: Thu, 26 Feb 1998 06:39:28 GMT
Lines: 846
Message-ID: <34f50e19.599891@news.io.com>
MEASURED NONLINEARITY IN FEISTEL CONSTRUCTIONS
Terry Ritter
Ritter Software Engineering
http://www.io.com/~ritter/
1998-02-25
Abstract
Nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function,
and thus is a measure of one kind of cipher strength. By measuring
the nonlinearity of Feistel constructions with various numbers of
rounds and tables, we hope to gain insight into how Feistel mixing
compares to other alternatives.
Introduction
The ideal block cipher is a keyed simple substitution table of
sufficient size. Unfortunately, with 128-bit blocks, there would
be 2**128 entries in that table, which is completely out of the
question. So the modern block cipher is a *construction* intended
to *simulate* a keyed substitution of the desired size. At issue
is the effectiveness of the construction technology. One way to
investigate this is by using Boolean function theory, since a
substitution table -- or cipher -- can be considered a set of
independent Boolean functions, one for each output bit.
A Boolean function produces a single-bit result for each possible
combination of values from perhaps many Boolean variables. The
*nonlinearity* of a Boolean function is the Hamming distance to
the closest affine function [e.g., PIE88, PIE89, PIE89B]. (Also
see:
http://www.io.com/~ritter/ARTS/MEASNONL.HTM
). That is, nonlinearity is the number of bits which must change
in the truth table of a Boolean function to reach the closest affine
function. If "linearity" is considered a significant cryptographic
weakness, nonlinearity is an explicit measure of the *lack* of that
weakness. So nonlinearity measures one form of cipher "strength."
For cryptographic purposes, it is desired to take the nonlinearity
of a substitution table to be the *minimum* of the nonlinearity
values for each output bit in that table. Nonlinearity is measured
by forming the (one-bit-wide) truth table for a particular output
bit, then performing a Fast Walsh-Hadamard Transform (FWT) on that
array. Each result value is essentially a correlation count to a
particular affine function, and the minimum distance (the maximum
correlation) is found by scanning the transform results.
In measuring nonlinearity, it is generally necessary to record the
function result for each possible combination of input variables.
If we have an "8-bit" table, we must record and then transform
256 elements, and if we have a "16-bit" table, we must record and
transform 64K elements. Thus, measuring large functions rapidly
becomes impossible. So, although we cannot hope to measure the
nonlinearity of a real 64-bit or 128-bit block cipher, we *can*
measure nonlinearity in substitution tables and small block
constructions. Here we deal with 5-bit tables and the resulting
10-bit blocks.
Ideal Nonlinearity Distributions
By properly shuffling tables using a large-state cryptographic RNG
(in particular:
http://www.io.com/~ritter/KEYSHUF.HTM
) we can sample or construct a presumably uniform distribution of
different table arrangements. We can measure the nonlinearity of
each table, and accumulate the results. For 5-bit tables, we get
a nonlinearity distribution something like that shown in Figure 1.
Nonlinearity Distribution in 5-Bit Tables Fig. 1
0.7 | *
0.6 | *
0.5 | *
0.4 | *
0.3 | *
0.2 | * *
0.1 | * * *
0.0 | * * *
Prob +--+--+--+--+--+--+--
0 2 4 6 8 10 Nonlinearity
Our block cipher construction problem essentially involves using
small random tables (here 5-bit tables with 32 entries and a mean
nonlinearity of 7.8) to somehow emulate a table which is twice as
wide (here a 10-bit table with 1024 entries and a mean nonlinearity
of 447.7).
For 10-bit tables, we get a nonlinearity distribution
something like that in Figure 2.
Nonlinearity Distribution in 10-Bit Tables Fig. 2
0.2 |
0.175 | * *
0.15 | * * *
0.125 | * * * *
0.1 | * * * * *
0.075 | * * * * * *
0.05 | * * * * * * * *
0.025 | * * * * * * * * * *
0.00 | * * * * * * * * * * *
Prob +--+--+--+--+--+--+--+--+--+--+--+--+--
436 440 444 448 452 456 Nonlinearity
Feistel Mixing
The classic Feistel construction of Figure 3 is a way of mixing
two small blocks into a large block. Here, + is the exclusive-OR
function. As used in the U.S. Data Encryption Standard (DES),
L and R are each 32 bits; here we have only 5 bits in each.
Feistel Construction Fig. 3
L R
| |
|--> F --> + round 1
| |
+ <-- F <--| round 2
| |
v v
L' R'
In DES, ideal properties for F are much-discussed topics. The
Feistel construction itself supports the ability to encipher and
decipher using a random function F, not necessarily a permutation.
But the 8 different "S-boxes" used in DES *are* permutations, if
we see each as a set of 4 permutations per box, so using random
permutations in Feistel mixing is not at all unreasonable. And by
using permutation tables as components we have a single compatible
measure both for the component tables and the simulated larger table,
and thus can draw conclusions about the construction. Invertible
tables also support a direct comparison to earlier work on
nonlinearity with respect to Mixing and VSBC constructions.
Unlike the earlier work, the decision to use random invertible
substitutions for function F still leaves us with a couple of
remaining parameters: The number of rounds, and the number of
tables. Here we investigate the mixing obtained from the Feistel
construction with two variations: The use of a single table for
various numbers of rounds, and the use of a new random table on
every other round, for various numbers of rounds.
Feistel Mixing and DES
Unfortunately, the Feistel construction is *not* the only form
of mixing found in DES. Indeed, the practicality of the Feistel
construction depends upon having (that is, *constructing*) a wide
function F. And while F *might* be realized as a recursive
sequence of smaller Feistel constructions, any such structure
would be impractically slow. So the very existence of practical
Feistel ciphers depends upon the construction of a random function
F which itself mixes and diffuses information across its width.
In each DES "S-box," the outside input bits (b0 and b5) -- which
are set by expansion "E" -- just select which of the 4 boxes to use.
This is a data-dependent dynamic table selection which is, in itself,
a form of mixing. Of course, the simple combination of expansion "E"
and the array of "S-boxes" by themselves probably would not produce
good mixing: A single bit-change at one end of the input block
could propagate no faster than one table per round, *if* all data
were kept aligned. But this is not the case, since "Permutation P"
transposes the data bits. So, other than the dynamic table selection
(the two extra inputs for each S-box), function F is itself a classic
substitution-permutation design.
This means that the overall mixing in DES is a complex combination
of the Feistel construction *plus* the expansion, table-selection,
table lookup and permutation in each round. This complexity makes
it difficult to independently extract the importance and consequences
of each feature, as we surely must if we are to deeply understand
the design. This confluence of multiple features also makes it
impossible to scale DES to a tiny testable size. Thus, we seem
to be limited to testing a tiny Feistel construction -- which *is*
scalable -- instead of a tiny DES.
Feistel Mixing with Random Tables
A common assumption about Feistel mixing is that some fixed small
set of optimized tables will be used repeatedly. This concept
presumably comes from DES. But in the special context of DES:
1. DES S-boxes each contain 4 separate *permutations*,
and are *not* random constructions; and
2. The DES permutations are only 4 bits wide, and 4-bit-wide
permutations are just about the weakest possible tables
that have any nonlinear strength at all.
Accordingly, it should come as no surprise that replacing DES
S-boxes with random tables is not a good idea. And random 4-bit
permutations are often weak, so this is *also* not a good idea.
In marked contrast, random 8-permutations are almost *never* that
weak (the chance of finding even one linear output is 1 in 2**239).
It thus appears that the assumption that random tables are
necessarily bad is based on the special context of DES, and is not
valid in general.
Comparisons to Mixing and VSBC Constructions
Earlier nonlinearity measurement experiments on tiny Mixing (
http://www.io.com/~ritter/ARTS/MIXNONLI.HTM
) and VSBC (
http://www.io.com/~ritter/ARTS/VSBCNONL.HTM
) constructions also used random tables, thus potentially making
it possible to see how Feistel mixing compares.
Unfortunately, the comparison is not exact, because both the Mixing
and VSBC constructions do in some sense mix multiple tables in a
single mixing level. With Feistel mixing, we would normally think
of multiple "rounds" of mixing, rather than having several tables
in each round. DES does of course manage this, but does not give
us a scalable rule which we could use to build a tiny version with
multiple tables. And even in the best possible situation we would
be looking at the experimental measurement of a 16-bit simulated
table, for which it would be very difficult to collect an accurate
ideal distribution.
So, given the limitations of reality, the first thing we might want
to look at is the nonlinearity measure from Feistel mixing with a
single table, under a varying the number of rounds. Then we might
look at using multiple tables, again for a varying number of rounds.
The Experiments
We construct 10,000 random 5-bit tables, which are then used in
10-bit Feistel constructions. Each of these constructions is
measured for nonlinearity, and the result accumulated.
While Figure 2 gives a general idea of the desired distribution, we
eventually want to use a chi-square comparison, for which we need
an ideal reference distribution. The ideal model in Figure 4 is
thus constructed from three different 15-hour trials of 1,000,000
random 10-bit tables each, with the counts reduced by a factor of
100 and rounded to an integer.
The Ideal 10-Bit Nonlinearity Distribution Fig. 4
NL Trial 1 Trial 2 Trial 3 Ideal
462 0 1 0 0
460 55 75 55 1
458 2145 2269 2234 22
456 20537 20938 20882 208
454 76259 76344 75980 763
452 148847 148135 148510 1485
450 188329 187802 187585 1878
448 177775 178208 178616 1782
446 140283 140219 140407 1403
444 97415 97317 97465 974
442 61999 62282 62213 622
440 37829 37685 37382 376
438 21675 21789 21753 218
436 12417 12320 12408 124
434 6826 6818 6912 68
432 3740 3693 3584 37
430 1912 2050 1884 19
428 979 1043 1073 10
426 490 537 526 5
424 249 241 265 2
422 119 133 143 1
420 64 55 69 0
418 26 25 30 0
416 12 14 12 0
414 9 5 6 0
412 3 2 4 0
410 0 0 1 0
In each experimental trial, we first initialize the substitutions
from a random key. This produces a particular 10-bit cipher -- a
simulated 10-bit simple substitution. We then traverse the complete
transformation -- all 1024 possible data values -- and collect all
1024 10-bit ciphertext results. We then traverse all 10 ciphertext
bit-columns, one-by-one, and perform a nonlinearity computation on
each. The minimum nonlinearity value found across all columns is
accumulated into the growing distribution.
We see the results for 2 rounds of Feistel mixing in Figure 5.
Feistel Nonlinearity Distribution
One Table, Two Rounds Fig. 5
NL Expect Found
64 0 2
128 0 78
192 0 1494
256 0 7387
320 0 1039
Clearly, 2 Feistel mixing rounds do not approximate the nonlinearity
distribution from random substitutions, so we move on to 4 rounds
as in Figure 6.
Feistel Nonlinearity Distribution
One Table, Four Rounds Fig. 6
NL Expect Found
224 0 1
312 0 56
320 0 1
344 0 1
352 0 35
364 0 1
368 0 8
384 0 696
388 0 3
392 0 187
396 0 2
400 0 11
404 0 15
406 0 3
408 0 13
410 0 2
412 0 38
414 0 2
416 0 1709
418 0 4
420 0 66
422 1 11
424 2 126
426 5 13
428 10 256
430 19 20
432 37 780
434 68 51
436 124 660
438 218 109
440 376 3566
442 622 70
444 974 589
446 1403 91
448 1782 639
450 1878 43
452 1485 107
454 763 12
456 208 2
458 22 1
460 1 0
462 0 0
Again, the is pretty brutal: So even four Feistel mixing rounds do
not approximate random substitutions. And neither do six (Figure 7),
eight (Figure 8), ten (Figure 9) or twelve (Figure 10):
Feistel Nonlinearity Distribution
One Table, Six Rounds Fig. 7
NL Expect Found
296 0 1
316 0 1
324 0 1
332 0 1
336 0 2
352 0 1
372 0 1
384 0 2
388 0 3
392 0 1
394 0 1
396 0 2
398 0 1
400 0 3
402 0 1
404 0 10
406 0 0
408 0 23
410 0 2
412 0 22
414 0 6
416 0 22
418 0 14
420 0 46
422 1 25
424 2 85
426 5 21
428 10 114
430 19 70
432 37 192
434 68 138
436 124 386
438 218 269
440 376 641
442 622 662
444 974 1197
446 1403 1238
448 1782 1612
450 1878 1394
452 1485 1151
454 763 492
456 208 133
458 22 13
460 1 0
462 0 0
Feistel Nonlinearity Distribution
One Table, Eight Rounds Fig. 8
NL Expect Found
346 0 1
370 0 1
372 0 1
402 0 2
404 0 3
406 0 3
408 0 6
410 0 1
412 0 6
414 0 10
416 0 17
418 0 18
420 0 30
422 1 36
424 2 38
426 5 53
428 10 74
430 19 95
432 37 137
434 68 194
436 124 261
438 218 406
440 376 519
442 622 808
444 974 1080
446 1403 1365
448 1782 1569
450 1878 1527
452 1485 1063
454 763 519
456 208 136
458 22 21
460 1 0
462 0 0
Feistel Nonlinearity Distribution
One Table, Ten Rounds Fig. 9
NL Expect Found
362 0 1
396 0 1
398 0 2
400 0 1
402 0 1
404 0 0
406 0 4
408 0 3
410 0 10
412 0 10
414 0 11
416 0 28
418 0 16
420 0 25
422 1 39
424 2 63
426 5 63
428 10 78
430 19 116
432 37 155
434 68 247
436 124 314
438 218 380
440 376 584
442 622 811
444 974 1034
446 1403 1353
448 1782 1556
450 1878 1428
452 1485 1046
454 763 480
456 208 126
458 22 14
460 1 0
462 0 0
Feistel Nonlinearity Distribution
One Table, Twelve Rounds Fig. 10
NL Expect Found
398 0 1
400 0 3
402 0 2
404 0 2
406 0 6
408 0 4
410 0 8
412 0 12
414 0 13
416 0 12
418 0 30
420 0 23
422 1 34
424 2 64
426 5 62
428 10 96
430 19 113
432 37 179
434 68 234
436 124 327
438 218 462
440 376 597
442 622 797
444 974 1054
446 1403 1347
448 1782 1467
450 1878 1450
452 1485 1011
454 763 450
456 208 128
458 22 12
460 1 0
462 0 0
The twelve rounds with a fixed table of Figure 10 is about as good
as it gets. The distribution from a single table never gets close
enough to the ideal distribution to make a chi-square computation
reasonable.
Feistel Mixing with Random Multiple Tables
On the other hand, things improve if we have a new table every couple
of rounds (Figure 11). (There would seem to be little advantage in
having a unique table for each round, since only half of the block
would be affected by that table.)
Feistel Nonlinearity Distribution
Two Tables, Four Rounds Fig. 11
NL Expect Found
320 0 3
352 0 39
360 0 2
364 0 1
368 0 6
382 0 1
384 0 156
386 0 1
388 0 3
390 0 0
392 0 225
394 0 1
396 0 4
398 0 1
400 0 6
402 0 1
404 0 13
406 0 3
408 0 22
410 0 4
412 0 42
414 0 2
416 0 1951
418 0 3
420 0 81
422 1 5
424 2 143
426 5 13
428 10 276
430 19 29
432 37 853
434 68 47
436 124 788
438 218 90
440 376 3175
442 622 98
444 974 805
446 1403 130
448 1782 833
450 1878 33
452 1485 95
454 763 10
456 208 5
458 22 1
460 1 0
462 0 0
While hardly ideal, we can see that the results in Figure 11
are somewhat better than the same number of rounds with a single
table (Figure 6). But things perk up rather nicely with three
tables and six rounds as shown in Figure 12.
Feistel Nonlinearity Distribution
Three Tables, Six Rounds Fig. 12
NL Expect Found Chi-Sq DF
420 0 0 }
422 1 3 }
424 2 3 }
426 5 3 }
428 10 10 } 0.056 0
430 19 13 1.950 1
432 37 40 2.194 2
434 68 59 3.385 3
436 124 123 3.393 4
438 218 200 4.879 5
440 376 381 4.946 6
442 622 633 5.140 7
444 974 990 5.403 8
446 1403 1390 5.523 9
448 1782 1780 5.526 10
450 1878 1952 8.441 11
452 1485 1441 9.745 12
454 763 754 9.851 13
456 208 208 9.851 14
458 22 17 }
460 1 0 }
462 0 0 } 11.417 15
With 15 "degrees of freedom," the chi-square value of 11.417 in
Figure 11 is very believable. The critical chi-square values for
DF = 15 are given in Figure 13.
Chi-Square Critical Values, for DF = 15 Fig. 13
1% 5% 25% 50% 75% 95% 99%
5.2293 7.2609 11.0365 14.3389 18.2451 24.9958 30.5779
And another couple of rounds with yet another table continues the
success story in Figure 14.
Feistel Nonlinearity Distribution
Four Tables, Eight Rounds Fig. 14
NL Expect Found Chi-Sq DF
410 0 1 }
412 0 0 }
414 0 0 }
416 0 0 }
418 0 1 }
420 0 2 }
422 1 2 }
424 2 1 }
426 5 4 }
428 10 10 } 0.500 0
430 19 19 0.500 1
432 37 30 1.824 2
434 68 65 1.957 3
436 124 119 2.158 4
438 218 222 2.232 5
440 376 375 2.234 6
442 622 634 2.466 7
444 974 982 2.532 8
446 1403 1365 3.561 9
448 1782 1750 4.135 10
450 1878 1928 5.467 11
452 1485 1504 5.710 12
454 763 750 5.931 13
456 208 212 6.008 14
458 22 23 }
460 1 1 }
462 0 0 } 6.052 15
While the chi-square value of 6.052 may seem small, this means the
distribution is close to the ideal. When comparing distributions,
we are rarely worried about a chi-square value that seems "too
good," for the alternative is far *less* believable: Either we
have had the good luck of sampling a close distribution so it looks
better than it should, or we have had the absolutely unbelievable
luck of sampling a fundamentally different distribution so it looks
not just good, but actually *too* good. Thus, these tests are
basically "one-tail" statistical tests.
Conclusions
Feistel mixing seeks to produce a block cipher which is double
the size of a smaller "block cipher" component; in this respect,
it is similar to the Mixing constructions measured earlier. When
using a single fixed table, it appears that Feistel mixing is
never good enough, no matter how many rounds are applied. But
even 6 rounds of Feistel mixing with 3 *different* tables
apparently does approximate the nonlinearity of a larger
substitution table.
These experiments show an ultimately successful result using random
invertible tables. Thus, the experiments give no indication that
Feistel mixing requires the ideal tables that so much of the
literature seems directed at producing. In contrast to the unending
search for the qualities of an ideal table, it appears that we can
produce an effective Feistel cipher by choosing tables "at random"
(that is, we can construct tables based on the cipher key). Note
that Differential and Linear Cryptanalysis style attacks generally
depend upon knowledge of the contents of the tables, knowledge
which is unavailable when keyed tables are used.
When compared to the Mixing and VSBC constructions investigated
previously, the problem with Feistel mixing is that, by itself, it
only doubles the size of the nonlinear component, and we assume
that repeated doubling requires exponential effort: That is, if
6 Feistel rounds are required to take random 8-bit tables to a
16-bit emulated table, 6 rounds of the emulation (36 total
rounds) should be required to get a 32-bit table, and 216 total
rounds should be required for a 64-bit cipher. (It would be
interesting to check these assumptions on modern fast equipment.)
Now, we know that DES (an emulated 64-bit table) does *not* require
exponential effort. But Feistel mixing is *not* the only form of
mixing in DES. We avoid addressing the other structures, because
they do not scale well, and because we wish to examine each
contribution as independently as possible.
Here we have measured the Feistel mixing alone, which we assume
requires effort exponential with the block size. In contrast,
Mixing constructions require only effort proportional to n log n,
and VSBC constructions require only effort linear with block size.
Thus, these alternative constructions offer the possibility of
far more efficient cryptographic mixing.
References and Bibliography
[AY82] Ayoub, F. 1982. Probabilistic completeness of
substitution-permutation encryption networks. IEE Proceedings,
Part E. 129(5): 195-199.
[DAE94] Daemen, J., R. Govaerts and J. Vandewalle. 1994.
Correlation Matrices. Fast Software Encryption. 275-285.
[FOR88] Forre, R. 1988. The Strict Avalanche Criterion:
Spectral Properties of Boolean Functions and an Extended
Definition. Advances in Cryptology -- CRYPTO '88. 450-468.
[GOR82] Gordon, J. and H. Retkin. 1982. Are Big S-Boxes Best?
Cryptography. Proceedings of the Workshop on Cryptography,
Burg Feuerstein, Germany, March 29-April 2, 1982. 257-262.
[HEY94] Heys, H. and S. Tavares. 1994. On the security of the
CAST encryption algorithm. Canadian Conference on Electrical and
Computer Engineering. Halifax, Nova Scotia, Canada, Sept. 1994.
332-335.
[HEY95] Heys, H. and S. Tavares. 1995. Known plaintext
cryptanalysis of tree-structured block ciphers. Electronics
Letters. 31(10): 784-785.
[MEI89] Meier, W. and O. Staffelbach. 1989. Nonlinearity Criteria
for Cryptographic Functions. Advances in Cryptology --
Eurocrypt '89. 549-562.
[MIR97] Mirza, F. 1997. Linear and S-Box Pairs Cryptanalysis
of the Data Encryption Standard.
[OC91] O'Connor, L. 1991. Enumerating nondegenerate permutations.
Advances in Cryptology -- Eurocrypt '91. 368-377.
[OC93] O'Connor, L. 1993. On the Distribution Characteristics in
Bijective Mappings. Advances in Cryptology -- EUROCRYPT '93.
360-370.
[PIE88] Pieprzyk, J. and G. Finkelstein. 1988. Towards effective
nonlinear cryptosystem design. IEE Proceedings, Part E.
135(6): 325-335.
[PIE89] Pieprzyk, J. and G. Finkelstein. 1989. Permutations
that Maximize Non-Linearity and Their Cryptographic Significance.
Computer Security in the Age of Information. 63-74.
[PIE89B] Pieprzyk, J. 1989. Non-linearity of Exponent Permutations.
Advances in Cryptology -- EUROCRYPT '89. 80-92.
[PIE93] Pieprzyk, J., C. Charnes and J. Seberry. 1993. Linear
Approximation Versus Nonlinearity. Proceedings of the Workshop on
Selected Areas in Cryptography (SAC '94). 82-89.
[PRE90] Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts
and J. Vandewalle. 1990. Propagation Characteristics of Boolean
Functions. Advances in Cryptology -- Eurocrypt '90. 161-173.
[RUE86] Rueppel, R. 1986. Analysis and Design of Stream Ciphers.
Springer-Verlag.
[XIO88] Xiao, G-Z. and J. Massey. 1988. A Spectral Characterization
of Correlation-Immune Combining Functions. IEEE Transactions on
Information Theory. 34(3): 569-571.
[YOU95] Youssef, A. and S. Tavares. 1995. Resistance of Balanced
S-boxes to Linear and Differential Cryptanalysis. Information
Processing Letters. 56: 249-252.
[YOU95B] Youssef, A. and S. Tavares. 1995. Number of Nonlinear
Regular S-boxes. Electronics Letters. 31(19): 1643-1644.
[ZHA95] Zhang, X. and Y. Zheng. 1995. GAC -- the Criterion for
Global Avalanche Characteristics of Cryptographic Functions.
Journal for Universal Computer Science. 1(5): 316-333.
---
Terry Ritter ritter@io.com http://www.io.com/~ritter/
The Crypto Glossary: http://www.io.com/~ritter/GLOSSARY.HTM
Terry Ritter, his current address, and his top page.
Last updated: 1998-02-26