From: (Terry Ritter)
Newsgroups: sci.crypt

Subject: Re: Toward Axiomatic Fenced DES (long!)
Date: 5 Jun 1994 14:09:22 -0500
Organization: UTexas Mail-to-News Gateway
Lines: 392
Message-ID: <>

 In <2sjjjg$> (Bryan G. Olson; CMSC (G))
 responds to my Fenced DES proof attempts:

>:  The Proofs
>Please don't call them theorems and proofs if they are not.

 Please don't nit-pick new work to death.  Few of these statements
 are actually false; in some cases the wording needs to be refined.
 Of course, some *are* defined so poorly that--upon reflection--they
 are probably not worth having.

 The concept of mathematical "proof" is subjective; we don't have
 a global automatic theorem-checker.  We can't just toss this
 stuff in a machine and have it decide yes or no and where the
 weakness lies.  If I had such a thing, my proofs would pass it.

 I obviously get complaints, but what, no compliments?  No comments
 on my proofs of the example Block Mixing Transforms?  No compliments
 for finding provable properties which can be used to expand the
 width of block ciphers as components?  If you have been following
 this process, you know it has involved a serious amount of effort,
 trial-and-error, and public failure.  Success was by no means
 assured.  When it occurs it should be uplifting to all of us.

>:  2.9  THEOREM:  (Average distance and distribution.)  The expected
>:  number of elements which differ between two n-position code values
>:  is n * (1 - 1/|S|), and the distribution is binomial.
>You need to state the distribution of code value pairs in order for
>the expected value you talk about to be defined.  Also, the proof
>needs work.  First justify the binomial distribution, then show
>this implies your result.

 In all cases I take the "expectation" to be the mean or arithmetic
 average over all possibilities, each one taken once.  Ironically,
 the critical comments themselves state:

  >When no
  >distribution is given, we generally assume the uniform distribution.

 Here, that means all possible code value pairs.

 As for the proof, first note that 2.7 is available:

> 2.7  THEOREM:  (Position difference counts.)   The number of
> n-position code values which differ in exactly m positions is
> (n)          m
> (m) * (|S|-1) .
>                         (n)
>      (Proof:  There are (m) combinations of m positions out of n
>      possible positions, and in any particular combination of m
>      positions each position can take on |S|-1 other symbols
>      producing (|S|-1)**m other code values for each combination.)

> 2.9  THEOREM:  (Average distance and distribution.)  The expected
> number of elements which differ between two n-position code values
> is n * (1 - 1/|S|), and the distribution is binomial.
>      (Proof:  Assume the number of code differences is the binomial
>                                             (n)   m   n-m
>      probability of a difference B(m;n,p) = (m)  p   q   , where
>      p = 1 - 1/|S| and q = 1-p, times the total number of
>                       n
>      code values (1/q) :
>      (n)          m   (n)  m  n-m   -n
>      (m) * (|S|-1)  = (m) p  q     q
       ^                ^
       |                |
       This is 2.7      This is the binomial equation

>                       (n)            m
>                     = (m) (p / (1-p))
                    m                  m
             (|S|-1)  =     (p / (1-p))

              |S|-1   =      p / (1-p)

                      = (1-1/|S|) / (1 - (1-1/|S|))

                      = (1-1/|S|) / (1 - 1 + 1/|S|)

                      = (1-1/|S|) / (1/|S|)

                      = |S| (1-1/|S|)

                      = |S| - |S|/|S|

              |S|-1   = |S| - 1

                        Look, Ma!  They're the same!

 The proven expression from 2.7 is shown to be the binomial equation
 in drag.  This "justifies" the contention that the distribution is
 binomial.  The binomial expectation is np, which is n * (1 - 1/|S|).

>      which is correct, so the expected number of different elements
>      is the binomial expectation np.)

 This proof (2.9) looks good to me as it stands.
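 Both results can also be checked by direct enumeration.  Here is a
 small Python sketch (illustrative only, not part of the original
 argument) over all ordered pairs of 8-bit binary codes, |S| = 2:

```python
from itertools import product
from math import comb

n, S = 8, 2  # 8-bit binary codes; symbol set size |S| = 2

# Tally, over all ordered pairs of n-position codes, how many pairs
# differ in exactly m positions.
codes = list(product(range(S), repeat=n))
counts = [0] * (n + 1)
for a in codes:
    for b in codes:
        counts[sum(x != y for x, y in zip(a, b))] += 1

# Theorem 2.7: for each fixed first code, C(n,m) * (|S|-1)**m codes
# differ from it in exactly m positions.
for m in range(n + 1):
    assert counts[m] == len(codes) * comb(n, m) * (S - 1) ** m

# Theorem 2.9: the mean distance over all pairs is n * (1 - 1/|S|).
mean = sum(m * c for m, c in enumerate(counts)) / len(codes) ** 2
assert mean == n * (1 - 1 / S)
print(mean)  # 4.0
```

 The counts match 2.7 position by position, and the mean distance
 comes out to n/2 = 4 bits, exactly as 2.9 predicts for binary codes.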

>:  3.2  DEFINITION:  A RANDOM discrete function allows each output
>:  code value to be selected independently for each possible input
>:  condition.
>Bad definition.  A function fixes the output value for each input
>value, so there is nothing to allow.  A random function is one chosen
>at random from some distribution of possible functions.  When no
>distribution is given, we generally assume the uniform distribution.

 Yes, I agree.  Moreover, since the whole point of the "random"
 thing was to eventually argue that an invertible substitution is
 not a random function (4.12), which is then never used, it is
 probably not worth having.  And this means that the whole of
 section 3 is probably not worth having.

>:  3.3  THEOREM:  (Number of random functions.)  There are 2**2n
>:  possible random functions with an n-bit binary input code and an
>:  n-bit binary output code.
>:       (Proof:  An n-bit binary code can make 2**n possible
>:       selections, each of which can be 2**n possible values,
>:       and (2**n)*(2**n) = 2**2n.)
>Given a function there is no way to tell if it is random (under your
>definition or mine).  A uniformly distributed random function from
>n-bit strings to n-bit strings is equally likely to take any of
>(2**n)**(2**n) different values.

 Let's just delete section 3 and also 4.12.

>:  4.7  THEOREM:  (Probable change propagation.)  Any change whatsoever
>:  to the input value to a complete invertible substitution is likely
>:  to change about half the bits in the output value.
>:       (Proof:  Changing the input value selects among all remaining
>:       output code values.  If the output is considered to be binary
>:       bits, we expect about half those bits to change (2.9).)
>Not true.  The identity function is a complete invertible permutation.
>I can give you many input changes which result in significantly less than
>half the output bits changing.
>You need some idea of probability here.

 I imply that in "expect."  I take the expectation to be the
 arithmetic average over all possibilities, each one taken once.

 It may be that one could be more explicit about the exact
 expectation, but an exact expectation is not what we normally
 mean by "avalanche."  For example, we do not see measurements
 which say "this construct achieves only 83.2 percent of the
 expected avalanche."

 With the understanding of the term "expect," I think the proof
 of 4.7 is satisfactory as it stands.

>For example:
>THEOREM: Given n-bit strings a and b with a <> b, if P(x) is chosen
>(independently of (x,y) ) from a uniform distribution of all
>permutations of the set of n-bit strings, then the expected Hamming
>weight of P(a) xor P(b) is:
>       2**(n-1) / (2**n -1)

 I specifically do not want to get into defining "choosing from
 a uniform distribution" since that implies a random choice, and
 then we get to define "random."  The proofs that ran into real
 problems did so over that word.  While this could be fixed, any
 such approach must actually be an indirect way of describing the
 expectation of an input distribution which I always take to be
 uniform.  Instead of defining a choice "at random," we can
 instead choose every possibility and report the results.
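 That expectation is easy to exhibit directly.  Over all ordered
 pairs of *distinct* n-bit values, each bit differs with probability
 2**(n-1) / (2**n - 1), so the expected Hamming distance is n times
 that, which approaches n/2 as n grows.  A small Python sketch
 (illustrative only, n = 3) checks this by enumeration:

```python
# Expected Hamming distance between two distinct n-bit values,
# averaged over every ordered pair -- the "choose every possibility
# and report the results" reading of expectation.
n = 3
N = 1 << n
total = sum(bin(x ^ y).count("1")
            for x in range(N) for y in range(N) if x != y)
mean = total / (N * (N - 1))

# Per-bit difference probability is 2**(n-1) / (2**n - 1); the
# expected distance is n times that (12/7 for n = 3).
assert abs(mean - n * 2 ** (n - 1) / (2 ** n - 1)) < 1e-12
```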

>:  4.8  DEFINITION:  AVALANCHE is a statistical property of a discrete
>:  function in which any change whatsoever on the input is expected to
>:  produce a change in about half the bits in the output value.
>This one is going to be hard to fix.

 It's a *definition*: there is nothing to fix.

 I suppose we could be more explicit in defining the expectation,
 but a particular explicit expectation is not the conventional
 quality one associates with avalanche.

 Maybe someone has a suggestion for an alternative definition of
 "avalanche," which might well affect the subsequent proofs.  Right
 now, I see no reason to change it (4.8).

>For any permutation on n-bit
>strings (n>0) there exists some input change which will flip just one
>output bit.  Proof: for permutation P use the change:
>            P ; flip-first-bit ; inverse-P.

 Yes, I would say this is known.  For a complete 8-bit substitution,
 there are in fact eight.  See 2.8:

> 2.8  EXAMPLE:  The number of 8-bit binary codes which differ in m
> bits is:
>      distance     count
>            0         1
>            1         8
                       ^--------- see!
>            2        28
>            3        56
>            4        70
>            5        56
>            6        28
>            7         8
>            8         1
>                    ---
>                    256 = 2**8
>      (Comment:  There are 256 8-bit binary code values, and 255
>      values which differ in at least one position from any
>      particular code value.)
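 The quoted construction is also easy to demonstrate.  A small
 Python sketch (illustrative only) builds a random 8-bit permutation
 P and checks that the input change x -> Pinv(P(x) xor 1) flips
 exactly one output bit for every x, and that each value has exactly
 eight neighbors at Hamming distance 1, per the table:

```python
import random

n = 8
N = 1 << n
random.seed(1)
P = list(range(N))
random.shuffle(P)                      # a random 8-bit permutation
Pinv = [0] * N
for x, y in enumerate(P):
    Pinv[y] = x

# Olson's change:  P ; flip-first-bit ; inverse-P.
for x in range(N):
    x2 = Pinv[P[x] ^ 1]                # the crafted input change
    assert bin(P[x] ^ P[x2]).count("1") == 1   # one output bit flips

# Per 2.8, each value has exactly 8 neighbors at Hamming distance 1.
assert sum(1 for d in range(N) if bin(d).count("1") == 1) == n
```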

>:  4.9  THEOREM:  (Avalanche is automatic.)  Avalanche is an inherent
>:  property of complete invertible substitution.
>:       (Proof:  See 4.5, 4.7, and 2.9.)
>Unfortunately 4.7 is incorrect, and the definition of avalanche is

 The *proof* of 4.7 may be insufficient, but the *fact* of 4.7 *is*
 correct.  These are not just arbitrary abstractions, but instead
 statements about specific reality.  The issue is to write them so
 they will be accepted on an abstract level.

 Similarly, it may well be that the definition of avalanche could
 be improved, but the current ad hoc definition is good enough to
 understand, and is consistent with the usual definition.

 This (4.9) looks OK to me.

>4.10 and 4.11 again need to be defined for a probability distribution.

 I take "expectation" to be the arithmetic average over all
 possibilities; that *is* a distribution.  These (4.10, 4.11) seem
 OK to me.

>:  4.12  THEOREM:  (Not a random function.)  An invertible substitution
>:  cannot be a random function.
>:       (Proof:  Suppose a value is selected for placement somewhere
>:       in a substitution.  Since an invertible substitution cannot
>:       allow another occurrence of that same value, other values
>:       cannot be selected independently.)
>Nope.  As much as it's defined it's incorrect.
>If a function f(x) is chosen from a uniform distribution of all
>functions from n-bit strings to n-bit strings, the probability that
>f(x) is an invertible substitution (permutation) is:
>     (2**n)! / (2**n)**(2**n).

 I suggest we just throw out 4.12.
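 For the record, the quoted probability is easy to confirm in a tiny
 case.  A Python sketch (illustrative only) enumerates all 4**4 = 256
 functions on 2-bit strings and counts the 4! = 24 permutations
 among them:

```python
from itertools import product
from math import factorial

n = 2
N = 1 << n                             # 4 possible 2-bit values
# Every function from 2-bit strings to 2-bit strings, as a tuple of
# outputs; a function is a permutation iff its outputs are distinct.
perms = sum(1 for f in product(range(N), repeat=N) if len(set(f)) == N)
assert perms == factorial(N)           # 4! = 24 of 4**4 = 256
print(perms / N ** N)                  # 0.09375
```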

>:  4.14  THEOREM:  (Reconstruction requires information linking output
>:  values to input values.)  An unknown invertible substitution cannot
>:  be resolved without simultaneous information about both the input
>:  value or position and the output value.
>:       (Proof:  To the extent that a particular substitution can
>:       be said to have an identity, that identity is the relation
>:       between substitute values and their position.  This relation
>:       is both necessary and sufficient to define the substitution.)
>Say what?

 I *intend* to say that a substitution is exclusively a relation
 between input and output values.  If we need to resolve a
 particular substitution (out of all possible invertible
 substitutions), we must have simultaneous information about both
 the input values and the output values.  In this way I try to
 extend the definition of substitution to imply a limit on how
 such a substitution can be resolved.

 There is a temptation to say that actual values must be known,
 which is probably wrong, and which I avoid.

 I'm not sure what else would be clearer here, but perhaps 4.14
 could be re-worded.

>:  5.2  THEOREM:  An invertible substitution is a bit-mixer.
>May or may not be.

  >5.1  DEFINITION:  A BIT-MIXER combines multiple input bits such
  >that each output value is defined by each and every input bit.

 OK, you might like "function of" or a more explicit rendering.
 Still, the point is clear.  The output value from a substitution
 is "dependent upon" or "a function of" each and every input bit.

 However, upon reflection, I think the definition is insufficient
 to capture my real intent, so this is something I will have to
 work on.  Note that the entire purpose for this line of reasoning
 is reflected in the comment of 12.2:

>      (Comment:  DES with a known key is an example of a block
>      mixing transform with absolutely no strength at all by itself,
>      which nevertheless adds strength through bit mixing.)

 which is significant because of previous questions of whether a
 mixer without any "strength" can have any cryptographic advantage.
 This development also justifies the examination of Block Mixing
 Transforms which have no "strength" but nevertheless mix.

>You need to be much more careful in your definitions and proofs.

 Probably.  It would be nice to be able to write proofs which would
 convince mathematicians, but my main goal is to express reality.
 If the concepts are true, they can be placed in a good format.

>afraid if you clean up the early stuff, it won't be strong enough to
>prove the later stuff.  You can show bit-change probabilities for
>random permutations, but then you can't apply them to your block
>operations without making unproven assumptions.

 I don't think things are nearly that bad.

 I prove the bit-change distribution of an invertible substitution
 over all possible input changes, something which might be considered
 fairly obvious, but I don't see it in the texts.

 My principal assumption is that an invertible substitution is the
 ideal block cipher, and this is consistent with the usual analysis.
 If an ideal block cipher is an invertible substitution, it must
 behave like an invertible substitution, and have the same bit-change
 distribution.  And, to the extent that the real block cipher
 approaches the ideal, it must behave in the same way.

 I also give specific proofs for the example Block Mixing Transform
 which allow it to be included in the reasoning.

 Given this development, we can reason about the output distribution
 over all possible inputs at every level of the Fenced DES cipher.
 Then we can reason about the combination of these levels in the
 final cipher, with results that show the essence of a block cipher.

 At this point, I claim a block cipher is a block cipher because it
 is invertible and avalanches every input bit.  The first is assumed
 by construction, and the second is something we can measure (to
 some extent).  That's all I know a block cipher must be.  If I had
 other requirements for a block cipher, then I could try to prove
 those as well.