Efficient FFT-Style Mixing for Block Cipher Cryptography

Mixing Large Blocks with Byte-Size BBM's

A Ciphers By Ritter Page

Terry Ritter

For those who have not previously encountered FFT-style mixing, we present description, figures, and routines in both Pascal and C for a straightforward and efficient mixing pattern. We propose this as the standard pattern for large-block mixing. This pattern can be dynamically varied to cover arbitrary power-of-2 size blocks at ciphering time.

Source code is presented for bona fide informational or experimental purposes only: The mixing mechanism is protected by patent.

Mixing Two Elements

We start out by picturing the mixing of two elements by a single Balanced Block Mixing or BBM. In many cases this will take two byte values into two different byte values.

[Fig. 1]

In the figure we have two vertical blue lines which just represent two element values flowing down through time. When a line is bare, the previous value is unchanged, but a yellow diamond indicates a value change.

Here there are two yellow diamonds, one on each line at the same time, connected by a broad red line between them. The yellow diamonds indicate a change in each value at a particular point in time and the red line connects the two parts of a single Balanced Block Mixing pair.

If we are mixing just two elements, there really is no wide range of possibilities -- we just mix one with the other. But when we have we have more elements, we have more possibilities.

Mixing Four Elements

Next we consider mixing four elements, again using the fundamental two-element BBM mixing. Now we need four mixing operations, where before we only needed one. Although we might wish for a larger fundamental mixing operation which we would apply only once, we are in fact developing the internal process of making just such an operation.

[Fig. 2]

If we have four elements to be mixed as pairs, one way to start is by mixing the first two elements and also mixing the second two elements. In this way, each of the first two element values is made a balanced result of both of the first two inputs, and each of the second two elements is made a balanced result of the second two inputs. In hardware, these mixing operations can occur simultaneously, and constitute what I call a mixing sublevel.

The point of the mixing is to end up with values such that each result value has been determined by each input value. We can do this by mixing an element from the first pair with an element from the second pair, and similarly mixing the remaining element from each pair, in a second sublevel.

Here, the selection of what to mix with what can be made in exactly two different ways, and if we just want some balanced mixing, there is no particular reason to choose one over the other. But some mixing patterns probably are a little more transparent than others, and some may be a little easier to automate.

Mixing Eight Elements

Next we consider mixing eight elements, again using the fundamental two-element BBM mixing. Now we need 3 sublevels of 4 mixing pairs each. Again note that, in hardware, the 4 mixings in any sublevel can proceed simultaneously.

[Fig. 3]

To develop a general algorithmic description of this process it is reasonable to introduce the concept of a group of pairs. In the first mixing sublevel, we have only one pair in each of four groups. But in the second sublevel, we have two pairs, and two groups. And in the third sublevel, we have four pairs, and only one group.

We distinguish pairs and groups because they involve different forms of indexing through the array of data. Pairs start out separated by a value which is the number of pairs in each group at that sublevel. In the first sublevel, there is always one pair per group, and the first pair in the first group is always [0,1]. Then we step each element by one and, if we have done all pairs for that sublevel (and we always will have, in the first sublevel), we step to the next group.

Once we have done a pair, we need to step across the width of the data affected by the pair to the next undone element. But if we have just done the previous group (and stepped beyond it), all we have to do is increment each element by the number of pairs per group for this sublevel. At the top sublevel, we have one pair per group, and so [1,2], the result after pair stepping, becomes [2,3], which is the next pair. Similarly, we find [4,5] and [6,7].

In the second sublayer, we have two pairs per group, so we start out with [0,2], which directly steps to the next pair in that group as [1,3]. At the end of the first group we have [2,4] which is transformed to [4,6] as the first pair of the second group. Then we find [5,7] as the second pair of the second group.

In the third sublayer, we have four pairs per group, and just one group. So we start out with [0,4] and step to [1,5], then [2,6] and [3,7], at which time we have done the pairs, groups, and sublayers and so are done.

Pascal Mixing Routine

Here we show a routine in Pascal for mixing blocks of arbitrary power-of-2 size as selected at ciphering time. Dynamic size selection supports, for example, the general use of large (say, 64 byte) blocks with an end-of-message step down using at most one of each lesser block size until we reach "legacy" 8-byte (DES-size) blocks.

Note that the simple Balanced Block Mixing computations are included. Here we use the degree-8 polynomial 01a9h.


      PROCEDURE ByFFTmix( VAR DatByAR; byct: WORD );
         { FFT-style byte mixing of 2**n contiguous bytes }
         { undo ByFFTmix with ByFFTmix; a self-inverse }
         VAR
            dat: ByteArray ABSOLUTE DatByAR;
            pairs, groups, el1, el2, group, pair, t: WORD;
         BEGIN

         groups := byct DIV 2;
         pairs := 1;

         WHILE (groups > 0) DO
            BEGIN

            el1 := 0;
            el2 := el1 + pairs;
            FOR group := 0 TO PRED(groups) DO
               BEGIN
               FOR pair := 0 TO PRED(pairs) DO
                  BEGIN

                  t := dat[el1] XOR dat[el2];  {  a +  b }
                  t := t + t;                  { 2a + 2b }
                  IF ((t AND $100) <> 0) THEN
                     t := t XOR $1a9;          { 2a + 2b - p }
                  dat[el1] := dat[el1] XOR t;  { 3a + 2b }
                  dat[el2] := dat[el2] XOR t;  { 2a + 3b }

                  Inc(el1);  Inc(el2);
                  END;
               Inc(el1,pairs);
               Inc(el2,pairs)
               END;

            groups := groups DIV 2;
            pairs := pairs + pairs;
            END;

         END; {ByFFTmix}

C Mixing Routine

Here we show the same routine in C code for mixing blocks of arbitrary power-of-2 size as selected at ciphering time.

Note that the simple Balanced Block Mixing computations are included. Here we use the degree-8 polynomial 01a9h. The BYTE and WORD defines are as one might expect.


/*
 * FFT-style byte mixing of 2**n contiguous bytes
 *
 */
void
ByFFTmix( BYTE dat[], WORD byct )
{
   WORD  pairs, groups, group, pair;
   BYTE  *d1, *d2, t;

   groups = byct >> 1;
   pairs = 1;

   while (groups)
      {

      d1 = dat;
      d2 = d1 + pairs;

      for (group = 0; group < groups; group++)
         {

         for (pair = 0; pair < pairs; pair++)
            {

            t = *d1 ^ *d2;

            if (t & 0x80)
               t = (t + t) ^ 0xa9;
            else
               t = t + t;

            *d1++ ^= t;
            *d2++ ^= t;
            }

         d1 += pairs;
         d2 += pairs;
         }

      groups >>= 1;
      pairs <<= 1;
      }
}

Summary

We have developed a description and presented working routines in Pascal and C for a particular efficient FFT-style mixing pattern. This mixing pattern is used to extend byte-wide mixing across arbitrary power-of-2 size blocks.

It is also possible to consider patterns which may be re-used without change in each sublevel. But the major issue in a Mixing design is likely to be table storage, rather than mixing. Consequently, having only a single mixing sublayer may not have much of an effect on overall cost, but will constrain throughput.

There may be slightly more efficient ways to code this standard mixing pattern. But as it stands it does drop into 80x86 assembly language fairly well. Unfortunately, byte-level processing is not a big advantage of 32-bit code.

Terry Ritter, his current address, and his top page.

Last updated: 1997-06-09