All of the mixers referred to above are balanced block mixers which map multiple input code values to the same number of output code values.

Throughout this application, including the claims,
the term "balanced block mixer" is intended to include all
possible implementations (such as electronic hardware,
computer implemented software, hybrids, or the like) or the
process performed. Thus a balanced block mixer refers to
either a process for balanced block mixing or to any
implementation (for example, hardware, computer implemented
software, hybrid, or the like) for performing balanced block
mixing.

**FIGURES 9(a)** and **9(b)** show the operations performed
by balanced block mixers **174** and **176** in terms of their
four component combiners **178-184**. In the mechanisms of
**FIGURES 9(a)** and **9(b)**, balanced block mixer **174**
consists of combiners **178** and **180**, and balanced block
mixer **176** consists of combiners **182** and **184**.
A 2n-bit input message is split into two n-bit data blocks A and B
which are input to balanced block mixer **174**. Within the
balanced block mixer **174**, data blocks A and B are input to
combiners **178** and **180**. The two combiners **178**
and **180** produce two n-bit output data blocks X and Y. Data
blocks X and Y may then be further encrypted prior to transmission
or use. **FIGURE 9(b)** illustrates apparatus for recreating
input message I from data blocks X and Y. Upon receipt of the data
blocks X and Y (perhaps as a single 2n-bit data block), balanced
block mixer **176** (the inverse of balanced block mixer **174**)
takes two n-bit input data blocks X and Y and uses two combiners
**182** and **184** to produce two n-bit data blocks A and B
which are joined to reproduce the original input message.

Desirably, a balanced block mixer has the following properties:

- The mapping is one-to-one and invertible. That is, every possible input message to the mixer produces a different output, and every possible output from the mixer is produced by a different input message to the mixer.
- Each output of a mixer is a function of all inputs
to the mixer. That is, treating the inputs and
outputs to the mixer as values (e.g., binary
values), every output value is a function of all
input values. With reference to
**FIGURES 9(a)**and**9(b)**, output blocks X and Y from balanced block mixer**174**are each functions of both input blocks A and B. That is, the outputs (X and Y) of balanced block mixer**174**are both a function of the entire input message I. Similarly, output blocks (A and B) of balanced block mixer**176**are both functions of both input blocks (X and Y). - Changes Propagate to All Outputs. Any change to any
one of the input values to the mixer will change all
of the output code values of the mixer. For
example, with reference to
**FIGURES 9(a)**and**9(b)**, a change to any value in input data block A to balanced block mixer**174**will cause a change in both output data blocks X and Y. Similarly, a change to any value in input data block B will cause a change in both output data blocks. - Balance and Input Independence. Stepping any input
to the mixer through all possible values (while
keeping the other inputs fixed) will step every
output of the mixer through all possible values.
For example, with reference to
**FIGURES 9(a)**and**9(b)**, keeping the value of input B fixed and changing the value of input A through all possible values will cause outputs X and Y to each step though all possible values.

In the preferred embodiments, balanced block mixers
(performing block mixing transforms) are size preserving.
That is, the size of the output blocks is the same as the size
of the input blocks. With reference to **FIGURES 9(a)** and
**9(b)**, the sizes of data blocks X and Y are the same as the
sizes of data blocks A and B. Thus, for example, if the input
message I is 128 bits and A and B are 64-bit data blocks then
X and Y are also 64-bit data blocks.

Balanced block mixers with the above-described
properties can be obtained using orthogonal Latin squares.
Essentially a balanced block mixer is implemented as two
orthogonal Latin squares. Because the Latin squares are
orthogonal, their accumulated output is reversible. The
generation of orthogonal Latin squares is described in "The
Completely Orthogonalized Latin Square," Stevens, W. L.,
*Annals of Eugenics,* 9: 82-93, 1939, which is hereby
incorporated herein by reference. (A way of constructing
orthogonal Latin Squares is also described in "Direct
Construction of Mutually Orthogonal Latin Squares," Kull, H.
et al., *Computational Theory and Logic,* 1987.)'

A Latin square is a two-dimensional array or matrix of symbols, such that each row and each column contains each symbol exactly once. A Latin square is used to mix two values by using one value to select a row of the square, and the other to select a column of the square, and then reporting the value of the Latin square at that row and column. This Latin square mixing is balanced: for any particular value on one input, changing the other input produces every possible output symbol.

A known Latin square is invertible if one of the mixing inputs is known, which is the usual situation for a stream cipher.

The values from a pair of Latin squares can be
transformed to their original values *without outside
information*, provided the Latin squares are "orthogonal".
That is, each pair of output symbols is produced by just a
single selection of input values. (If two pairs of input
values produce the same output values, then it cannot be
determined which of those input pairs to choose when
deciphering.) Note that two exclusive-OR functions are not
orthogonal, and so cannot be used in this way.

Using a more formal notation, a Latin Square of order (size) m is
an m x m square array of the symbols S_{0}, S_{1},
. . ., S_{m-1} such that each row and each column are a
permutation of the symbols S_{0}, S_{1}, . . .,
S_{m-1}, and each symbol appears only once in each row and
in each column (clearly the m symbols can be replaced by m digits).
Thus, for example, when m is 3, the following is a Latin square:

B C A C A B A B C

Two Latin squares of the same size are said to be orthogonal to each other if, when the two squares are superimposed, each symbol of one square coincides exactly once with each symbol of the other square. For example, two Latin squares of size three with this property are:

B C A B C A C A B A B C A B C C A BWhen the above two Latin squares are superimposed (as shown below),

(B,B) (C,C) (A,A) (C,A) (A,B) (B,C) (A,C) (B,A) (C,B)each possible ordered pair of symbols occurs exactly once.

First, one method of constructing a Latin square is described. The Latin square L of size m, having the digits {0, 1, . . ., m-1} can be written as L[x, y] = (kx + y) mod m, where k is a non-zero constant and x and y are in the range 0 to m-1. That is, "(kx + y) mod m" is the number to be placed in the xth row and the yth column of the square. For example, a value of m=3 gives the following Latin square (only for the sake of notation, we denote "a mod b" as "a|b" herein):

0 1 2 k|3 (k+1)|3 (k+2)|3 2k|3 (2k+1)|3 (2k+2)|3For k=1, this gives the Latin square

0 1 2 1 2 0 2 0 1

The following construction derives from Stevens, cited above.

Assuming that a field of m numbers exists, a square S of the digits may be written as:

S = {u_{k}u_{x} + u_{y}}, where
u_{k} is a non-zero constant, and where u_{x},
u_{y} = 0, 1, u_{2}, . . ., u_{m-1}, where
u_{k}u_{x}+u_{y} is the number in row
u_{x} and column u_{y}. The numbers may have no
obvious order, but we may designate the m rows and columns by means
of the numbers 0, 1, u_{2}, . . ., u_{m-1} in any
way. Stevens shows that any two squares defined as above, for
different values of u_{k}, are orthogonal.

Clearly, for large values of n, the Latin squares of size
2^{n} are too large and unwieldy to be usefully generated
and stored. However, since the method of generating the squares
uses functions which can be easily computed, the actual Latin
squares need not ever be generated. In other words, whenever
a Latin square is referred to or needed, its generating
function can be used.

One way to create a finite field is to use a prime number. For
some prime number p, the values 0, 1, . . ., p-1, constitute a
finite field modulo p. However, for efficient computer
implementation, we would like to have a field of exactly
2^{n} elements, and 2^{n} is not prime for n
above 1. We might use primes close to 2^{n}, provided that
we can accept some unbalance, or we can use polynomial operations,
as discussed below. Polynomial operations modulo an irreducible
polynomial provide a way to generate fields of exactly 2^{n}
elements with exact balance and with efficient operation.

Given a Latin square LS of size m having as elements the numbers 0, 1, . . ., m-1, if a particular row value is fixed at say A, the value of LS[A, c] will take on all values in {0, 1, . . ., m-1} as c varies over all possible values 0, 1, . . ., m-1. Similarly, if a particular column is fixed at say B, the value of LS[r, B] will take on all values in {0, 1, . . ., m-1} as r varies over all possible values 0, 1, . . ., m-1. This corresponds to desired property number 4 for balanced block mixers, listed above.

A polynomial over S is an expression of the form

u(x) = uwhere the coefficients ui are elements of some algebraic system S, and the variable x may be regarded as a formal symbol with an indeterminate meaning._{n}x^{n}+ . . . u_{1}x + u_{0},

The algebraic system S is usually the set of integers, or the rational numbers. When S is the set of integers {0, 1, . . ., m-1}, with addition, subtraction, and multiplication performed mod m, this is called polynomial arithmetic modulo m (or mod m).

Thus, a mod p polynomial P is one of the form:

P(x) = pwhere the coefficients pi are integers in {0, 1, . . ., p-1} and the addition, subtraction and multiplication are performed mod p. A mod 2 polynomial P has all coefficients 0 or 1 and the addition, subtraction and multiplication are performed mod 2._{n}x^{n}+ . . . p_{1}x + p_{0}

Definitions of mathematical fields can be found in most basic
books on number theory. Some definitions are included here (from
Koblitz, *A Course in Number Theory and Cryptography,* 2nd
Edition, Springer-Verlag, 1994).

A *field* is a set with multiplication and addition
operations which are associative, commutative and
distributive. The set has an additive identity (zero), a
multiplicative identity (one), additive inverses and
multiplicative inverses for everything except zero. For
example, the set of all real numbers, R, is a field and the
set of integers modulo a prime number is a field.

The notion of a field extends to polynomials. As with integer fields (modulo some prime number), polynomial fields are described modulo a prime (irreducible) polynomial. For example, if P is a polynomial field modulo the prime polynomial p, we refer to P as a modulo p (or mod p) field. This essentially means that all arithmetic on polynomials in the field is performed modulo the prime polynomial p.

Polynomials being "mod 2, mod p" (as used herein) means that each polynomial is a mod 2 polynomial (i.e., has coefficients that are 0 or 1) and that the arithmetic over the polynomials is performed modulo p.

The following combiners, with their operations
performed modulo 2 and modulo p, for some irreducible mod 2
polynomial p of appropriate degree for the data blocks X, Y, A
and B (polynomial p need not be primitive, but must be
irreducible to generate a field), provide an example of a
balanced block mixers for balanced block mixer **174** and its
inverse balanced block mixer **176** of **FIGURES 9(a)**
and **9(b)**:

CombinerEach input value to a balanced block mixer is a binary number which corresponds to the coefficients of a mod 2 polynomial. So, in the above example, A, B, X, and Y are each considered as mod 2 polynomials. Then, the operations performed by the combiners, e.g., 2A+3B, etc., are first performed as mod 2 polynomials, and those results are reduced mod p (i.e., modulo the irreducible polynomial p). So, for example, "2A + 3B" above could also be written as "(2A + 3B) mod p"._{178}(A, B) = 3A + 2B, Combiner_{180}(A, B) = 2A + 3B, Combiner_{182}(X, Y) = 3X + 2Y, Combiner_{184}(X, Y) = 2X + 3Y.

The data transform performed by the balanced block mixers shown above is a self-inverse, has good mixing correlation properties, is statistically balanced, and has a processing cost which is linear with block size.

Using the data or signal transformation X = Combiner(A, B) = 3A + 2B; Y = Combiner(A, B) = 2A + 3B, in a balanced block mixer, any value change to either of the input signals A or B must be reflected in both the X and Y output signals. Suppose, for example, that some change C is made to the input signal A:

X = 3A + 2B (mod 2, mod p) X' = 3(A+C) + 2B X' = 3A + 3C + 2B dX = X' - X = 3Cbut 3C is non-zero (thus affecting the output signal) for any signal change C which is not zero, and if C is zero, there has been no change to the input signal A. Suppose some change C is made to input signal B:

X = 3A + 2B (mod 2, mod p) X' = 3A + 2(B+C) X' = 3A + 2B + 2C dX = X' - X = 2C

Similarly, 2C is also non-zero for any signal change C which is not zero.

Suppose that the signal change C is made half the value of p
plus the highest bit 2^{deg(p)-1}, so that p will be
activated in the process of determining the output signal and
2C will cancel the lower bits of p. Since p is irreducible
there is no q such that 2q = p. Similar arguments apply for Y
= 2A + 3B.

A present embodiment of the invention uses the degree-128 irreducible 0100004000000400200002000004000001 (hex), and the degree-64 irreducible 010002000000800201 as block mixing polynomials. Polynomials with about half the number of possible terms would be desirable some implementations.

With reference to **FIGURES 9(a)** and **9(b)**, and the
above combiners, a 2n-bit input message signal I is split into
two n-bit data blocks A and B which are input to balanced block
mixer **174**. Within balanced block mixer **174**, data
blocks A and B are each input to combiners **178** and **180**.
Combiner **178** takes the two input data blocks A and B and
produces an output data block X (which is 3A+2B mod 2 mod p).
Combiner **180** takes the same two input blocks A and B and
produces an output data block Y (which is 2A+3B mod 2 mod p).

Upon receipt, data blocks X and Y are input to
inverse balanced block mixer **176** where each data block is
input to combiners **182** and **184**. Combiner **182**
takes its two input data blocks X and Y and produces a data block
3X+2Y mod 2 mod p, i.e., data block A. Combiner **184** takes
its two input data blocks X and Y and produces a data block 2X+3Y
mod 2 mod p, i.e., data block B.

A fenced cryptographic mechanism is a cryptographic
mechanism **100** (**FIGURE 1**) that uses a cipher mechanism
as a component in a cryptographic mechanism which is at least as
powerful and potentially stronger than the cipher mechanism,
yet almost as fast. For example, a fenced DES mechanism uses
the DES cipher mechanism as a component and a fenced IDEA
mechanism uses the IDEA cipher mechanism as a component.

A core component mechanism of a fenced cryptographic mechanism is a substitution mechanism described below.

With reference to **FIGURE 10**, an embodiment of the
present invention being cryptographic mechanism **100** in the
form of a 1x fenced cipher mechanism for an n-bit cipher
mechanism **180** is shown. (The term "fenced" is used to
describe the "picket fence" appearance of the arrays **183a**,
**183b** of substitution mechanisms **182** (S-mechanisms or
S-devices herein) surrounding the cipher mechanism **180** in the
schematic diagrams.) Each of substitution mechanisms **182**
performs an eight-bit wide (256-byte) substitution randomized under
the control of a cryptographic key. (The randomization is
described below.) Therefore, for an n-bit cryptographic
mechanism, two arrays of n/8 substitution mechanisms **182** are
needed, one array on each side of the cipher mechanism **180**,
for a total of n/4 substitution mechanisms **182**. For example,
if the cipher mechanism **180** is a DES mechanism or an IDEA
mechanism (with a 64-bit input block), then one array **183a**,
**183b** of eight (8) substitution mechanisms **182** is used
on either side of the cipher mechanism **180**.

In operation, a fenced cryptographic mechanism **100**
receives an n-bit signal I which is split into n/8 8-bit components,
each of which is passed through a substitution mechanism **182**
in the array **183a** of substitution mechanisms. The combined
output of the array **183a** of substitution mechanisms
**182** is an n-bit data value which is then input to cipher
mechanism **180** for ciphering. The output of cipher mechanism
**180** is an n-bit ciphered message which is then passed (in
8-bit components) through another array **183b** of n/8
substitution mechanisms **182** to produce an n-bit plaintext
signal.

Functionally, the cryptographic mechanism **100** in
**FIGURE 10** can be written as:

for each x_{i}, i = 1 to n/8, S_{i}(x_{i}) -> x'_{i}CIPHER(x'_{1}x'_{2}. . . x'_{n/8}, k) -> y_{1}y_{2}. . . y_{n/8}for each y_{i}, i = 1 to n/8, S_{n/8+i}(y_{i}) -> y'_{i}

The fenced cryptographic mechanisms use, in combination with
other elements, substitution mechanisms. Substitution mechanisms
essentially implement keyed substitution tables which can be
initialized before ciphering, and left unchanged during ciphering
or which can change during ciphering. The substitution mechanisms
operate by transforming one k-bit input value into another k-bit
output value using a substitution table of size 2^{k}. In the
preferred embodiments described herein, the value of k is eight,
and the substitution mechanisms effectively have tables with
2^{8}=256 entries. However, other values of k could be
used.

**FIGURE 11** shows a single substitution mechanism **182**
in detail. Input data I to substitution mechanism **182** is
transformed by the substitution mechanism **182** to output data
O. The input data I is an index into substitution table or array
**184**, and selects one of the entries in the table. The value
stored in the selected entry forms the output data O. A substitution
table S is an indexable n-element vector of output codes or values.
Essentially a substitution table is a memory device which stores
values which can be selected by address or index value.

In cryptographic service, substitution tables are often initialized to contain exactly one occurrence of each possible index value, thus making the transformation they perform invertible. The arrangement of values in the substitution table can be controlled by a cryptographic key. Such a properly-initialized device (or area of memory) is called a substitution mechanism.

For each substitution mechanism, the inverse mechanism can be formed by properly arranging the data in another substitution mechanism. Thus, in substitution mechanism 182, if input I maps to output O, then in the inverse substitution mechanism, input O maps to output I.

The substitution mechanisms perform a one-to-one mapping of input values to output values, so that no two inputs will produce the same output, and no two outputs can derive from the same input.

Throughout this application, including the claims, the term "substitution mechanism" is intended to include all possible implementations (such as electronic hardware, computer implemented software, hybrids, or the like) or the process performed. Thus a substitution mechanism refers to either a substitution process or any implementation (for example, hardware, computer implemented software, hybrid, or the like) for performing the substitution process.

With reference to **FIGURE 10**, if the cipher mechanism
**180** is a DES mechanism, then **FIGURE 10** depicts a
1x fenced DES mechanism in which a 64-bit input message block X is
divided into eight message sub-blocks x_{1} x_{2}
. . . x_{8}, where each x_{i} is an 8-bit sub-block
of X. Each of the blocks x_{i} is transformed via a
corresponding substitution mechanism **182**, S_{i}, to
produce an 8-bit value x'_{i}. The eight 8-bit data values
x'_{i} are input as a 64-bit data block to DES mechanism
**180**, which, using key k, produces a 64-bit output data block
Y made up of 8-bit sub-blocks y_{1} y_{2} . . .
y_{8}. Each of the 8-bit data sub-blocks y_{i} is
transformed by a substitution block to an 8-bit data block
y'_{i}. The output of the mechanism is then formed by
joining or concatenating the eight y'_{i} sub-blocks to
form a 64-bit ciphertext message Y'.

If cipher mechanism **180** is a DES cipher mechanism,
then functionally, the mechanism in **FIGURE 10** can be written
as:

for each x_{i}, i = 1 to 8, S_{i}(x_{i}) -> x'_{i}DES ENCIPHER(x'_{1}x'_{2}. . . x'_{8}, k) -> y_{1}y_{2}. . . y_{8}for each y_{i}, i = 1 to 8, S_{8+i}(y_{i}) -> y'_{i}

Decryption of the message encrypted as above operates by first
inverting what were the last substitutions performed by the
substitution mechanisms **182**, then deciphering the message
with the cipher mechanism **180** (e.g., a DES mechanism), and
then inverting what were the first round of substitutions.

Functionally, decryption of a message encrypted by
the mechanism shown in **FIGURE 10** can be written as:

for each y'_{i}, i = 1 to n/8, S-inverse_{n/8+i}(y'_{i}) -> y_{i}DECIPHER(y_{1}y_{2}. . . y_{n/8}, k) -> x_{1}' x_{2}' . . . x_{n/8}' for each x_{i}', i = 1 to n/8, S-inverse_{i}(x_{i}') -> x_{i}

A 1x Fenced DES is a three-layer structure like triple DES. Since all information flows through DES itself, the overall cipher cannot possibly be weaker than DES. Moreover, the 1x Fenced DES structure is clearly stronger than DES, because a "known plaintext" attack requires knowing the actual input and output values at the DES cipher itself, and these values are hidden by the input and output substitution layers. If even a single input bit to DES is wrong (that is, if even one value in one input substitution is off by even one bit), DES will avalanche and about half of the output bits from DES will be wrong.

The extreme sensitivity of good block ciphers to change -- a tiny change in the input (plaintext) value tends to change about half of the output bits -- is known as avalanche. In the context of an iterated product cipher (such as DES) which has multiple rounds, a small input change will affect a few adjacent bits each round, until, eventually, about half of the output bits change. Avalanche is a requirement in a good block cipher, and is an inherent property of a normal invertible substitution -- any change to the input value of an invertible substitution selects a new and necessarily different output value.

Thus, a single wrong bit has the same statistical effect as a mostly-wrong value; this means that it is not possible to know when the input value is "close," so it does not appear possible to solve the substitutions in an advantageous way.

In some preferred embodiments, as shown, for example, in the
cryptographic mechanism of **FIGURE 12**, arrays of substitution
mechanisms are used to fence multiple parallel block cipher
mechanisms. In the cryptographic mechanism **100** of
**FIGURE 12**, an array **186a** of sixteen 8-bit
substitution mechanisms **182** (each labelled "S" in the
drawing) takes a 128-bit input signal into two 64-bit cipher
mechanisms **188**. The output of the cipher mechanisms
**188** is passed through another array **186b** of sixteen
8-bit substitution mechanisms. Note that half the output of each
substitution mechanism **182** is input to each of the two cipher
mechanisms **188**. In this manner, the output from each
substitution mechanism **182** affects the output of each cipher
mechanism **188**. Thus, for an 8-bit substitution mechanism
**182**, up to eight cipher mechanisms **188** can be used in
this configuration, with one bit of output from each substitution
mechanism **182** going into each of the eight cipher mechanisms
**188**. In general, b-bit substitution mechanisms **182**
can be used with up to b n-bit cipher mechanisms to produce an
b.n-bit cryptographic mechanism **100**.

In order to enhance the strength of a cipher mechanism in a fenced cryptographic mechanism, each substitution mechanism must be shuffled by a cryptographic random number generator before ciphering begins (the random number generator is not needed during actual ciphering). The same random number generator can also generate the cipher keys.

Presently, the most efficient basis for such a random number
generator is the additive random number generator described in
Knuth, *The Art of Programming: Vol 2: Seminumerical Algorithms,*
2d Edition, Addison-Wesley, 1981. This is basically a linear
feedback shift register (LFSR) using addition mod 2^{n}.
The additive random number generator is especially suited to a
fast software implementation. In one version, there are three
pointers into a circular queue, and the random number generator
"step" operation consists of taking two of the pointed-to elements,
adding them, and placing the result in the queue through the third
pointer. The pointers are then incremented within the queue, and
the result returned.

A reasonable structure for the Fenced DES random number generator is thirty-one (31) elements of thirty-two (32) bits each, for a total state of 992 bits. (DES has a 56-bit key.) This state can be initialized easily from a user key using an array of thirty-two (32) degree thirty-one (31) primitive mod-2 polynomials as cyclic redundancy checks (CRC's). This produces thirty-two (32) 31-bit values which can be re-arranged into thirty-one (31) 32-bit values to become the state for the random number generator. This CRC-based initialization allows the user key to be ASCII or binary, of arbitrary length, and thus provides a strong universal key interface.

Since the additive random number generator is essentially a
linear mechanism, it is necessary to nonlinearize the sequence.
One way to do this, mentioned in Ritter, "The Efficient Generation
of Cryptographic Confusion Sequences," *Cryptologia,*
15(2): 81-139, 1991, by dropping out pseudo-random length sections
of the linear sequence, leaving pseudo-random length "take" islands,
and then offsetting each take sequence with a different pseudo-random
offset value. With fairly short "take" islands, this renders the
usual linear attacks worthless, at a cost of dropping some moderate
fraction of the sequence. Additional isolation is provided by the
cheap width of the random number generator, since only eight bits
are needed, but thirty-two bits are calculated. This means that
3/4 of the random number generator state is always hidden, but must
nevertheless be resolved before the random number generator can be
completely exposed.

In some embodiments of cryptographic mechanisms **100**,
combinations of substitution mechanisms and balanced block
mixers are used. These can be used alone or in combination
with other ciphers, and using these mechanisms, the block size
of other ciphers can be increased.

In one embodiment, with a combination of
substitution mechanisms and balanced block mixers, the block
size n of a cipher mechanism can be doubled. (In the
following, only for the sake of generality, it is assumed that
n is an integer power of 2.) With reference to **FIGURE 13**,
a 2x fenced block cryptographic mechanism consists of two keyed
cipher mechanisms **190a**, **190b**, each taking n-bit blocks,
n/2 substitution mechanisms (labelled S1 to Sn/2) in two arrays
**192a**, **192b**, and two balanced block mixers **194a**,
**194b**. A 2n-bit input signal I is split into n/4 8-bit parts
X1, . . ., Xn/4 (as with the other signal splitting described herein,
this splitting can be done merely based on the order of the bits in
the input message, or some complex form of splitting can be used).
Each 8-bit part Xi is passed through an array of substitution
mechanism Si, i=1, . . ., n/4, producing a substituted 8-bit data
part Xi'. The n/4 8-bit data parts Xi' are input to balanced block
mixer **194a** which produces two n-bit data blocks A and B.
Data block A is encrypted by cipher mechanism **190a** to produced
n-bit enciphered block A', and data block B is encrypted by cipher
mechanism **190b** to produced n-bit enciphered block B'. Data
blocks A' and B' are input to a second balanced block mixer
**194b** which produces balanced mixed output blocks. The output
blocks are split into n/4 8-bit parts, each of which is passed
through a substitution mechanism Sj, j=n/4+1, . . ., n/2, to produce
n/4 substituted 8-bit data blocks yj. The n/4 8-bit data blocks yj
are joined to form a 2n-bit ciphertext output signal.

Where the cipher mechanisms **190a** and **190b** are DES
mechanisms, n is 64 bits. In this case, using a combination of
substitution mechanisms and balanced block mixers, the effective
block size of DES can be doubled. With reference to
**FIGURE 13**, two balanced block mixers **194a** and
**194b** are used to produce two 64-bit inputs to DES mechanisms
**190a**, **190b** from a 128-bit input message signal. The
first array **192a** of substitution mechanisms S1-S16 map
sixteen 8-bit inputs to 8-bit outputs, as described above. The
input signal to the system is a 128-bit block, split into sixteen
8-bit blocks, x1-x16. Each of these sixteen blocks is transformed
via a corresponding substitution mechanism. That is, for i from 1
to 16, 8-bit data block xi is transformed by substitution mechanism
Si, to give 8-bit data block x'i. Eight of the transformed 8-bit
blocks are input (as a 64-bit block A) to one DES mechanism
**190a**, and the other eight 8-bit blocks are input (as 64-bit
block B) to the other DES mechanism **190b**. Data blocks A and
B are encrypted to produce to 64-bit output blocks A' and B', where,
functionally, A'=DES(A, k1) and B'=DES(B, k2). As above, keys k1
and k2 can be the same as each other or different. Data blocks A'
and B' are mixed using balanced block mixer **194b**, and the
output 128-bits (as two 64-bit blocks) are then transformed (in
8-bit blocks) by array **192b** of substitution mechanisms
S17-S32 to produce 128-bit output ciphertext message signal
O=y17'y18'...y32'.

A ciphertext message signal O produced by an encryption mechanism
as described is decrypted as follows. The signal is split back into
its 8-bit component parts yi' which are then passed through
substitution mechanisms S-inverse17 to S-inverse32 (which perform
the inverse transformations of S17 to S32, respectively), giving
back the 8-bit data blocks y17 to y32. The 8-bit data blocks y1 to
y17 are passed through the inverse of balanced block mixer 194b,
giving two 64-bit data blocks A' and B'. Data blocks A' and B' are
decrypted by DES mechanisms **190a**, **190b** with the
appropriate keys, giving 64-bit data blocks A and B, respectively.
Data blocks A and B are passed through the inverse of balanced
block mixer **194a** to give outputs Xi' which are transformed by
substitution mechanisms Si-inverse (which performs the inverse
transformation of substitution mechanism Si), for i from 1 to
16. The output of the inverse substitutions forms a 128-bit
plaintext message signal.

In one embodiment of the present invention, with a combination
of substitution blocks and block mixers, the block size of a cipher
is quadrupled. With reference to **FIGURE 14**, a 4x fenced
block cryptographic mechanism consists of four keyed cipher
mechanisms **196a-196d**, each processing n-bit data blocks to
produce an n-bit data block, 2n substitution mechanisms (labelled
S1 to S2n) in two arrays **198a**, **198b**, and six balanced
block mixers **200a-200f**. The corresponding decryption
mechanism uses the inverse substitutions and the inverse mixers,
in the appropriate order.

Where cipher mechanisms **196a-196d** are DES cipher
mechanisms, the encryption illustrated in **FIGURE 14** processes
a 256-bit input message block.

Here there are two arrays **198a**, **198b** of thirty-two
(32) 8-bit input substitution mechanisms S1 to S32 and thirty-two
(32) 8-bit output substitution mechanisms S33 to S64, respectively,
each substitution mechanism effectively a separately-shuffled and
independent substitution table. The overall block size of the
cryptographic mechanism is 256 bits, there are four DES mechanisms,
plus two levels of input mixing (by mixers **200a-200c**) and
two levels of output mixing (by mixers **200d-200f**). Note that
the outermost balanced block mixers join two 128-bit blocks.

This approach spreads the effect of each bit in an input message to each of the four DES mechanisms, and produces a particular output bit from a combination of all four DES mechanisms. This forces an opponent (trying to break the code) to search all four DES keyspaces simultaneously to match a particular known plaintext-ciphertext pair.

While in the above, the substitution mechanisms are effectively 8-bit substitution tables, a different size could be used for these mechanisms.

A software implementation of the 4x DES construct of
**FIGURE 14** performs all sixty-four substitutions and all six
mixings in less time than a single DES computation. Currently, it
ciphers four times the data with about 4.8 times the computation,
and has a keyspace exceeding 224 bits. (A much faster hybrid
implementation might perform the DES computations in hardware.)

In the implementation, table and key initialization take about 182 times the computation of a single 256-bit block ciphering. (This is mainly a consequence of shuffling sixty-four small substitution tables, an initialization which could be separated from the use of new DES mechanism keys. Table initialization can occur less frequently than key initialization.)

The current implementation uses a fast 992-bit additive random number generator and a non-linear "jitterizer" as mentioned in Ritter. In the present implementation, a user key of arbitrary length and content is hashed by thirty-two (32) separate degree-31 primitive mod-2 polynomials with 11-19 non-zero terms, producing the 992-bit random number generator state, which also eventually generates the DES keys. This approach eliminates the need for keys to have a specific format unique to this particular cipher, which enables the selection of an arbitrary cipher from among many different ciphers, all of which can use the exact same key.

Deciphering simply uses inverse substitutions (the inverse of each encipher output substitution is used for decipher input) and DES in decipher mode.

Recall that a change in any one input data bit produces a distribution of changes out of the associated input substitution, depending on the particular substitution, original input value, and change. Any possible byte input has a 50 percent probability of affecting each of the eight output bits from that substitution.

Each substitution mechanism used in the fenced mechanisms
effectively implements a substitution table. A substitution table
S is an indexable n-element vector of output codes. An invertible
substitution table S with inverse table S^{-1} has the
property that for any input code i in n, S^{-1}S(i) = i.
This implies that S contains n different output codes.

An invertible substitution table S contains each output code value exactly once. Since each possible index selects a different element, any index change will select a different output code. Since different code values must differ in at least one bit, any input change must produce a change in at least one output bit.

Given invertible substitution table S with shuffled contents,
the output distribution for any input code change is an arbitrary
selection from the output codes which differ from the current output
code. If the output codes are a complete set of 2^{m} values
0..2^{m-1} for some m, it is likely that about half of the
output bits will change for any input code change.

Conversely, since each output bit is produced by an output code, and the selected output code is completely dependent upon every bit in the input code, each output bit is dependent on every bit of the input. A network with this property is normally called complete, and localized completeness is also the basis for avalanche in an iterated block cipher.

If two 128-bit blocks are first mixed, then the resulting data mixed as two pairs of 64-bit blocks, suppose that there is a change in any one input data bit. Without the mixing, this produces an 8-bit substituted result which would normally affect just a single DES block. But the 128-bit block mixing produces at least one bit change in both 128-bit resulting blocks, and the 64-bit block mixings extend those changes to all four 64-bit resulting blocks, each of which is processed by a separate DES mechanism. Thus, any change of even a single input bit will affect all four DES operations.

Even though the output from each DES operation is pseudo random, a three-level structure is still necessary to prevent various kinds of attacks, for example, so-called "fix-in-the-middle" attacks. Therefore the output substitutions are important. The mixing is also used so that the result from a single DES block cannot be isolated and worked on independently.