A detailed discussion of cryptographic nonlinearity, what it means and how it is computed, with active JavaScript panels to perform the computation.

Nonlinearity is the number of bits bits which must change in the truth table of a Boolean function to reach the closest affine function. If we believe that cryptosystems based on linear or affine functions are inherently weak, the ability to measure nonlinearity is the ability to measure one form of strength.

Nonlinearity measurement is particularly useful to quantify the
strength of invertible
substitution tables.
This is important when pre-defined tables are a part of a
cipher definition.
But nonlinearity measurement can be even *more* important in the
context of
scalable ciphers:
When ciphers can be down to experimental size, it becomes possible to
talk about the overall nonlinearity (for each
key) *of the cipher itself.*
This is far more information than we usually have on cipher designs.

A Boolean function produces a single-bit result for each possible combination of values from perhaps many Boolean variables. The Boolean field consists of the values {0,1}, with XOR as "addition" and AND as "multiplication."

An affine Boolean function has the form:

f = a_{n}x_{n}+ a_{n-1}x_{n-1}+ ... + a_{1}x_{1}+ a_{0}

In the Boolean field, a constant or a_{0} value of '1'
*inverts* or reverses the result, while a constant of '0' has
no effect. The coefficients a_{i} simply *enable* or
*disable* the associated variable x_{i}. And if we
consider the collected coefficients to be a counting binary value,
we have a unique ordering for affine Boolean functions:

Affine Boolean Functions f0 = 0*x[2] + 0*x[1] + 0 = 0 f1 = 0*x[2] + 0*x[1] + 1 = 1 f2 = 0*x[2] + 1*x[1] + 0 = x[1] f3 = 0*x[2] + 1*x[1] + 1 = x[1] + 1 f4 = 1*x[2] + 0*x[1] + 0 = x[2] f5 = 1*x[2] + 0*x[1] + 1 = x[2] + 1 f6 = 1*x[2] + 1*x[1] + 0 = x[2] + x[1] f7 = 1*x[2] + 1*x[1] + 1 = x[2] + x[1] + 1 . . .

In this way, we can write 16 different forms for 3 variables. But it is convenient to pair the functions which are the same except for the value of the constant, and then we have exactly 8 affine Boolean functions of 3 variables. Each of these has a particular value for every possible combination of variable value, which we can show in a truth table:

The 3-Variable Affine Boolean Functions affine truth table 1 1 1 1 1 1 1 1 1 x0 0 1 0 1 0 1 0 1 x1 0 0 1 1 0 0 1 1 x1+x0 0 1 1 0 0 1 1 0 x2 0 0 0 0 1 1 1 1 x2+ x0 0 1 0 1 1 0 1 0 x2+x1 0 0 1 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1

One way to measure a sort of "correlation" between two Boolean functions is to compare their truth tables and count the number of bits which differ; this is their Hamming distance.

Since we *expect* about half the bit positions to differ
(on average), we can subtract that expected distance and come up with
what I am calling -- for lack of a better term -- the "unexpected
distance" (UD). The magnitude of the UD relates to just how
unexpected the distance is, while the sign indicates the direction.
Consider two functions and their difference:

Distance to an Affine Function f 1 0 0 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1 diff 1 1 1 1 0 1 0 1

Here we have a Hamming distance of 6 between the two functions. This is an unexpected distance or UD of 6 - 4 = +2, which means that 2 more bits differ than we would expect.

Another way to compute Boolean correlation is to accumulate the bits of one function (as integers) by addition or subtraction as selected by the other function. For example:

Distance to an Affine Function f 1 0 0 1 1 1 0 0 x2+x1+x0 + - - + - + + - (operation select) accum +1 -0 -0 +1 -1 +1 +0 -0 = +2

This technique yields the UD value directly.

With some work, we can now compare a Boolean function to each possible affine Boolean function, and develop both a distance and an unexpected distance to each:

Unexpected Distance to Affine Boolean Function affine truth table distance ud c 0 0 0 0 0 0 0 0 4 0 x0 0 1 0 1 0 1 0 1 4 0 x1 0 0 1 1 0 0 1 1 6 +2 x1+x0 0 1 1 0 0 1 1 0 6 +2 x2 0 0 0 0 1 1 1 1 4 0 x2+ x0 0 1 0 1 1 0 1 0 4 0 x2+x1 0 0 1 1 1 1 0 0 2 -2 x2+x1+x0 0 1 1 0 1 0 0 1 6 +2 f 1 0 0 1 1 1 0 0

Nonlinearity is the
number of bits which must change in the truth table of a Boolean
function to reach the closest affine function.
But every affine Boolean function also has a *complement* affine
function which has every truth table bit value reversed. This means
that no function possibly can be more than half its length in bits away
from *both* an affine Boolean function *and* its complement.
So a zero UD value is not only what we *expect,* it is in fact
*the best we can possibly do.*

A non-zero UD value is that much closer to some affine function, and that much less nonlinear. So the nonlinearity value is half the length of the function, less the maximum absolute value of the unexpected distance to each affine function.

The function f in the previous section has a length of 8 bits,
and an absolute value maximum unexpected distance of 2. This is a
nonlinearity of *even* (divisible by 2)
if we have a
balanced function.

A Hadamard matrix H is an n x n matrix with all entries +1 or -1, such that all rows are orthogonal and all columns are orthogonal (see, for example, [HED78]).

The usual development (see, for example [SCH87]) starts with a defined 2 x 2 Hadamard matrix H2 which is ((1,1),(1,-1)). Each step consists of multiplying each element in H2 by the previous matrix, thus negating all elements in the bottom-right entry:

Hadamard Matrix Development H2 = | 1 1 | H4 = H2 * H2 = | H2 H2 | | 1 -1 | | H2 -H2 | H4 = | | 1 1 | | 1 1 | | = | 1 1 1 1 | | | 1 -1 | | 1 -1 | | | 1 -1 1 -1 | | | | 1 1 -1 -1 | | | 1 1 | |-1 -1 | | | 1 -1 -1 1 | | | 1 -1 | |-1 1 | | H8 = | H4 H4 | = | 1 1 1 1 1 1 1 1 | | H4 -H4 | | 1 -1 1 -1 1 -1 1 -1 | | 1 1 -1 -1 1 1 -1 -1 | | 1 -1 -1 1 1 -1 -1 1 | | 1 1 1 1 -1 -1 -1 -1 | | 1 -1 1 -1 -1 1 -1 1 | | 1 1 -1 -1 -1 -1 1 1 | | 1 -1 -1 1 -1 1 1 -1 |

Now compare H8 from this strange Hadamard development to the affine functions:

The 3-Variable Affine Boolean Functions c 0 0 0 0 0 0 0 0 x0 0 1 0 1 0 1 0 1 x1 0 0 1 1 0 0 1 1 x1+x0 0 1 1 0 0 1 1 0 x2 0 0 0 0 1 1 1 1 x2+ x0 0 1 0 1 1 0 1 0 x2+x1 0 0 1 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1

So if we map the values in the affine truth table:
*the same functions* as in the Hadamard development.
These are the *Walsh* functions, and here both developments
produce the same order, which is called "natural" or "Hadamard."
Walsh functions have fast transforms which reduce the cost of
correlation computations from n*n to n log n, which can be a very
substantial reduction.

A Fast Walsh Transform (FWT) essentially computes the correlations which we have been calling the "unexpected distance" (UD). It does this by calculating the sum and difference of two elements at a time, in a sequence of particular pairings, each time replacing the elements with the calculated values.

It is easy to do a FWT by hand. (Well, I say "easy," then always struggle when I actually do it.) Let's do the FWT of function f: (1 0 0 1 1 1 0 0): First note that f has a binary power length, as required. Next, each pair of elements is modified by an "in-place butterfly"; that is, the values in each pair produce two results which replace the original pair, wherever they were originally located. The left result will be the two values added; the right result will be the first value less the second. That is,

(a',b') = (a+b, a-b)

So for the values (1,0), we get (1+0, 1-0) which is just (1,1). We start out pairing adjacent elements, then every other element, then every 4th element, and so on until the correct pairing is impossible:

An 8-Element Fast Walsh Transform (FWT) original 1 0 0 1 1 1 0 0 ^---^ ^---^ ^---^ ^---^ first 1 1 1 -1 2 0 0 0 ^-------^ ^-------^ ^-------^ ^-------^ second 2 0 0 2 2 0 2 0 ^---------------^ ^---------------^ ^---------------^ ^---------------^ final 4 0 2 2 0 0 -2 2

Now compare these results to the UD values we found earlier:

Unexpected Distance to the Affine Functions affine ud 1 0 x0 0 x1 +2 x1+x0 +2 x2 0 x2+ x0 0 x2+x1 -2 x2+x1+x0 +2

Note that all FWT elements -- after the zeroth -- map the U.D.
results *exactly* in both magnitude and sign, *and* in
the exact same order. (This ordering means that the binary index
of any result is also the recipe for expressing the affine function
being compared in that position.) The zeroth element in the FWT
is the number of 1-bits in the function when we use the real values
{0,1} to represent the function.

So to find the "unexpected distance" from any balanced function
to every affine Boolean function, just compute the FWT. Clearly,
the *closest* affine function has the absolute value
*maximum* UD value of all the transformed elements past the
zeroth. Just subtract this value from half the function length
(which is the zeroth FWT value in a balanced function) to get the
nonlinearity.

To understand how the FWT works, suppose we label each bit-value with a letter, and then perform a symbolic FWT:

An 8-Element Fast Walsh Transform (FWT) a b c d e f g h ^------^ ^------^ ^------^ ^------^ a+b a-b c+d c-d e+f e-f g+h g-h ^-------------^ ^-------------^ ^-------------^ ^-------------^ a+b a-b a+b a-b e+f e-f e+f e-f c+d c-d -c-d -c+d g+h g-h -g-h -g+h ^---------------------------^ ^---------------------------^ ^---------------------------^ ^---------------------------^ a+b a-b a+b a-b a+b a-b a+b a-b c+d c-d -c-d -c+d c+d c-d -c-d -c+d e+f e-f e+f e-f -e-f -e+f -e-f -e+f g+h g-h -g-h -g+h -g-h -g+h g+h g-h

Each of these columns is the symbolic description of one element in the FWT result. Since each uses the same input variables in the same order, we can represent the uniqueness of each result simply by the sign applied to each variable:

Symbolic FWT Results by Column a+b+c+d+e+f+g+h = + + + + + + + + a-b+c-d+e-f+g-h = + - + - + - + - a+b-c-d+e+f-g-h = + + - - + + - - a-b-c+d+e-f-g+h = + - - + + - - + a+b+c+d-e-f-g-h = + + + + - - - - a-b+c-d-e+f-g+h = + - + - - + - + a+b-c-d-e-f+g+h = + + - - - - + + a-b-c+d-e+f+g-h = + - - + - + + -

Which we can compare to:

The 3-Variable Affine Boolean Functions c 0 0 0 0 0 0 0 0 x0 0 1 0 1 0 1 0 1 x1 0 0 1 1 0 0 1 1 x1+x0 0 1 1 0 0 1 1 0 x2 0 0 0 0 1 1 1 1 x2+ x0 0 1 0 1 1 0 1 0 x2+x1 0 0 1 1 1 1 0 0 x2+x1+x0 0 1 1 0 1 0 0 1

So not only do we once again find the affine functions, we also find them implicit in a way appropriate for computing add / subtract correlations, thus producing UD values directly with high efficiency.

The fast transform by hand is automated in Borland Pascal:

TYPE Lwd = LongInt; LintArray = ARRAY[ 0..16380 ] of LONGINT; PROCEDURE LintHadFmSeqWalsh( VAR DatLintAr; lastel: Lwd ); { Hadamard Walsh from sequential data, in-place } VAR Dat: LintArray ABSOLUTE DatLintAr; a, b: LONGINT; stradwid, { distance between pair of elements } blockstart, block, pair, el1, el2: Lwd; BEGIN stradwid := 1; blockstart := lastel; REPEAT el1 := 0; blockstart := blockstart DIV 2; FOR block := blockstart DOWNTO 0 DO BEGIN el2 := el1 + stradwid; FOR pair := 0 TO PRED(stradwid) DO BEGIN a := Dat[ el1 ]; b := Dat[ el2 ]; Dat[ el1 ] := a + b; Dat[ el2 ] := a - b; Inc( el1 ); Inc( el2 ); END; el1 := el2; END; stradwid := (stradwid + stradwid) AND lastel; UNTIL (stradwid = 0); END; {LintHadFmSeqWalsh}

LintHadFmSeqWalsh takes an array of 32-bit integers, and changes
the array data into the Walsh-Hadamard transform of that data. For
nonlinearity measures, the input data are {0,1} or {1,-1}; the
results are potentially bipolar in either case. (The "lastel"
parameter is the last index in the data array which starts at
index 0; it is thus always 2^{n - 1} for some n. The ABSOLUTE
attribute forces Borland Pascal to treat the parameter as a LongInt
array of arbitrary size.)

It is common to consider a Boolean function as consisting of the
real values {0,1}, but it is *also* common to use the
transformation

x' = (-1)(negative 1 to the power x) where x is {0,1}. This transforms {0,1} into {1,-1}, and for some reason looks much cooler than doing the exact same thing with^{x}

x' = 1 - 2x

This transformation has some implications: Using real values {1,-1} doubles the magnitude and changes the sign of the FWT results, but can simplify nonlinearity for unbalanced functions, because the zeroth term need not be treated specially. But if the Boolean function is balanced, as it will be in the typical invertible substitution table, the zeroth element need not be used at all, so using real values {1,-1} seems to provide no particular benefit in this application.

An invertible
substitution table
is an array of values in which
any particular value can occur at most once. If the range of the
output values is the same as the input values, then every value
occurs in the table exactly once. Typically the table has a
power-of-2 number of elements, which is related to size in bits of
its input (and output) value. For example, an "8-bit" table has
^{8} = 256

Even these relatively small tables have remarkable
keying
potential. Each invertible table differs from every other only in
the arrangement of the values it holds, but there is typically an
incredible number of possible
permutations. A 2-bit
table with ^{2} = 4*factorial*) or just 24 different tables. But a 4-bit
table with ^{4} = 16^{13}

Nonlinearity applies to Boolean functions, and so does not apply
directly to substitution tables. But each output bit from such
a table *can* be considered a Boolean function. So we can
run through the table extracting all the bits in a given bit
position, and then measure the nonlinearity of the function
represented by those bits.

Clearly, if we measure a nonlinearity value for each output bit
position, we do not have a single nonlinearity for the table.
Several ways have been suggested to combine these values, including
the sum or the average of all values. But for cryptographic use it
may be more significant to collect the *minimum* nonlinearity
over all the bit positions. This allows us to argue that no bit
position in the table is weaker than the value we have.
Since a table collects multiple Boolean functions, tables tend to
be weaker than the average Boolean function of the same length.
But the nonlinearity values for tables and sequences of the same
length do tend to be similar and somewhat comparable.

There are no nonlinear 2-bit tables. We know this because there are exactly 6 balanced bit sequences of length 4, and each of those has a measured nonlinearity of zero. So there is no chance to build a nonlinear table by collecting those sequences.

Here are some coarse graphs of nonlinearity distributions at various table sizes:

Nonlinearity Distribution in 4-Bit Tables 0.6 | 0.5 | * * 0.4 | * * 0.3 | * * 0.2 | * * 0.1 | * * 0.0 | * * * Prob +--+--+--+-- 0 2 4 Nonlinearity

Nonlinearity Distribution in 5-Bit Tables 0.7 | * 0.6 | * 0.5 | * 0.4 | * 0.3 | * 0.2 | * * 0.1 | * * * 0.0 | * * * Prob +--+--+--+--+--+--+-- 0 2 4 6 8 10 Nonlinearity

Nonlinearity Distribution in 8-Bit Tables 0.35 | * 0.3 | * 0.25 | * * 0.2 | * * * 0.15 | * * * 0.1 | * * * * 0.05 | * * * * * 0.00 | * * * * * * * Prob +--+--+--+--+--+--+--+--+-- 92 96 100 104 Nonlinearity

Nonlinearity Distribution in 10-Bit Tables 0.2 | 0.175 | * * 0.15 | * * * 0.125 | * * * * 0.1 | * * * * * 0.075 | * * * * * * 0.05 | * * * * * * * * 0.025 | * * * * * * * * * * 0.00 | * * * * * * * * * * * Prob +--+--+--+--+--+--+--+--+--+--+--+--+-- 436 440 444 448 452 456 Nonlinearity

[AY82] Ayoub, F. 1982. Probabilistic completeness of
substitution-permutation encryption networks. *IEE Proceedings,
Part E.* 129(5): 195-199.

[DAE94] Daemen, J., R. Govaerts and J. Vandewalle. 1994.
Correlation Matrices. *Fast Software Encryption.* 275-285.

[FOR88] Forre, R. 1988. The Strict Avalanche Criterion:
Spectral Properties of Boolean Functions and an Extended
Definition. *Advances in Cryptology -- CRYPTO '88.* 450-468.

[GOR82] Gordon, J. and H. Retkin. 1982. Are Big S-Boxes Best?
Cryptography. *Proceedings of the Workshop on Cryptography,
Burg Feuerstein, Germany, March 29-April 2, 1982.* 257-262.

[HED78] Hedayat, A. and W. Wallis. 1978. Hadamard Matrices and
their Applications. *The Annals of Statistics.* 6(6): 1184-1238.

[HEY94] Heys, H. and S. Tavares. 1994. On the security of the
CAST encryption algorithm. *Canadian Conference on Electrical and
Computer Engineering.* Halifax, Nova Scotia, Canada, Sept. 1994.
332-335.

[HEY95] Heys, H. and S. Tavares. 1995. Known plaintext
cryptanalysis of tree-structured block ciphers. *Electronics
Letters.* 31(10): 784-785.

[MEI89] Meier, W. and O. Staffelbach. 1989. Nonlinearity Criteria
for Cryptographic Functions. *Advances in Cryptology --
Eurocrypt '89.* 549-562.

[MIR97] Mirza, F. 1997. Linear and S-Box Pairs Cryptanalysis of the Data Encryption Standard.

[OC91] O'Connor, L. 1991. Enumerating nondegenerate permutations.
*Advances in Cryptology -- Eurocrypt '91.* 368-377.

[OC93] O'Connor, L. 1993. On the Distribution Characteristics in
Bijective Mappings. *Advances in Cryptology -- EUROCRYPT '93.*
360-370.

[PIE88] Pieprzyk, J. and G. Finkelstein. 1988. Towards effective
nonlinear cryptosystem design. *IEE Proceedings, Part E.*
135(6): 325-335.

[PIE89] Pieprzyk, J. and G. Finkelstein. 1989. Permutations
that Maximize Non-Linearity and Their Cryptographic Significance.
*Computer Security in the Age of Information.* 63-74.

[PIE89B] Pieprzyk, J. 1989. Non-linearity of Exponent Permutations.
*Advances in Cryptology -- EUROCRYPT '89.* 80-92.

[PIE93] Pieprzyk, J., C. Charnes and J. Seberry. 1993. Linear
Approximation Versus Nonlinearity. *Proceedings of the Workshop on
Selected Areas in Cryptography (SAC '94).* 82-89.

[PRE90] Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts
and J. Vandewalle. 1990. Propagation Characteristics of Boolean
Functions. *Advances in Cryptology -- Eurocrypt '90.* 161-173.

[RUE86] Rueppel, R. 1986. *Analysis and Design of Stream
Ciphers.* Springer-Verlag.

[SCH86] Schroeder, M. 1986. *Number Theory in Science and
Communications.* Springer-Verlag.

[SCH87] Schroeder, M. and N. Sloane. 1987. New Permutation Codes
Using Hadamard Unscrambling. *IEEE Transactions on Information
Theory.* IT-33(1): 144-145.

[XIO88] Xiao, G-Z. and J. Massey. 1988. A Spectral Characterization
of Correlation-Immune Combining Functions. *IEEE Transactions on
Information Theory.* 34(3): 569-571.

[YOU95] Youssef, A. and S. Tavares. 1995. Resistance of Balanced
S-boxes to Linear and Differential Cryptanalysis. *Information
Processing Letters.* 56: 249-252.

[YOU95B] Youssef, A. and S. Tavares. 1995. Number of Nonlinear
Regular S-boxes. *Electronics Letters.* 31(19): 1643-1644.

[ZHA95] Zhang, X. and Y. Zheng. 1995. GAC -- the Criterion for
Global Avalanche Characteristics of Cryptographic Functions.
*Journal for Universal Computer Science.* 1(5): 316-333.

Other nonlinearity articles, often dealing with measurements on the new block ciphers I have been developing, are available in the Technical Articles section of my pages:

http://www.ciphersbyritter.com/index.html#TechnicalArticles

Also, many Walsh-Hadamard references are available in my Walsh-Hadamard literature review:

http://www.ciphersbyritter.com/RES/WALHAD.HTM

*Last updated:*1998-05-21