Nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function.
Although we cannot measure the nonlinearity of a cipher of practical
size, we *can* measure modest substitution tables and small block
constructions. Since a random substitution table is the ideal model
of a block cipher, we can measure many random tables and develop
an ideal nonlinearity distribution for a small block size. We can
then measure small block constructions and compare distributions.
Experimental results show that Mixing constructions can produce
nonlinearity distributions which are surprisingly close to the ideal.
These experiments tend to contradict the claim that block ciphers
which use linear mixing are inherently weak.

From: ritter@io.com (Terry Ritter) Newsgroups: sci.crypt Subject: Measured Nonlinearity in Mixing Constructions (LONG!) Date: Mon, 22 Dec 1997 08:18:14 GMT Lines: 586 Message-ID: <349e21f4.2151248@news.io.com> MEASURED NONLINEARITY IN MIXING CONSTRUCTIONS Terry Ritter Ritter Software Engineering http://www.io.com/~ritter/ 1997-12-21 Abstract Nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. Although we cannot measure the nonlinearity of a cipher of practical size, wecanmeasure modest substitution tables and small block constructions. Since a random substitution table is the ideal model of a block cipher, we can measure many random tables and develop an ideal nonlinearity distribution for a small block size. We can then measure small block constructions and compare distributions. Experimental results show that Mixing constructions can produce nonlinearity distributions which are surprisingly close to the ideal. These experiments tend to contradict the claim that block ciphers which use linear mixing are inherently weak. Introduction The ideal block cipher is a keyed simple substitution table of sufficient size. Unfortunately, with 128-bit blocks, there would be 2**128 entries in that table, which is completely out of the question. So the modern block cipher is a *construction* intended to *simulate* a keyed substitution of the desired size. At issue is the effectiveness of the construction technology. One way to investigate this is by using Boolean function theory, since a substitution table can be considered a set of independent Boolean functions, one for each output bit. A Boolean function produces a single-bit result for each possible combination of values from perhaps many Boolean variables. The *nonlinearity* of a Boolean function is the Hamming distance to the closest affine function [e.g., PIE88, PIE89, PIE89B]. That is, nonlinearity is the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. If "linearity" is considered a significant cryptographic weakness, nonlinearity is an explicit measure of the *lack* of that weakness. That is, nonlinearity measures one form of cipher "strength." For cryptographic purposes, it is desired to take the nonlinearity of a substitution table to be the *minimum* of the nonlinearity values for each output bit in that table. Nonlinearity is measured by forming the (one-bit-wide) truth table for a particular output bit, then performing a Fast Walsh-Hadamard Transform (FWT) on that array. Each result value is essentially a correlation count to a particular affine function, and the minimum distance is found by scanning the transform results. In measuring nonlinearity, it is generally necessary to record the function result for each possible combination of input variables. If we have an 8-bit function, we must record and then transform 256 elements, and if we have a 16-bit function, we must record and transform 64K elements. Measuring large functions rapidly becomes impossible. Although we cannot hope to measure the nonlinearity of a real 64-bit or 128-bit block cipher, we *can* measure nonlinearity in substitution tables and small block constructions. Of course, this is only reasonable for *scalable* constructions which can build both real ciphers and measurable tiny versions from the exact same plans. Ideal Nonlinearity Distributions By properly shuffling tables using a large-state cryptographic RNG (in particular: http://www.io.com/~ritter/KEYSHUF.HTM ) we can sample a presumably uniform distribution of different table arrangements. We can measure the nonlinearity of each of these tables, and accumulate the results. For 4-bit tables, we get something like this: Nonlinearity Distribution in 4-Bit Tables Fig. 1 0.6 | 0.5 | * * 0.4 | * * 0.3 | * * 0.2 | * * 0.1 | * * 0.0 | * * * Prob +--+--+--+-- 0 2 4 Nonlinearity First, note that nonlinearity always takes even values. Next, 4-bit invertible substitutions are surprisingly weak: Despite having 16 bits of Boolean function description, no 4-bit table was found to be more than 4 bits away from an affine function, and almost half were only 2 bits away. A few of these tables (probability 0.01) even contain a true affine function. The 8-bit tables have the opportunity to be far more complex, since there are 256 bits in each 8-bit Boolean function: Nonlinearity Distribution in 8-Bit Tables Fig. 2 0.35 | * 0.3 | * 0.25 | * * 0.2 | * * * 0.15 | * * * 0.1 | * * * * 0.05 | * * * * * 0.00 | * * * * * * * Prob +--+--+--+--+--+--+--+--+-- 92 96 100 104 Nonlinearity Some of the literature asserts that *random* S-Boxes will have various good qualities provided the tables are "large enough" [AY82, GOR82, OC91, OC93, YOU95B]. The obvious question is: "How large is 'large enough'?" Experimental measurement of the nonlinearity of 1,000,000 random 8-bit tables shows exactly one table with a nonlinearity as low as 78; this is a 0.000001 probability (1 count in 1E6). So not only are random 8-bit tables with even a single affine coordinate (nonlinearity = 0) *unlikely*, they are *so* unlikely as to be almost impossible to find. (Indeed, [GOR82] gives the probability as 10**-72.) So accidental "linearity" in random 8-bit tables seems to be a non-issue. Our block cipher construction problem essentially involves using small random tables (here 4-bit tables with 16 entries each and a mean nonlinearity of 3.0) to somehow emulate a table which is at least twice as wide (here an 8-bit table of 256 entries and a mean nonlinearity of 99.0). Stated in this way, a solution seems quite unlikely, for how can just 4, 6, or 8 tables of nonlinearity 3 ever produce a nonlinearity of 99? (We also examine using 5-bit tables with 32 entries and a mean nonlinearity of 7.8 emulating a 10-bit table with 1024 entries and a mean nonlinearity of 447.7, which seems similarly improbable.) Mixing Ciphers A mixing cipher (see, for example: http://www.io.com/~ritter/CRYPHTML.HTM#MixTech especially: http://www.io.com/~ritter/MIXCORE.HTM and: http://www.io.com/~ritter/EXTRMSPD.HTM ) is distinguished by its use of high quality mixing, as opposed to repeated rounds of lesser mixing. Mixing ciphers generally use multiple substitution tables, each of which is shuffled for primary keying. The mixing itself is generally linear, but assures that each and every input affects each and every output in a balanced way. Presumably, nonlinearity measurements will have something to say about possible ill effects of this linear mixing. With the particular mixing function which I call "Balanced Block Mixing," (see, for example: http://www.io.com/~ritter/CRYPHTML.HTM#BBMTech and especially: http://www.io.com/~ritter/BBM.HTM ), a change in even one input is *guaranteed* to change every output, which is the basis for good overall diffusion and for reasoning about such diffusion. Mixing ciphers thus satisfy "the important cryptographic property of completeness" [HEY95]. This mixing is linear, and, by itself, has no strength at all. But the purpose of the mixing is to combine two input values into two output values, *each* of which depends upon *both* input values. The issue here is whether this sort of structure can produce higher nonlinearity values than the small substitution tables which are the only nonlinear components. Obviously, we would like to see the result approach the ideal nonlinearity distribution for the larger block. Fig. 3 shows the basic mixing construction: The input block enters at the top, is split into two and each half substituted. Those results are mixed into two new block values, which are then substituted again. In this way the larger block is emulated by operations on half-width blocks. Each additional mixing means a new layer of mixing and a new layer of substitution. A Single Mixing Fig. 3 | | SUB SUB \ / MIX / \ SUB SUB | | With a single mixing (and four random 4-bit tables) we find nonlinearity values only every 4 steps, instead of every 2: Nonlinearity Distribution with One Mixing Level Fig. 4 0.45 | * 0.4 | * 0.35 | * 0.3 | * 0.25 | * * 0.2 | * * 0.15 | * * * * 0.1 | * * * * 0.05 | * * * * 0.00 | * * * * * * * * Prob +--+--+--+--+--+--+--+--+--+--+--+--+--+-- 56 64 72 80 88 96 100 Nonlinearity This is a poor approximation to the ideal 8-bit distribution. But with two mixings (and six random 4-bit tables) we can do much better (this represents one run of 10,000 tiny ciphers): Nonlinearity Distribution with Two Mixing Levels Fig. 5 0.35 | >* 0.3 | * 0.25 | >* * 0.2 | * * >* 0.15 | * * * * 0.1 | >* * * * 0.05 | >* * * * * 0.00 | * >* * * * * * >* Prob +->+--+--+--+--+--+--+--+-- 92 96 100 104 Nonlinearity Here the ">" symbols represent the ideal distribution. The goal here seems to be not necessarily having the exact ideal distribution, but instead to gain confidence that the construction does produce reasonable *approximation* to the ideal. Indeed, one would expect that a random table must *inevitably* be qualitatively different from any possible construction which has a smaller total state. But Fig. 5 does seem to be a reasonable approximation, especially in view of Fig. 4. Alas, chi-square comparisons are more exacting: Chi-Square For Two 4-Bit Mixings Fig. 6 NL Expect Found Chi-Sq 106 2 1 } 104 249 242 } 0.255 102 2027 1847 16.984 100 3412 3256 7.132 98 2474 2520 0.855 96 1172 1306 15.321 94 450 530 14.222 92 150 182 6.827 90 46 65 7.848 88 13 29 } 60.500 86 4 11 } 84 1 7 } 82 0 3 } 80 0 1 } This is obviously an unreasonable chi-square value. But if we add just one more mixing, for a total of 3 mixings and eight random 4-bit tables, we can do much better: Chi-Square For Three 4-Bit Mixings Fig. 7 NL Expect Found Chi-Sq 106 2 2 } 104 249 256 } 0.195 102 2027 1985 0.870 100 3412 3386 0.198 98 2474 2518 0.855 96 1172 1179 0.042 94 450 446 0.036 92 150 175 4.167 90 46 35 2.630 88 13 17 } 0.000 86 4 1 } 84 1 0 } ------ 8.933 With 9 "degrees of freedom," this is a *reasonable* chi-square value, making the trial an arguable statistical match to the ideal distribution. So if an accurate distribution is in fact needed at this level, using the very weak 4-bit table components and only two tables across, three mixing levels would be preferred. But this brings up two further questions: First, might two mixings be enough with 4-bit tables if we have 4 tables across? And, might two mixings be enough if we use 5-bit tables? We investigate the last question first, because it involves far less computation, and if the answer is positive, there is no practical need to investigate the first: Any serious cipher would use tables wider than 5-bits anyway. First, we have the typical nonlinearity of 5-bit tables: Nonlinearity Distribution in 5-Bit Tables Fig. 8 0.7 | * 0.6 | * 0.5 | * 0.4 | * 0.3 | * 0.2 | * * 0.1 | * * * 0.0 | * * * Prob +--+--+--+--+--+--+-- 0 2 4 6 8 10 Nonlinearity Then the approximate ideal distribution for 10-bit tables: Nonlinearity Distribution in 10-Bit Tables Fig. 9 0.2 | 0.175 | * * 0.15 | * * * 0.125 | * * * * 0.1 | * * * * * 0.075 | * * * * * * 0.05 | * * * * * * * * 0.025 | * * * * * * * * * * 0.00 | * * * * * * * * * * * Prob +--+--+--+--+--+--+--+--+--+--+--+--+-- 436 440 444 448 452 456 Nonlinearity While this gives a general idea of the distribution, we now prefer to use the actual counts, scaled to the expectations of a smaller trial. The ideal model is thus constructed from three different 15-hour trials of 1,000,000 random 10-bit tables each, with the counts reduced by a factor of 100 and rounded to an integer. The Ideal 10-Bit Nonlinearity Distribution Fig. 10 NL Trial 1 Trial 2 Trial 3 Ideal 462 0 1 0 0 460 55 75 55 1 458 2145 2269 2234 22 456 20537 20938 20882 208 454 76259 76344 75980 763 452 148847 148135 148510 1485 450 188329 187802 187585 1878 448 177775 178208 178616 1782 446 140283 140219 140407 1403 444 97415 97317 97465 974 442 61999 62282 62213 622 440 37829 37685 37382 376 438 21675 21789 21753 218 436 12417 12320 12408 124 434 6826 6818 6912 68 432 3740 3693 3584 37 430 1912 2050 1884 19 428 979 1043 1073 10 426 490 537 526 5 424 249 241 265 2 422 119 133 143 1 420 64 55 69 0 418 26 25 30 0 416 12 14 12 0 414 9 5 6 0 412 3 2 4 0 410 0 0 1 0 Next, we have nonlinearity measurements for 1,000 different 10-bit ciphers: Chi-Square For Two 5-Bit Mixings Fig. 11 NL Expect Found Chi-Sq 458 2 3 } 456 21 13 } 2.333 454 76 74 0.053 452 149 139 0.667 450 188 202 1.043 448 178 198 2.247 446 140 125 1.607 444 97 95 0.041 442 62 62 0.000 440 38 36 0.105 438 22 31 3.682 436 12 10 0.333 434 7 6 } 0.286 432 4 4 } 430 2 0 } 428 1 2 } ------ 12.397 With 11 "degrees of freedom," this is a believable chi-square value. So we proceed with a 10,000-cipher trial: Chi-Square For Two 5-Bit Mixings Fig. 12 NL Expect Found Chi-Sq 460 1 1 } 458 22 20 } 0.043 456 208 190 1.558 454 763 722 2.203 452 1485 1487 0.003 450 1878 1863 0.120 448 1782 1780 0.002 446 1403 1453 1.782 444 974 961 0.174 442 622 618 0.026 440 376 378 0.011 438 218 243 2.867 436 124 137 1.363 434 68 63 0.368 432 37 45 1.730 430 19 17 0.211 428 10 8 } 0.474 426 5 10 } 424 2 3 } 422 1 0 } 420 1 1 } ------ 12.935 With 15 "degrees of freedom," this is also a reasonable value. So 26 more 10,000-cipher trials were performed, with the following chi-square results: Chi-Square Values for 10,000-Cipher Trials Fig. 13 6.768 7.984 8.054 9.007 11.247 12.225 12.464 12.583 12.901 14.293 15.170 15.876 16.458 16.848 17.842 19.691 19.973 20.375 20.571 21.956 23.108 24.753 24.773 25.709 28.069 33.837 The last value seems unreasonably high; either we got very unlucky, or the data are telling us that the match is not perfect. Perhaps we are seeing imperfections in the 5-bit tables the way we earlier saw problems in the 4-bit tables. But most of the values are very reasonable, and values like this do not repeatedly occur by chance alone. Conclusions These initial experiments support an assertion that a two-level Mixing cipher using tables of reasonable size can have a nonlinearity distribution unexpectedly similar to the ideal block cipher. The match seems sufficiently precise as to imply the existence of an underlying mathematical relationship which is currently unknown. We conclude that Mixing constructions do in fact produce nonlinearity levels and distributions similar to those of an ideal cipher, despite using much smaller and much weaker components. These experiments thus support the Mixing approach to block cipher design, and do *not* support the claim that ciphers with linear mixing are inherently weak. Note that the ability to perform these experiments is based on the "scalability" of the cipher design. This is an important advantage that very few designs offer: Scalability supports far deeper experimental analysis than is possible in a full-size cipher. The approach to block cipher analysis taken in this article seems to have not been reported previously in the open literature. References and Bibliography [AY82] Ayoub, F. 1982. Probabilistic completeness of substitution-permutation encryption networks. IEE Proceedings, Part E. 129(5): 195-199. [DAE94] Daemen, J., R. Govaerts and J. Vandewalle. 1994. Correlation Matrices. Fast Software Encryption. 275-285. [FOR88] Forre, R. 1988. The Strict Avalanche Criterion: Spectral Properties of Boolean Functions and an Extended Definition. Advances in Cryptology -- CRYPTO '88. 450-468. [GOR82] Gordon, J. and H. Retkin. 1982. Are Big S-Boxes Best? Cryptography. Proceedings of the Workshop on Cryptography, Burg Feuerstein, Germany, March 29-April 2, 1982. 257-262. [HEY94] Heys, H. and S. Tavares. 1994. On the security of the CAST encryption algorithm. Canadian Conference on Electrical and Computer Engineering. Halifax, Nova Scotia, Canada, Sept. 1994. 332-335. [HEY95] Heys, H. and S. Tavares. 1995. Known plaintext cryptanalysis of tree-structured block ciphers. Electronics Letters. 31(10): 784-785. [MEI89] Meier, W. and O. Staffelbach. 1989. Nonlinearity Criteria for Cryptographic Functions. Advances in Cryptology -- Eurocrypt '89. 549-562. [MIR97] Mirza, F. 1997. Linear and S-Box Pairs Cryptanalysis of the Data Encryption Standard. [OC91] O'Connor, L. 1991. Enumerating nondegenerate permutations. Advances in Cryptology -- Eurocrypt '91. 368-377. [OC93] O'Connor, L. 1993. On the Distribution Characteristics in Bijective Mappings. Advances in Cryptology -- EUROCRYPT '93. 360-370. [PIE88] Pieprzyk, J. and G. Finkelstein. 1988. Towards effective nonlinear cryptosystem design. IEE Proceedings, Part E. 135(6): 325-335. [PIE89] Pieprzyk, J. and G. Finkelstein. 1989. Permutations that Maximize Non-Linearity and Their Cryptographic Significance. Computer Security in the Age of Information. 63-74. [PIE89B] Pieprzyk, J. 1989. Non-linearity of Exponent Permutations. Advances in Cryptology -- EUROCRYPT '89. 80-92. [PIE93] Pieprzyk, J., C. Charnes and J. Seberry. 1993. Linear Approximation Versus Nonlinearity. Proceedings of the Workshop on Selected Areas in Cryptography (SAC '94). 82-89. [PRE90] Preneel, B., W. Van Leekwijck, L. Van Linden, R. Govaerts and J. Vandewalle. 1990. Propagation Characteristics of Boolean Functions. Advances in Cryptology -- Eurocrypt '90. 161-173. [RUE86] Rueppel, R. 1986. Analysis and Design of Stream Ciphers. Springer-Verlag. [XIO88] Xiao, G-Z. and J. Massey. 1988. A Spectral Characterization of Correlation-Immune Combining Functions. IEEE Transactions on Information Theory. 34(3): 569-571. [YOU95] Youssef, A. and S. Tavares. 1995. Resistance of Balanced S-boxes to Linear and Differential Cryptanalysis. Information Processing Letters. 56: 249-252. [YOU95B] Youssef, A. and S. Tavares. 1995. Number of Nonlinear Regular S-boxes. Electronics Letters. 31(19): 1643-1644. [ZHA95] Zhang, X. and Y. Zheng. 1995. GAC -- the Criterion for Global Avalanche Characteristics of Cryptographic Functions. Journal for Universal Computer Science. 1(5): 316-333. --- Terry Ritter ritter@io.com http://www.io.com/~ritter/ The Crypto Glossary: http://www.io.com/~ritter/GLOSSARY.HTM

* Terry Ritter, his
current address, and his
top page. *

*Last updated:* 1997-12-28