Randomness Tests: A Literature Survey


Research Comments from Ciphers By Ritter


Terry Ritter


I take a "random" value to be an arbitrary selection from among some defined set. I take a "random" string to be just a sequence of random values. Among other things, this means that a long sequence of zeros (or any other particular sequence) is just unlikely, not non-random. Unless, of course, such a sequence regularly turns up more often than one would expect.

When we are talk about "nonrandom" strings, we discuss the distribution of encountered strings. And, even if we have a perfectly random source, we cannot expect to see a "flat" distribution of actual random strings: Since we effectively sample with replacement, we must expect to find many duplicates before getting even one example of every possible string.

What we would like is some test which will indicate whether some accumulation of sequences is "random." But the best that any test can offer is the probability of getting such an accumulation under whatever assumptions we have. That is, there is always a valid possibility, no matter how small, that even very peculiar accumulations of strings could have occurred by chance. And this means that it is very important to run our tests multiple times.

When a "random" sequence is taken to be "without any shorter construction" we have a problem: Because it is impossible to check every possible construction, it is also impossible to say that a sequence could not be constructed in a shorter way (unless we restrict ourselves to some particular construction). But this is a fine analogy to cryptography itself: Because it is impossible to try every attack, it is also impossible to state that there is not some simpler attack which might work.


Contents


1963 -- von Neumann

von Neumann, J. 1963. Various Techniques Used in Connection With Random Digits. John mon Neumann, Collected Works. A.H. Taub, ed., MacMillan.

". . . in tossing a coin it is probably easier to make two consecutive tosses independent than to toss heads with probability exactly one-half. If independence of successive tosses is assumed, we can reconstruct a 50-50 chance out of even a badly biased coin by tossing twice. If we get heads-heads or tails-tails, we reject the tosses and try again. If we get heads-tails (or tails-heads), we accept the result as heads (or tails)."

"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number -- there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method."

"If one nevertheless considers certain arithmetic methods in detail, it is quickly found that the critical thing about them is the behavior of round-off errors in mathematics."


1965 -- Kolmogorov

Kolmogorov, A. 1965. Three Approaches to the Quantitative Definition of Information. Problems of Information Transmission. 1: 1-17.

"There are two common approaches to the quantitative definition of 'information': combinatorial and probabilistic. The author briefly describes the major features of these approaches and introduces a new algorithmic approach that uses the theory of recursive functions."


1971 -- Kak

Kak, S. 1971. Classification of Random Binary Sequences Using Walsh-Fourier Analysis. Proceedings of Applications of Walsh Functions. 74-77. Washington, D.C., 1971.

"This paper presents a straightforward procedure using Walsh functions to determine the pattern in a binary sequence."

". . . classification of data amounts to the computation of structure with respect to some criterion."

"A sequence shall be said to have no pattern or be random if the number of independent amplitudes in the Wash-Fourier transform is equal to the length of the sequence itself, i.e., 2k."

"The measure of randomness r(s) shall be defined by"

r(s) = no. of independent amplitudes of W(s) / length of the sequence = i(s) / L(s)

"The number of independent amplitudes of W(s) shall equal the number of its component Walsh waves." [ The number of non-zero terms? ]


1972 -- Phillips

Phillips, J. 1972. Algorithm AS 48. Uncertainty Function for a Binary Sequence. Applied Statistics. 21: 97-99.

"The purpose of the algorithm is to input . . . a sequence of symbols which are of two kinds . . . , to tabulate the frequency for all subsequences of all lengths up to a maximum limit maxlength, and compute, for each length m, the uncertainty function . . . ."

"The algorithm works by setting up a table of frequencies of all subsequences of the maximum length, each possible subsequence being treated as an implicit binary integer of maxlength digits . . . ." "Then repeatedly, the leftmost bit of the current integer is deleted, the integer is left-shifted one place, the next symbol input, coded as 0 or 1, added to the integer, and the frequency of the pattern represented by the result increased by 1." "The computation of the uncertainty functions from the tables is trivial."


1972 -- Phillips

Phillips, J. 1972. Algorithm AS 49. Autocorrelation Function for a Binary Sequence. Applied Statistics. 21: 100-103.

"The purpose of the algorithm is to input . . . a sequence of symbols, which are of two kinds . . . , to tabulate the frequencies with which symbols of each kink are followed at a given lag by symbols of either kind, this for all lags up to a given maximum maxlag, and to compute the [phi] correlation coefficient for each lag m . . . ."


1975 -- Chaitin

Chaitin, G. 1975. Randomness and Mathematical Proof. Scientific American. 232(5): 47-52.

"Tossing a coin 20 times can produce any one of 220 (or a little more than a million) binary series, and each of them has exactly the same probability. Thus it should be no more surprising to obtain the series with an obvious pattern than to obtain one that seems to be random . . . . If origin in a probabilistic event were made the sole criterion of randomness, then both series would have to be considered random, and indeed so would all others . . . . The conclusion is singularly unhelpful in distinguishing the random from the orderly."

". . . 'incompressibility' is a property of all random numbers; indeed, we can proceed directly to define randomness in terms of incompressibility: A series of numbers is random if the smallest algorithm capable of specifying it to a computer has about the same number of bits of information as the series itself."


1976 -- Bennett

Bennett, W. 1976. Scientific and Engineering Problem-Solving with The Computer. Prentice-Hall.

Interesting approaches to Language correlations (Chapter 4) and Random Processes (Chapter 6).


1977 -- Yuen

Yuen, C. 1977. Testing Random Number Generators by Walsh Transform. IEEE Transactions on Computers. C-26(4): 329-333.

"Abstract -- A truly random sequence of numbers has an asymptotically flat Walsh power spectrum. This fact is used to devise a new test for the randomness of the output of random number generators."

"One essential randomness test is that of uncorrelatedness, i.e., that the autocorrelation of the sequence is approximately a d-function."

"A property equivalent to uncorrelatedness is that the power spectrum be flat."

"In this paper we propose another randomness test equivalent to the correlation test: that the Walsh power spectrum be flat." "Thus, testing the flatness of the spectrum is equivalent to testing for uncorrelatedness of the values of x."

". . . the band spectrum estimate can also be evaluated by spectrum averaging . . . ."

"With segment averaging there is no longer any difficulty with core requirements. When we wish to test a sequence of 2n values, we would read in, or generate 2n-m values at a time, compute the 2n-m-point fast Walsh transform of the segment, square, and add the squares to the 2n-m memory locations which have been initially set to zero. After all 2m segments have been processed these memory locations will contain the band spectrum estimate, and we can then proceed to examine if it is consistent with a flat S."

"Another possible additional test is that we permute the random numbers in some way before Walsh transformation. Given a truly random sequence, we should still get a flat spectrum regardless of what permutation was tried."


1980 -- Atkinson

Atkinson, A. 1980. Tests of Pseudo-random Numbers. Applied Statistics. 29(2): 164-171.

"Knuth (1969, Sections 3.3.2 and 3.3.3) divides test of the supposedly independent and uniform Ui into two classes: empirical tests, in which a sample is taken and assessed without consideration of the way in which the pseudo-random numbers are generated, and which do not require a sample. Two theoretical tests are the lattice test (Marsaglia, 1972) and the spectral test described by Knuth (1969, Section 3.3.4)."

"The chief purpose of the present paper is to stress that the lattice test, like the spectral test, is only appropriate for some mixed congruential generators, and to describe how it should be modified for multiplicative generators."


1981 -- Knuth

Knuth, D. 1981. (1st ed. 1969.) The Art of Computer Programming. Volume 2: Seminumerical Algorithms. Addison-Wesley.

The most-quoted reference, and still a required background, but now somewhat dated.


1983 -- Hopkins

Hopkins, T. 1983. Algorithm AS 193. A Revised Algorithm for the Spectral Test. Applied Statistics. 32: 328-335.

"An efficient implementation of an improved algorithm for performing the spectral test (Knuth, 1981) on linear congruential random number generators is presented." "The new algorithm is faster than and does not have the termination difficulties . . . of its predecessor."


1985 -- Marsaglia

Marsaglia, G. 1985. A Current View of Random Number Generators. Computer Science and Statistics: The Interface. 3-10. Elsevier Science.

"The ability to generate satisfactory sequences of random numbers is one of the key links between Computer Science and Statistics. Standard methods may no longer be suitable for increasingly sophisticated uses, such as in precision Monte Carlo studies, testing for primes, combinatorics or public encryption number generators: congruential, shift-register and lagged-Fibonacci, give poor results, and describes new methods that pass the stringent tests and seem more suitable for precision Monte Carlo use."

"This theoretical result suggests that if two RNG's produce sequences x1, x2, x3, ... and y1, y2, y3, ... on some finite set S on which we have a binary operation [dot], then the combination generator, producing x1 [dot] y1, x2 [dot] y2, x3 [dot] y3, ... should be better, or at least no worse than, either of the component RNG's." "The binary operation need only have the property that its operation table forms a latin square; it need be neither commutative nor associative."

"A few tests were suggested in the early days of making tables of random digits [5], and M. D. MacLaren and I suggested a few more [8]. These, and a few others, have become a de facto standard set of tests, enumerated in Knuth's V2, [6]. Knuth's books are such marvels that they sometimes discourage initiative -- so well done that many readers take them as gospel, the definitive word on the particular subject treated. And so they are, most of the time. But not, I think, for testing random number generators."


1985 -- Beker and Piper

Beker, H. and F. Piper. 1985. Secure Speech Communications. Academic Press.

Chapter 3: The Principles of Cryptography, Part IV: Stream Ciphers, Section A: Randomness (p. 104-109). Describes:

  1. The Frequency Test.
  2. The Serial Test.
  3. The Poker Test.
  4. The Autocorrelation Test.
  5. The Runs Test.


1985 -- Blumer et. al.

Blumer, A., et. al. 1985. The Smallest Automaton Recognizing the Subwords of a Text. Theoretical Computer Science. 40: 31-55. North-Holland.

"In the classic string matching problem for text, we are given a text w and a pattern string x and we want to know if x appears in w, i.e., if x is a subword of w. Standard approaches to this problem involve various methods for preprocessing x so that the text w can be searched rapidly [1, 9, 16]. Since each search still takes time proportional to the length of w, this method is inappropriate when many different patterns are examined against a fixed text . . . . In this case, it is desirable to preprocess the text itself, building an auxiliary data structure that allows one to determine whether x is a subword of w in time proportional to the length of x, not w."


1987 -- Rueppel

Rueppel, R. 1987. Linear Complexity and Random Sequences. Advances in Cryptology -- CRYPTO '86. 167-188. Springer-Verlag.

"Abstract: The problem of characterizing the randomness of finite sequences arises in cryptographic applications. The idea of randomness clearly reflects the difficulty of predicting the next digit of a sequences from all previous ones. The approach taken in this paper is to measure the (linear) unpredictability of a sequence (finite or periodic) by the length of the shortest linear feedback shift register (LFSR) that is able to generate the given sequence. This length is often referred to in the literature as the linear complexity of the sequence. It is shown that the expected linear complexity of a sequence of n independent and uniformly distributed binary random variables is close to n/2 . . . ." "Consequently, we expect a 'typical' random sequence to have associated a 'typical' linear complexity profile closely following the n/2 line."

". . . a good random sequence generator should have linear complexity close to the period length, and also a linear complexity profile which follows closely, but 'irregularly', the n/2-line (where n denotes the number of sequence digits) thereby exhibiting average step lengths and step heights of 4 and 2, respectively."


1987 -- Tezuka

Tezuka, S. 1987. On the Discrepancy of GFSR Pseudorandom Numbers. Journal of the Association for Computing Machinery. 34(4): 939-949.

"Abstract. A new summation formula based on the orthogonal property of Walsh functions is devised. Using this formula, the k-dimensional discrepancy of the generalized feedback shift register (GFSR) random numbers is derived. The relation between the discrepancy and k-distribution of GFSR sequences is also obtained. Finally the definition of optimal GFSR pseudorandom number generators is introduced."


Blumer, A. et. al.

Blumer, A. et. al. 1987. Complete Inverted Files for Efficient Text Retrieval and Analysis. Journal of the Association for Computing Machinery. 34(3): 578-595.

"Abstract. Given a finite set of texts S = {w1, ..., wk} over some fixed finite alphabet [sigma], a complete inverted file for S is an abstract data type that provides the functions find(w), which returns the longest prefix of w that occurs (as a subword of a word) in S; freq(w), which returns the number of times w occurs in S; and locations(w), which returns the set of positions where w occurs in S."


1988 -- Feldman

Feldman, F. 1987. Fast Spectral Tests for Measuring Nonrandomness and the DES. Advances in Cryptology -- CRYPTO '87. 243-254. Springer-Verlag.

"Abstract -- Two spectral tests for detecting nonrandomness were proposed in 1977. One test, developed by J. Gait [1], considered properties of power spectra obtained from the discrete Fourier transform of finite binary strings."

"Another test, developed by C. Yuen [2], considered analogous properties for the Walsh transform. In estimating variance of spectral bands, Yuen assumes the spectral components to be independent. Except for the special case of Gaussian random numbers, this assumption introduces a significant error into his estimate."

"A new test, based on an evaluation of the Walsh spectrum, is presented here. This test extends the earlier test of C. Yuen."

"We prove that our measure of the Walsh spectrum is equivalent to a measure of the skirts of the logical autocorrelation function. It is clear that an analogous relationship exists between Fourier periodograms and the circular autocorrelation function."


1988 -- Wanders

Wanders, H. 1988. On the Significance of Golomb's Randomness Postulates in Cryptography. Philips Journal of Research. 43(2): 185-222.

"Abstract. Golomb's famous book on shift register sequences is a classical reference in the study of stream ciphers. His so-called 'postulates' about PN-sequences are to be generalized and relaxed in real cryptographic applications."


1989 -- Maurer and Massey

Maurer, U. and J. Massey. 1989. Perfect Local Randomness in Pseudo-random Sequences. Advances in Cryptology -- CRYPTO '89. 100-112. Springer-Verlag.

"Abstract. The concept of provable cryptographic security for pseudo-random number generators that was introduced by Schnorr is investigated and extended. The cryptanalyst is assumed to have infinite computational resources and hence the security of the generators does not rely on any unproved hypothesis about the difficulty of solving a certain problem, but rather relies on the assumption that the number of bits of the generated sequence the enemy can access is limited."

"The results of Section 2 show that provably-secure ciphers can be constructed under the restriction that the number of plaintext bits obtainable by the enemy is smaller than the length of the key, divided by the logarithm of the plaintext length."

"In Section 2, we introduce the concept of a perfect local randomizer, i.e., of a sequence generator that stretches a (binary) random sequence of length k to a pseudo-random sequence of length n such that every subset of e or less bits of the sequence is a set of independent random bits. The concept of a perfect local randomizer corresponds to what is known in combinatorics as an orthogonal array."


1989 -- Richards

Richards, T. 1989. Graphical Representation of Pseudorandom Sequences. Computers and Graphics. 13(2): 261-262.

"The technique reviewed in this note provides a simple way of producing two-dimensional graphics from one-dimensional data and also for revealing subtle patterns in noisy data."

"Variations on this technique could be used to examine any sequence of values that appears to be random. If a pattern emerges, the underlying cause may then be investigated."


1989 -- Carroll

Carroll, J. 1989. The binary derivative test: noise filter, crypto aid, and random-number seed selector. Simulation. 53(3): 129-135.

"Random noise obscuring digitalized images or text can be removed by a new technique that recognizes the appearance of randomness in short strings. This test makes use of binary derivatives."

"The binary derivative of a string of bits is formed by XORING the members of each successive overlapping pair of bits. One can continue to differentiate the string, losing one bit each time, until only one bit is left."


1989 -- Beth and Dai

Beth, T. and Z-D. Dai. 1989. On the Complexity of Pseudo-Random Sequences -- or: If You Can Describe a Sequence It Can't be Random. Advances in Cryptology -- EUROCRYPT '89. 533-543. Springer-Verlag.

"We shall prove in this note that Turing-Kolmogorov-Chaitin complexity and the Linear Complexity are the same for practically all 0-1-sequences of length n, already for moderate n."


1989 -- Jansen

Jansen, C. 1989. The Shortest Feedback Shift Register That Can Generate A Given Sequence. Advances in Cryptology -- CRYPTO '89. 90-99. Springer-Verlag.

"Abstract. In this paper the problem of finding the absolutely shortest (possibly nonlinear) feedback shift register, which can generate a given sequence with characters from some arbitrary finite alphabet, is considered. To this end, a new complexity measure is defined, called the maximum order complexity. A new theory of the nonlinear feedback shift register is developed, concerning elementary complexity properties of transposed and reciprocal sequences, and feedback functions of the maximum order feedback shift register equivalent. Moreover, Blumer's algorithm is identified as a powerful tool for determining the maximum order complexity profile of sequences, as well as their period, in linear time and memory. The typical behaviour of the maximum order complexity profile is shown and the consequences for the analysis of given sequences and the synthesis of feedback shift registers are discussed."


1990 -- Feldman

Feldman, F. 1990. A New Spectral Test for Nonrandomness and the DES. IEEE Transactions on Software Engineering. 16(3): 261-267.

"Abstract -- A new test for detecting the nonrandomness of finite binary strings is proposed. This test, based on an evaluation of the power spectrum of a finite string, extends and quantifies a similar test proposed by Jason Gait [1] in 1977."

"As noted by Gait [1], a good cipher must have the characteristics of a good pseudorandom bit generator. Since most tests of nonrandomness focus on the time domain values of a test string, Gait pointed to the need of also testing frequency domain values. He presented a graphic approach for displaying the power spectrum of binary strings."

"As an empirical measure of the sensitivity of our test, it was compared with a chi-square test for uniformity of distribution, which also measures nonrandomness." "It is also apparent that our spectral tests are sensitive to a different kind of nonrandomness than the chi-square test."


1990 -- Maurer

Maurer, U. 1990. A Universal Statistical Test for Random Bit Generators. Advances in Cryptology -- CRYPTO '90. 409-420. Springer-Verlag.

"Abstract. A new statistical test for random bit generators is presented that is universal in the sense that any significant deviation of the output statistics from the statistics of a perfect random bit generator is detected with high probability . . . ." "This is in contrast to most presently used statistical tests which can detect only one type of non-randomness . . . ." "Moreover, the new test . . . measures the entropy per output bit of a generator." "The test is easy to implement and very fast."


1990 -- Jansen and Boekee

Jansen, C. and D. Boekee. 1990. On the Significance of the Directed Acyclic Word Graph in Cryptology. Advances in Cryptology: AUSCRYPT '90. 318-326. Springer-Verlag.

"Summary. Blumer's algorithm can be used to build a Directed Acyclic Word Graph (DAWG) in linear time and memory from a given sequence of characters. In this paper we introduce the DAWG and show that Blumer's algorithm can be used very effectively to determine the maximum order (or nonlinear) complexity profile of a given sequence." "It also appears that the DAWG is an even more efficient means of generating the sequence, given a number of characters, then e.g. the nonlinear feedback shift register equivalent of the sequence, as it always needs the least amount of characters to generate the remainder of the sequence."

"In [Jans 89] a new complexity measure for sequences is proposed, called Maximal Order Complexity, which is equal to the length of the shortest (possibly nonlinear) feedback shift register that can generate a given sequence. Analogous to Rueppel's linear complexity profile [Ruep 84], the maximum order complexity profile is proposed as a measure of 'goodness' for sequences, i.e. a measure which shows how well a given sequence resembles a real random sequence."


1990 -- Jansen and Boekee

Jansen, C. and D. Boekee. 1990. A Binary Sequence Generator Based on Ziv-Lempel Source Coding. Advances in Cryptology -- AUSCRYPT '90. 156-164. Springer-Verlag.

"Summary. A new binary sequence generator is proposed, which is based on Ziv-Lempel source coding. In particular, the Ziv-Lempel decoding algorithm is applied to codewords generated by linear feedback shift registers. It is shown that the sequences generated in this way have high linear complexity and good statistical properties.


1991 -- Compagner

Compagner, A. 1991. The Hierarchy of Correlations in Random Binary Sequences. Journal of Statistical Physics. 63(5/6): 883-896.

"The meaning of randomness is studied for the simple case of binary sequences. Ensemble theory is used, together with correlation coefficients of any order. Conservation laws for the total amount of correlation are obtained. They imply that true randomness is an ensemble property and can never be achieved in a single sequence."


1991 -- Mund

Mund, S. 1991. Ziv-Lempel Complexity for Periodic Sequences and its Cryptographic Application. Advances in Cryptology -- EUROCRYPT '91. 114-126. Springer-Verlag.

"In the last couple of years several different complexity measures were used to examine pseudorandom number sequences in cryptography. Examples for such complexity measures are the linear complexity which is defined in Rueppel [Ruep86] or the maximal-order complexity which was introduced by Jansen [Jans89]." "Another complexity measure for sequences was defined by Ziv and Lempel in 1976 [Lemp76]. This complexity measure is a measure of the rate at which new patterns emerge as we move along the sequence."


1991 -- Compagner

Compagner, A. 1991. Definitions of Randomness. American Journal of Physics. 59(8): 700-705.

"All numbers are arbitrary, but some are more arbitrary than others. Below one million, 170769 appears to be more random than 888888, though a priori they are equally probable."

"When randomness is not understood, it is more difficult to achieve than law and order." ". . . the subject suffers from an excess of mathematical ingenuity which has made straightforward ideas on randomness obsolete before they ever were formulated properly."

"[The new] ideas are based on ensemble theory and give rise to new definitions." ". . . when ensembles are used for the foundations of probability theory, randomness has to be identified with uncorrelatedness, a neglected notion that yet solves many puzzles surrounding randomness."


1992 -- L'Ecuyer

L'Ecuyer, P. 1992. Testing Random Number Generators. Proceedings of the 1992 Winter Simulation Conference. 305-313.

"ABSTRACT. So-called Random number generators on computers are deterministic functions producing a sequence of numbers which should mimic a sample of i.i.d. [individually independently distributed] U(0,1) random variables. Two classes of tests are commonly applied to such generators. Firstly, the theoretical tests, which look at the intrinsic structure of the generator to derive behavioral properties of the sequence of points . . . ." "Secondly, the empirical goodness-of-fit tests, which try finding statistical evidence against the null hypothesis: 'the sequence is a sample from i.i.d. U(0,1) random variables'. In this paper, we survey the main techniques from both classes, discuss their philosophy, and look at some of the most recent developments . . . ."


1993 -- Marsaglia and Zaman

Marsaglia, G. and A. Zaman. 1993. Monkey Tests for Random Number Generators. Computers and Mathematics with Applications. 26(9): 1-10.

"Few images invoke the mysteries and ultimate uncertainties of a sequence of random events as well as that of the proverbial monkey at a typewriter." "Technically, we are concerned with overlapping m-tuples of successive elements in a random sequence."

"This article describes some very simple, as well as some quite sophisticated, tests that shed light on the suitability of certain random number generators."


1993 -- Maclaren

Maclaren, N. 1993. Cryptographic Pseudo-random Numbers in Simulation. Fast Software Encryption. Ross Anderson, ed., 185-190.

"A fruitful source of confusion on the Internet is that both cryptologists and statisticians use pseudo-random numbers, but that their objectives and constraints are subtly different."

"It is important to note that there is no consensus on when a pseudo-random number generator can be regarded as adequate, both because the theory is very incomplete and because so many different fields are involved."


1995 -- Gustafson, Dawson and Golic

Gustafson, H, E. Dawson, and J. Golic. 1995. Randomness Measures Related to Subset Occurrence. Cryptography: Policy and Algorithms. 132-143. Springer-Verlag.

"Abstract Statistical tests have been applied to measures obtained from partitioning the keystream of a stream cipher into subsets of a given length. Similarly, the strength of a block cipher has been measured by applying statistical tests to subsets obtained from both the input and output blocks. There are problems in applying these tests as the size of the subsets increases. We propose a novel method based on the classical occupancy problem to deal with larger subsets in testing for randomness in a keystream in the case of a stream cipher and for independence between subsets of input and output blocks in the case of a block cipher."


Terry Ritter, his current address, and his top page.

Last updated: 2002-09-02