Path: illuminati.io.com!uunet!cs.utexas.edu!not-for-mail From: ritter@io.com (Terry Ritter) Newsgroups: sci.crypt Subject: Re: hardware 2-oscillator RNGs Date: 4 Aug 1994 16:19:41 -0500 Organization: UTexas Mail-to-News Gateway Lines: 109 Sender: nobody@cs.utexas.edu Message-ID: <199408042119.QAA03780@pentagon.io.com> NNTP-Posting-Host: news.cs.utexas.edu In <31qi26$4js@lyra.csx.cam.ac.uk> rja14@cl.cam.ac.uk (Ross Anderson) writes: >> Statistical tests show good results. For long files (>100K) a slight >> predominance of "1" bits begins to affect the results. Probably a result >> of some sampling artifact, but I haven't tracked it down. Anyway, >> XOR-ing successive bytes together eliminates this. > >It is also useful to check the diversity of the generator. If you take a >number of 32-bit samples, then you should start finding collisions (values >sampled twice) after you have drawn about D = 2^16 samples, values sampled thre e >times after about T = 2^22 samples (actually 3N^{2/3}), and so on. Essentially what we are doing here is establishing a relationship between population and the number of "matches" found in trials of random samples. But if we just sample until we start to find matches, the situation is more ambiguous than it needs to be. For any particular population, the number of doubles in a trial of a particular size follows a distribution which is Poisson-like. Thus, any single trial can have an extremely wide range of results. Clearly, a single test of this sort is not particularly useful; realistic tests must involve many trials. But if the various trials take different numbers of samples, it is difficult to see the fixed relationship which is actually present. I call the usual match an "exact" or "binomial" double in my recent article [2]; these are predicted by the binomial formula (although Kullback's early work should be recognized [1]). If we assume a particular population, the binomial formula can be used directly to give a probability of any particular number of exact doubles in a trial of any given size. But the binomial equations are fairly complex and essentially one-way; it is not clear how the equations could be manipulated to produce a unique population value from the matches found in a trial of a given size. Root- finding procedures can find multiple plausible populations which each satisfy the equation. But by changing our definition of "doubles," we can find a simple exact equation which we can use to predict population from the average number of "exact" doubles. The desirable definition is the combinations of exact doubles taken two at a time, plus the combinations of exact triples taken two at a time, etc., a sum which I call "augmented" doubles. Equation (2.3) [2:162] for Augmented repetitions (Ar) was printed incorrectly, and should be: n i Ar(k,r) = SUM [ ( ) ri ] i=k k where k is the augmented level (doubles = 2) and ri the number of exact repetitions at level i. An exact double implies an augmented double, but an exact triple implies three augmented doubles: the combinations of three things taken two at a time. Then we can predict the population (N) directly from the sample size (s), and the experimental augmented doubles (ad) value: Nad(s,ad) = s(s-1)/2ad For large numbers of trials and an average resulting ad, if the predicted population is not similar to the anticipated population, we have problems. We can then run different experiments to try and isolate the problem, and then manipulate the experimental device to try and solve the problem. >The point is that, quite apart from sampling artefacts, circuits can have all >sorts of strange resonances; many of these are not immediately visible to the >naked eye. Even transient resonances (which may have been part of the Intel >problem) can make it much more likely than you hoped that your generator will >produce two keys which are the same or which agree in a large number of bits. Yes. Really-random generators must be experimentally tested, and they must be tested for population. Some problems are really not exposed by other tests. For example, Chi-Square tests generally "bin" results and this will hide some problems. Consider the situation of a population far smaller than it should be, but with values so evenly disbursed that several values make it into each Chi-Square bin. It is also possible that the local population could be inversely related to the probability of value occurrence. This could lead to good average and Chi-Square statistics, despite a greatly-reduced population. Since we usually depend on a large, homogenous population for security, such a result would be a problem. Another possible use for this technology is to test small "toy" versions of ciphers, to see if a particular cipher design has a tendency to produce fewer ciphertexts than one might expect from the anticipated population of keys. It is not always possible to know that a keyspace is really as large as it appears to be. References [1] Kullback, S. 1976. (first published 1938). Statistical Methods in Cryptanalysis. Aegean Park Press, P.O. Box 2837, Laguna Hills, CA 92654-0837 USA. [2] Ritter, T. 1994. Estimating Population from Repetitions in Accumulated Random Samples. Cryptologia. 18(2): 155-190. --- Terry Ritter ritter@io.com