Path: illuminati.io.com!uunet!cs.utexas.edu!not-for-mail
From: ritter@io.com (Terry Ritter)
Newsgroups: sci.crypt

Subject: Re: hardware 2-oscillator RNGs
Date: 4 Aug 1994 16:19:41 -0500
Organization: UTexas Mail-to-News Gateway
Lines: 109
Sender: nobody@cs.utexas.edu
Message-ID: <199408042119.QAA03780@pentagon.io.com>
NNTP-Posting-Host: news.cs.utexas.edu

 In <31qi26$4js@lyra.csx.cam.ac.uk> rja14@cl.cam.ac.uk (Ross Anderson)
 writes:


>> Statistical tests show good results.  For long files (>100K) a slight
>> predominance of "1" bits begins to affect the results.  Probably a result
>> of some sampling artifact, but I haven't tracked it down.  Anyway,
>> XOR-ing successive bytes together eliminates this.
>
>It is also useful to check the diversity of the generator. If you take a
>number of 32-bit samples, then you should start finding collisions (values
>sampled twice) after you have drawn about D = 2^16 samples, values sampled thre
e
>times after about T = 2^22 samples (actually 3N^{2/3}), and so on.

 Essentially what we are doing here is establishing a relationship
 between population and the number of "matches" found in trials
 of random samples.  But if we just sample until we start to find
 matches, the situation is more ambiguous than it needs to be.

 For any particular population, the number of doubles in a trial
 of a particular size follows a distribution which is Poisson-like.
 Thus, any single trial can have an extremely wide range of results.
 Clearly, a single test of this sort is not particularly useful;
 realistic tests must involve many trials.  But if the various
 trials take different numbers of samples, it is difficult to see
 the fixed relationship which is actually present.

 I call the usual match an "exact" or "binomial" double in my
 recent article [2]; these are predicted by the binomial formula
 (although Kullback's early work should be recognized [1]).  If we
 assume a particular population, the binomial formula can be used
 directly to give a probability of any particular number of exact
 doubles in a trial of any given size.  But the binomial equations
 are fairly complex and essentially one-way; it is not clear how
 the equations could be manipulated to produce a unique population
 value from the matches found in a trial of a given size.  Root-
 finding procedures can find multiple plausible populations which
 each satisfy the equation.

 But by changing our definition of "doubles," we can find a simple
 exact equation which we can use to predict population from the
 average number of "exact" doubles.  The desirable definition is
 the combinations of exact doubles taken two at a time, plus the
 combinations of exact triples taken two at a time, etc., a sum
 which I call "augmented" doubles.  Equation (2.3) [2:162] for
 Augmented repetitions (Ar) was printed incorrectly, and should be:

                 n     i
      Ar(k,r) = SUM [ ( ) ri ]
                i=k    k

 where k is the augmented level (doubles = 2) and ri the number of
 exact repetitions at level i.  An exact double implies an augmented
 double, but an exact triple implies three augmented doubles: the
 combinations of three things taken two at a time.

 Then we can predict the population (N) directly from the sample
 size (s), and the experimental augmented doubles (ad) value:

      Nad(s,ad) = s(s-1)/2ad

 For large numbers of trials and an average resulting ad, if the
 predicted population is not similar to the anticipated population,
 we have problems.  We can then run different experiments to try
 and isolate the problem, and then manipulate the experimental
 device to try and solve the problem.


>The point is that, quite apart from sampling artefacts, circuits can have all
>sorts of strange resonances; many of these are not immediately visible to the
>naked eye. Even transient resonances (which may have been part of the Intel
>problem) can make it much more likely than you hoped that your generator will
>produce two keys which are the same or which agree in a large number of bits.

 Yes.  Really-random generators must be experimentally tested, and
 they must be tested for population.

 Some problems are really not exposed by other tests.  For example,
 Chi-Square tests generally "bin" results and this will hide some
 problems.  Consider the situation of a population far smaller than
 it should be, but with values so evenly disbursed that several
 values make it into each Chi-Square bin.

 It is also possible that the local population could be inversely
 related to the probability of value occurrence.  This could lead to
 good average and Chi-Square statistics, despite a greatly-reduced
 population.  Since we usually depend on a large, homogenous
 population for security, such a result would be a problem.

 Another possible use for this technology is to test small "toy"
 versions of ciphers, to see if a particular cipher design has a
 tendency to produce fewer ciphertexts than one might expect from
 the anticipated population of keys.  It is not always possible
 to know that a keyspace is really as large as it appears to be.


 References

 [1]   Kullback, S.  1976.  (first published 1938).  Statistical
       Methods in Cryptanalysis.  Aegean Park Press, P.O. Box 2837,
       Laguna Hills, CA  92654-0837  USA.

 [2]   Ritter, T.  1994.  Estimating Population from Repetitions in
       Accumulated Random Samples.  Cryptologia.  18(2): 155-190.

 ---
 Terry Ritter   ritter@io.com