```Path: illuminati.io.com!uunet!cs.utexas.edu!not-for-mail
From: ritter@io.com (Terry Ritter)
Newsgroups: sci.crypt

Subject: Re: hardware 2-oscillator RNGs
Date: 4 Aug 1994 16:19:41 -0500
Organization: UTexas Mail-to-News Gateway
Lines: 109
Sender: nobody@cs.utexas.edu
Message-ID: <199408042119.QAA03780@pentagon.io.com>
NNTP-Posting-Host: news.cs.utexas.edu

In <31qi26\$4js@lyra.csx.cam.ac.uk> rja14@cl.cam.ac.uk (Ross Anderson)
writes:

>> Statistical tests show good results.  For long files (>100K) a slight
>> predominance of "1" bits begins to affect the results.  Probably a result
>> of some sampling artifact, but I haven't tracked it down.  Anyway,
>> XOR-ing successive bytes together eliminates this.
>
>It is also useful to check the diversity of the generator. If you take a
>number of 32-bit samples, then you should start finding collisions (values
>sampled twice) after you have drawn about D = 2^16 samples, values sampled thre
e
>times after about T = 2^22 samples (actually 3N^{2/3}), and so on.

Essentially what we are doing here is establishing a relationship
between population and the number of "matches" found in trials
of random samples.  But if we just sample until we start to find
matches, the situation is more ambiguous than it needs to be.

For any particular population, the number of doubles in a trial
of a particular size follows a distribution which is Poisson-like.
Thus, any single trial can have an extremely wide range of results.
Clearly, a single test of this sort is not particularly useful;
realistic tests must involve many trials.  But if the various
trials take different numbers of samples, it is difficult to see
the fixed relationship which is actually present.

I call the usual match an "exact" or "binomial" double in my
recent article [2]; these are predicted by the binomial formula
(although Kullback's early work should be recognized [1]).  If we
assume a particular population, the binomial formula can be used
directly to give a probability of any particular number of exact
doubles in a trial of any given size.  But the binomial equations
are fairly complex and essentially one-way; it is not clear how
the equations could be manipulated to produce a unique population
value from the matches found in a trial of a given size.  Root-
finding procedures can find multiple plausible populations which
each satisfy the equation.

But by changing our definition of "doubles," we can find a simple
exact equation which we can use to predict population from the
average number of "exact" doubles.  The desirable definition is
the combinations of exact doubles taken two at a time, plus the
combinations of exact triples taken two at a time, etc., a sum
which I call "augmented" doubles.  Equation (2.3) [2:162] for
Augmented repetitions (Ar) was printed incorrectly, and should be:

n     i
Ar(k,r) = SUM [ ( ) ri ]
i=k    k

where k is the augmented level (doubles = 2) and ri the number of
exact repetitions at level i.  An exact double implies an augmented
double, but an exact triple implies three augmented doubles: the
combinations of three things taken two at a time.

Then we can predict the population (N) directly from the sample
size (s), and the experimental augmented doubles (ad) value:

Nad(s,ad) = s(s-1)/2ad

For large numbers of trials and an average resulting ad, if the
predicted population is not similar to the anticipated population,
we have problems.  We can then run different experiments to try
and isolate the problem, and then manipulate the experimental
device to try and solve the problem.

>The point is that, quite apart from sampling artefacts, circuits can have all
>sorts of strange resonances; many of these are not immediately visible to the
>naked eye. Even transient resonances (which may have been part of the Intel
>problem) can make it much more likely than you hoped that your generator will
>produce two keys which are the same or which agree in a large number of bits.

Yes.  Really-random generators must be experimentally tested, and
they must be tested for population.

Some problems are really not exposed by other tests.  For example,
Chi-Square tests generally "bin" results and this will hide some
problems.  Consider the situation of a population far smaller than
it should be, but with values so evenly disbursed that several
values make it into each Chi-Square bin.

It is also possible that the local population could be inversely
related to the probability of value occurrence.  This could lead to
good average and Chi-Square statistics, despite a greatly-reduced
population.  Since we usually depend on a large, homogenous
population for security, such a result would be a problem.

Another possible use for this technology is to test small "toy"
versions of ciphers, to see if a particular cipher design has a
tendency to produce fewer ciphertexts than one might expect from
the anticipated population of keys.  It is not always possible
to know that a keyspace is really as large as it appears to be.

References

[1]   Kullback, S.  1976.  (first published 1938).  Statistical
Methods in Cryptanalysis.  Aegean Park Press, P.O. Box 2837,
Laguna Hills, CA  92654-0837  USA.

[2]   Ritter, T.  1994.  Estimating Population from Repetitions in
Accumulated Random Samples.  Cryptologia.  18(2): 155-190.

---
Terry Ritter   ritter@io.com

```