Newsgroups: sci.crypt
From: (Colin Plumb)

Subject: Re: IBM-PC random generator, source included
Message-ID: <>
Organization: University of Toronto
References: <> <>
+           <> <1992Jun25.071702.
+ >
Date: 26 Jun 92 11:51:17 GMT
Lines: 62

In article <> (Terry Ritter) w rites:
> It should be obvious that if we have a well-designed 32-bit
> RNG and use it to produce "a few hundred thousand" consecutive
> 32-bit internal state results, there will be no duplicates at
> all.  A normal statistical RNG steps through (almost) its entire
> state without repetition.  But we may not agree that we have
> direct access to the entire state in Nico's design.

Unfortunately, it's false.  You can expect a collision at around
the 64K mark.  Here's the proof: given n random numbers x_1..x_n
in some range 0..N-1, there are no duplicates if and only if each 
of the n(n+1)/2 pairs is distinct.  Thus, it is equivalent to generating
n(n+1) random numbers and checking that each even-odd pair is distinct.
If n = 2^16, n(n+1) is just over 2^32, and if N is 2^32, the expected
number of duplicates is just over 1.

You can turn it around and say that if you want any duplicates, the
number of samples you need is proportional to the square root of the
number of possible values.  This is called the "birthday attack" on
authentication schemes.  Let's say that we are doing a secure coin-toss
protocol.  I pick heads or tails, add a lot of random junk to that
1 bit, push it through a one-way function, and send the result to
you.  You then, say, send me a guess as to which way the toss went, and
I reveal the result, by revealing the input to the function.

If the function is one-to-one, then it's theoretically possible for you
to figure out what the value I sent you was, and make that guess,
defeating the protocol.  If it's many-to-one, then you can't figure it
out, bu it's theoretically possible that I know two inputs, one heads
and one tails, which give the same output, so I can lie to you,
defeating the protocol.  There is no perfectly secure solution, but let
us assume we are using the second one.

Now, if I send b bits of signature to you, you think it would take
me on the order of 2^b work to find a heads input that will produce
the same output as a tails input.  If you have a fixed tals input,
that is true, but if you're allowed to generate lots of heads inputs
and lots of tail inputs and use any match, it takes 2^(b/2+1) work,
on average, to generate 2^(b/2) of each type of input and search for
a match among the outputs.

This is known as the "birthday attack" on authentication protocols and
it is why 64 bits is an absolute minimum size on authenticators where
being able to send almost any false message would be bad.
(If you're in a bank network, you need a message that includes an
account number you control, so finding any message won't do.)

> Let's do this, and compare the results.  Now, from "a few hundred
> thousand samples," can we distinguish between, say, 16, 24, 32
> or 40 bits (or more) of internal state in different RNG's?

Let's say 2^18 samples.  If the generator has fewer than 36 bits
of internal state, we can expect a few collisions, assuming infinitely
large output values.  (I.e. no collisions due to the limited range of the
outputs.)  The assertion was that the RNG given has significantly
less than 32 bits of internal state.  If so, we'd expect, from
2^18 samples, 2^4 = 16 collisions due to the 32-bit size of the output
and approximately 2^(36-n) collisions due to there only being n bits
of internal state.  If there are, say, 28 bits of internal state,
we expect 256.  The difference between 16 and 256 is quite detectable.