Subject: Re: What the hell did Santha-Vazirani paper actually say?
Summary: algorithm doesn't work, this one does
From: (david lewis @ EECG, University of Toronto)
I have no significant crypto experience, but I believe that
the enclosed is correct.

I have read these articles with interest, since some of us
devloped a noise box that produced nearly random bits (something
like probability of a one is 51%). I developed an algorithm that
is optimal and provably generates random output for any input
of nearly random bits, with non-randomness that can be described.
By optimal, I mean that it produces a bit of output for every bit
of entropy in its input.

The algorithm posted is not only inefficient, it is wrong.
It does not produce random bits. Specifically, if p is the probabilty
of seeing a one from each of the n noise sources, then the probability
that there are an even number of ones (and thus a zero as the result
of xoring them all is (in maple format)

 pb:=proc(p,n) sum(p^(2*i)*((1-p)^(n-2*i))*n!/((2*i)!*(n-2*i)!),i=0..n/2) end;

which simplifies to
                              1/2 (1 - 2 p)  + 1/2

For example, 3 noise sources with prob of a 1 of .4 have a .504 probability
of generating a 0. Almost random, but not quite.

Xoring the result of this with a square wave is irrelevant; it may
remove bias, but by making one bit have a .504 probability of one,
followed by a .504 probability of zero.

The algorithm is also inefficient because the p=.4 noise sources have entropy
of .971 bit per baud, so a total of 2.91 bits of entropy are consumed
to produce a random bit.

The solution I ultimately devised is completely general and efficient.
It takes a source of bits with some randomness which can be modelled
by pone(prevbits), giving the probability of seeing a one as the next bit
given any number of previous bits. The pone function can be arbitrarily
complicated; as long as it exactly models the non-randomness in the
input, the output will be completely random. In practice, a table listing
pone(prevbits) for some finite number of previous bits is sufficient.

An explanation of the algorithm follows.

Suppose we have a real number x, x member [0,1),
and the probability distribution of x is completely uniform. Then
it is trivial to generate any number of random bits by inspecting the binary
representation of x.

The next step is to realise that a sequence of nearly random bits can
be used to generate some extra bits. Given some nearly
random sequence of bits r[i], i = 1..n, with probabilty of a one pone[i],
define z[0] = x, and

	 z[i] = (1 - pone [i]) + z[i-1] * pone [i]	if r[i] is 1
	 z[i] = z[i-1] * (1 - pone [i])			if r[i] is 0

As long as x is random, and pone[i] exactly models the probabilty
of a one in r[i], then z[i] is randomly distributed in [0,1).

The problem, of course, is that we don't have a perfectly random number x
to start with. So lets represent the unknown x by the interval [0,1), and
perform the same calculation. Then each z[i] is an interval [zmin[i],zmax[i]),
and the same calculations apply. After consuming some number of bits,
then the interval [zmin[i],zmax[i]) will become small. We can produce
some random bits by simply reading out those bits of zmin and zmax that agree.
For example, we might find

where the underlined bits agree

This means that z[i] is somewhere in this interval. It is thus
safe to produce 10111011011 as completely random bits even though
we don't know what x is!

The following program uses this algorithm to generate random bits.
It rescales each zmin,zmax (called rmin, rmax in the code)
to the interval [0,1) as it generates each bit.
The program uses a more limited model of pone that is dependent
on k previous bits.
The program is invoked

	unbias k p000 p001 ... p111

where there must be 2^k piii giving the probabilty of seeing the bit
sequence iii, and the least significant bit appears last in the input.
The input to the program is a sequence of ascii 0 and 1s.
Have fun.

David Lewis
*******cut here***


double atof ();

long random();

#define nprev 10

int more;
int bitsprecision = 30;

{	more = 0;

int getbit ()
{	char c;

	while (c = getchar (), c == ' ' || c == '\n')
	if (c == '0' || c == '1')
		return (c - '0');
	{	more = 0;
		return (0);

main (argc, argv)
int argc;
char *argv[];
{	float *pone, *pzero;
	int nbits, lognbits, mask;
	int prevbits;
	int intpone;
	int i;
	int b;
	int nbitsspat;
	int x;
	char randstate[64];
	float rmin, rmax;
	float t;

	lognbits = atoi (argv [1]);
	nbits = 1 << lognbits;
	mask = nbits - 1;
	prevbits = 0;
	pone = (float *) malloc (nbits * sizeof (float));
	pzero = (float *) malloc (nbits * sizeof (float));
	intpone = pone [0] * (1 << bitsprecision);
	for (i = 0; i < nbits; i++)
	{	pone [i] = atof (argv [i + 2]);
		pzero [i] = 1.0 - pone [i];
	more = 1;
	signal (SIGINT, stopplease);
	initstate (123456, randstate, 64);
	rmin = 0.0;
	rmax = 1.0;
	for (i = 0; i < lognbits; i++)
	{	prevbits = prevbits << 1 | getbit ();
	while (more)
	{	while (rmin < .5 && rmax > .5 && more)
		{	b = getbit ();
			if (b)
			{	rmin += pzero [prevbits] * (rmax - rmin);
			{	rmax -= pone [prevbits] * (rmax - rmin);
/*			printf (" %e %e\n", rmin, rmax);		*/
			prevbits = (prevbits << 1 | b) & mask;
		if (more)
		{	if (rmin >= .5)
			{	putchar ('1');
				rmax = 2.0 * rmax - 1.0;
				rmin = 2.0 * rmin - 1.0;
			{	putchar ('0');
				rmax *= 2.0;
				rmin *= 2.0;
			if (! (nbitsspat & 63))
				putchar ('\n');
	fflush (stdout);
****end of code****