Subject: Re: What the hell did Santha-Vazirani paper actually say? Summary: algorithm doesn't work, this one does From: lewis@eecg.toronto.edu (david lewis @ EECG, University of Toronto) I have no significant crypto experience, but I believe that the enclosed is correct. I have read these articles with interest, since some of us devloped a noise box that produced nearly random bits (something like probability of a one is 51%). I developed an algorithm that is optimal and provably generates random output for any input of nearly random bits, with non-randomness that can be described. By optimal, I mean that it produces a bit of output for every bit of entropy in its input. The algorithm posted is not only inefficient, it is wrong. It does not produce random bits. Specifically, if p is the probabilty of seeing a one from each of the n noise sources, then the probability that there are an even number of ones (and thus a zero as the result of xoring them all is (in maple format) pb:=proc(p,n) sum(p^(2*i)*((1-p)^(n-2*i))*n!/((2*i)!*(n-2*i)!),i=0..n/2) end; which simplifies to n 1/2 (1 - 2 p) + 1/2 For example, 3 noise sources with prob of a 1 of .4 have a .504 probability of generating a 0. Almost random, but not quite. Xoring the result of this with a square wave is irrelevant; it may remove bias, but by making one bit have a .504 probability of one, followed by a .504 probability of zero. The algorithm is also inefficient because the p=.4 noise sources have entropy of .971 bit per baud, so a total of 2.91 bits of entropy are consumed to produce a random bit. The solution I ultimately devised is completely general and efficient. It takes a source of bits with some randomness which can be modelled by pone(prevbits), giving the probability of seeing a one as the next bit given any number of previous bits. The pone function can be arbitrarily complicated; as long as it exactly models the non-randomness in the input, the output will be completely random. In practice, a table listing pone(prevbits) for some finite number of previous bits is sufficient. An explanation of the algorithm follows. Suppose we have a real number x, x member [0,1), and the probability distribution of x is completely uniform. Then it is trivial to generate any number of random bits by inspecting the binary representation of x. The next step is to realise that a sequence of nearly random bits can be used to generate some extra bits. Given some nearly random sequence of bits r[i], i = 1..n, with probabilty of a one pone[i], define z[0] = x, and z[i] = (1 - pone [i]) + z[i-1] * pone [i] if r[i] is 1 z[i] = z[i-1] * (1 - pone [i]) if r[i] is 0 As long as x is random, and pone[i] exactly models the probabilty of a one in r[i], then z[i] is randomly distributed in [0,1). The problem, of course, is that we don't have a perfectly random number x to start with. So lets represent the unknown x by the interval [0,1), and perform the same calculation. Then each z[i] is an interval [zmin[i],zmax[i]), and the same calculations apply. After consuming some number of bits, then the interval [zmin[i],zmax[i]) will become small. We can produce some random bits by simply reading out those bits of zmin and zmax that agree. For example, we might find zmin[i]=.101110110110101011.... and zmin[i]=.101110110111011101.... ^^^^^^^^^^^ where the underlined bits agree This means that z[i] is somewhere in this interval. It is thus safe to produce 10111011011 as completely random bits even though we don't know what x is! The following program uses this algorithm to generate random bits. It rescales each zmin,zmax (called rmin, rmax in the code) to the interval [0,1) as it generates each bit. The program uses a more limited model of pone that is dependent on k previous bits. The program is invoked unbias k p000 p001 ... p111 where there must be 2^k piii giving the probabilty of seeing the bit sequence iii, and the least significant bit appears last in the input. The input to the program is a sequence of ascii 0 and 1s. Have fun. David Lewis *******cut here*** #include#include #include double atof (); long random(); #define nprev 10 int more; int bitsprecision = 30; stopplease() { more = 0; } int getbit () { char c; while (c = getchar (), c == ' ' || c == '\n') ; if (c == '0' || c == '1') return (c - '0'); else { more = 0; return (0); } } main (argc, argv) int argc; char *argv[]; { float *pone, *pzero; int nbits, lognbits, mask; int prevbits; int intpone; int i; int b; int nbitsspat; int x; char randstate[64]; float rmin, rmax; float t; lognbits = atoi (argv [1]); nbits = 1 << lognbits; mask = nbits - 1; prevbits = 0; pone = (float *) malloc (nbits * sizeof (float)); pzero = (float *) malloc (nbits * sizeof (float)); intpone = pone [0] * (1 << bitsprecision); for (i = 0; i < nbits; i++) { pone [i] = atof (argv [i + 2]); pzero [i] = 1.0 - pone [i]; } more = 1; signal (SIGINT, stopplease); initstate (123456, randstate, 64); rmin = 0.0; rmax = 1.0; for (i = 0; i < lognbits; i++) { prevbits = prevbits << 1 | getbit (); } while (more) { while (rmin < .5 && rmax > .5 && more) { b = getbit (); if (b) { rmin += pzero [prevbits] * (rmax - rmin); } else { rmax -= pone [prevbits] * (rmax - rmin); } /* printf (" %e %e\n", rmin, rmax); */ prevbits = (prevbits << 1 | b) & mask; } if (more) { if (rmin >= .5) { putchar ('1'); rmax = 2.0 * rmax - 1.0; rmin = 2.0 * rmin - 1.0; } else { putchar ('0'); rmax *= 2.0; rmin *= 2.0; } nbitsspat++; if (! (nbitsspat & 63)) putchar ('\n'); } } fflush (stdout); } ****end of code****