Subject: Re: What the hell did Santha-Vazirani paper actually say?
Summary: algorithm doesn't work, this one does
From: lewis@eecg.toronto.edu (david lewis @ EECG, University of Toronto)
I have no significant crypto experience, but I believe that
the enclosed is correct.

I have read these articles with interest, since some of us
developed a noise box that produced nearly random bits (something
like a 51% probability of a one). I developed an algorithm that
is optimal and provably generates random output for any input
of nearly random bits whose non-randomness can be described.
By optimal, I mean that it produces a bit of output for every bit
of entropy in its input.

The algorithm posted is not only inefficient, it is wrong.
It does not produce random bits. Specifically, if p is the probability
of seeing a one from each of the n noise sources, then the probability
that there are an even number of ones (and thus a zero as the result
of xoring them all) is (in Maple format)

 pb:=proc(p,n) sum(p^(2*i)*((1-p)^(n-2*i))*n!/((2*i)!*(n-2*i)!),i=0..n/2) end;

which simplifies to
	1/2 * (1 - 2*p)^n + 1/2

For example, 3 noise sources, each with probability .4 of a one, have a .504
probability of generating a 0. Almost random, but not quite.
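
A minimal sketch to check that figure (the helper name is mine, not part
of any posted code):

#include <stdio.h>
#include <math.h>

/* Probability that xoring n independent bits, each a one with
   probability p, comes out zero: 1/2 * (1 - 2*p)^n + 1/2. */
double xor_zero_prob (double p, int n)
{
	return 0.5 * pow (1.0 - 2.0 * p, (double) n) + 0.5;
}

int main (void)
{
	/* Reproduces the .504 figure quoted above for p = .4, n = 3. */
	printf ("%.3f\n", xor_zero_prob (0.4, 3));
	return 0;
}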

Xoring the xor of the noise sources with a square wave is irrelevant; it may
remove the bias, but only by making one output bit have a .504 probability
of being a one, followed by the next having a .504 probability of being a zero.

The algorithm is also inefficient because the p=.4 noise sources have entropy
of .971 bit per baud, so a total of 2.91 bits of entropy are consumed
to produce a random bit.
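
For reference, the .971 figure is the binary entropy of a p = .4 source; a
quick check (again my own sketch, not part of the posted code):

#include <stdio.h>
#include <math.h>

/* Binary entropy in bits: H(p) = -p*log2(p) - (1-p)*log2(1-p). */
double binary_entropy (double p)
{
	return -p * log2 (p) - (1.0 - p) * log2 (1.0 - p);
}

int main (void)
{
	double h = binary_entropy (0.4);
	/* Prints roughly .971 and 2.91: three p = .4 bits carry about
	   2.91 bits of entropy, all spent on one barely-biased output bit. */
	printf ("H(.4) = %.3f, three bits consume %.2f bits of entropy\n", h, 3.0 * h);
	return 0;
}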


The solution I ultimately devised is completely general and efficient.
It takes a source of bits with some randomness which can be modelled
by pone(prevbits), giving the probability of seeing a one as the next bit
given any number of previous bits. The pone function can be arbitrarily
complicated; as long as it exactly models the non-randomness in the
input, the output will be completely random. In practice, a table listing
pone(prevbits) for some finite number of previous bits is sufficient.

An explanation of the algorithm follows.

Suppose we have a real number x in [0,1), whose probability distribution
is completely uniform. Then it is trivial to generate any number of random
bits by inspecting the binary representation of x.
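
Concretely, the bits can be peeled off by repeated doubling; a minimal
sketch (mine, not from the original posting):

#include <stdio.h>

/* Print the first nbits bits of the binary expansion of x in [0,1).
   If x is uniformly distributed, each bit is an independent fair coin. */
void emit_bits (double x, int nbits)
{
	int i;

	for (i = 0; i < nbits; i++)
	{
		x *= 2.0;
		if (x >= 1.0)
		{
			putchar ('1');
			x -= 1.0;
		}
		else
			putchar ('0');
	}
	putchar ('\n');
}

int main (void)
{
	emit_bits (0.7265625, 8);	/* prints 10111010, i.e. .10111010 in binary */
	return 0;
}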

The next step is to realise that a sequence of nearly random bits can
be used to generate some extra bits. Given some nearly
random sequence of bits r[i], i = 1..n, with probability of a one pone[i],
define z[0] = x, and

	 z[i] = (1 - pone [i]) + z[i-1] * pone [i]	if r[i] is 1
	 z[i] = z[i-1] * (1 - pone [i])			if r[i] is 0

As long as x is uniformly random, and pone[i] exactly models the probability
of a one in r[i], then z[i] is uniformly distributed in [0,1).
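
A minimal sketch of this update step (the names, the fixed pone, and the
input bits are mine, purely for illustration):

#include <stdio.h>

/* One update: fold bit r, whose probability of being one is pone,
   into z.  If z is uniform in [0,1) and pone is exact, the result
   is again uniform in [0,1). */
double fold_bit (double z, int r, double pone)
{
	if (r)
		return (1.0 - pone) + z * pone;		/* lands in [1-pone, 1) */
	else
		return z * (1.0 - pone);		/* lands in [0, 1-pone) */
}

int main (void)
{
	int bits[] = { 1, 0, 1, 1 };
	double z = 0.5;		/* stand-in for the (supposedly uniform) x */
	int i;

	for (i = 0; i < 4; i++)
		z = fold_bit (z, bits[i], 0.6);
	printf ("z = %f\n", z);
	return 0;
}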

The problem, of course, is that we don't have a perfectly random number x
to start with. So let's represent the unknown x by the interval [0,1), and
perform the same calculation. Then each z[i] is an interval [zmin[i],zmax[i]),
and the same calculations apply. After consuming some number of bits,
the interval [zmin[i],zmax[i]) will become small. We can produce
some random bits by simply reading out those bits of zmin and zmax that agree.
For example, we might find
	zmin[i]=.101110110110101011....
and
	zmax[i]=.101110110111011101....
	         ^^^^^^^^^^^

where the marked bits agree.

This means that z[i] is somewhere in this interval. It is thus
safe to produce 10111011011 as completely random bits even though
we don't know what x is!
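
A minimal sketch of this interval form, before the rescaling trick used in
the program below (everything here, including the fixed pone and the made-up
input bits, is only illustrative):

#include <stdio.h>

/* Track the interval [zmin, zmax) instead of a single z, then print
   the leading binary digits on which the two endpoints agree.  A real
   run would use a pone table matching the actual source; a constant
   pone stands in here. */
int main (void)
{
	int bits[] = { 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1 };
	double pone = 0.6;
	double zmin = 0.0, zmax = 1.0;
	int i;

	/* Apply the update above to both endpoints of the interval. */
	for (i = 0; i < 16; i++)
	{
		if (bits[i])
		{
			zmin = (1.0 - pone) + zmin * pone;
			zmax = (1.0 - pone) + zmax * pone;
		}
		else
		{
			zmin = zmin * (1.0 - pone);
			zmax = zmax * (1.0 - pone);
		}
	}

	/* Read out bits of the binary expansions of zmin and zmax until
	   they first disagree; the agreeing bits are safe to emit. */
	for (;;)
	{
		int bmin, bmax;

		zmin *= 2.0;
		zmax *= 2.0;
		bmin = zmin >= 1.0;
		bmax = zmax >= 1.0;
		if (bmin != bmax)
			break;
		putchar ('0' + bmin);
		if (bmin)
		{
			zmin -= 1.0;
			zmax -= 1.0;
		}
	}
	putchar ('\n');
	return 0;
}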

The following program uses this algorithm to generate random bits.
It rescales each zmin,zmax (called rmin, rmax in the code)
to the interval [0,1) as it generates each bit.
The program uses a more limited model of pone that is dependent
on k previous bits.
The program is invoked

	unbias k p000 p001 ... p111

where there must be 2^k values piii, each giving the probability that the
next input bit is a one when the previous k bits were iii (the least
significant bit of iii being the most recent input bit).
The input to the program is a sequence of ascii 0s and 1s.
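For example, for a memoryless source like the 51% noise box above, k = 1 and
both table entries are .51 (the file names here are just placeholders):

	unbias 1 .51 .51 < rawbits > goodbits
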
Have fun.

David Lewis
*******cut here***

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

#define nprev 10


int more;			/* cleared by the SIGINT handler to stop the main loop */

void stopplease (sig)			/* SIGINT handler */
int sig;
{	more = 0;
}

int getbit ()
{	int c;				/* int, not char, so the EOF value from getchar is preserved */

	while (c = getchar (), c == ' ' || c == '\n')
		;
	if (c == '0' || c == '1')
		return (c - '0');
	else
	{	more = 0;
		return (0);
	}
}




int main (argc, argv)
int argc;
char *argv[];
{	float *pone, *pzero;	/* pone[i]: probability the next bit is a one given previous bits == i */
	int nbits, lognbits, mask;
	int prevbits;
	int i;
	int b;
	int nbitsspat = 0;	/* output bits emitted so far, used to break lines every 64 bits */
	float rmin, rmax;	/* current interval, rescaled to [0,1) after each output bit */

	if (argc < 2)
	{	fprintf (stderr, "usage: unbias k p0...0 p0...1 ... p1...1\n");
		exit (1);
	}
	lognbits = atoi (argv [1]);		/* k: number of previous bits the model uses */
	nbits = 1 << lognbits;			/* 2^k table entries */
	mask = nbits - 1;
	prevbits = 0;
	if (argc < nbits + 2)
	{	fprintf (stderr, "unbias: expected %d probabilities\n", nbits);
		exit (1);
	}
	pone = (float *) malloc (nbits * sizeof (float));
	pzero = (float *) malloc (nbits * sizeof (float));
	for (i = 0; i < nbits; i++)
	{	pone [i] = atof (argv [i + 2]);	/* probability of a one given prevbits == i */
		pzero [i] = 1.0 - pone [i];
	}
	more = 1;
	signal (SIGINT, stopplease);
	rmin = 0.0;				/* the unknown x starts as the whole interval [0,1) */
	rmax = 1.0;
	for (i = 0; i < lognbits; i++)		/* read k bits to establish the initial context */
	{	prevbits = prevbits << 1 | getbit ();
	}
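	/* Fold input bits into the interval [rmin, rmax) until it lies
	   entirely in one half of [0,1); that half's bit is then fully
	   determined, so emit it and rescale the interval back to [0,1). */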
	while (more)
	{	while (rmin < .5 && rmax > .5 && more)
		{	b = getbit ();
			if (b)
			{	rmin += pzero [prevbits] * (rmax - rmin);
			}
			else
			{	rmax -= pone [prevbits] * (rmax - rmin);
			}
/*			printf (" %e %e\n", rmin, rmax);		*/
			prevbits = (prevbits << 1 | b) & mask;
		}
		if (more)
		{	if (rmin >= .5)
			{	putchar ('1');
				rmax = 2.0 * rmax - 1.0;
				rmin = 2.0 * rmin - 1.0;
			}
			else
			{	putchar ('0');
				rmax *= 2.0;
				rmin *= 2.0;
			}
			nbitsspat++;
			if (! (nbitsspat & 63))
				putchar ('\n');
		}
	}
	fflush (stdout);
	return (0);
}
****end of code****