```Subject: Re: What the hell did Santha-Vazirani paper actually say?
Summary: algorithm doesn't work, this one does
From: lewis@eecg.toronto.edu (david lewis @ EECG, University of Toronto)
I have no significant crypto experience, but I believe that
the enclosed is correct.

I have read these articles with interest, since some of us
devloped a noise box that produced nearly random bits (something
like probability of a one is 51%). I developed an algorithm that
is optimal and provably generates random output for any input
of nearly random bits, with non-randomness that can be described.
By optimal, I mean that it produces a bit of output for every bit
of entropy in its input.

The algorithm posted is not only inefficient, it is wrong.
It does not produce random bits. Specifically, if p is the probabilty
of seeing a one from each of the n noise sources, then the probability
that there are an even number of ones (and thus a zero as the result
of xoring them all is (in maple format)

pb:=proc(p,n) sum(p^(2*i)*((1-p)^(n-2*i))*n!/((2*i)!*(n-2*i)!),i=0..n/2) end;

which simplifies to
n
1/2 (1 - 2 p)  + 1/2

For example, 3 noise sources with prob of a 1 of .4 have a .504 probability
of generating a 0. Almost random, but not quite.

Xoring the result of this with a square wave is irrelevant; it may
remove bias, but by making one bit have a .504 probability of one,
followed by a .504 probability of zero.

The algorithm is also inefficient because the p=.4 noise sources have entropy
of .971 bit per baud, so a total of 2.91 bits of entropy are consumed
to produce a random bit.

The solution I ultimately devised is completely general and efficient.
It takes a source of bits with some randomness which can be modelled
by pone(prevbits), giving the probability of seeing a one as the next bit
given any number of previous bits. The pone function can be arbitrarily
complicated; as long as it exactly models the non-randomness in the
input, the output will be completely random. In practice, a table listing
pone(prevbits) for some finite number of previous bits is sufficient.

An explanation of the algorithm follows.

Suppose we have a real number x, x member [0,1),
and the probability distribution of x is completely uniform. Then
it is trivial to generate any number of random bits by inspecting the binary
representation of x.

The next step is to realise that a sequence of nearly random bits can
be used to generate some extra bits. Given some nearly
random sequence of bits r[i], i = 1..n, with probabilty of a one pone[i],
define z[0] = x, and

z[i] = (1 - pone [i]) + z[i-1] * pone [i]	if r[i] is 1
z[i] = z[i-1] * (1 - pone [i])			if r[i] is 0

As long as x is random, and pone[i] exactly models the probabilty
of a one in r[i], then z[i] is randomly distributed in [0,1).

The problem, of course, is that we don't have a perfectly random number x
to start with. So lets represent the unknown x by the interval [0,1), and
perform the same calculation. Then each z[i] is an interval [zmin[i],zmax[i]),
and the same calculations apply. After consuming some number of bits,
then the interval [zmin[i],zmax[i]) will become small. We can produce
some random bits by simply reading out those bits of zmin and zmax that agree.
For example, we might find
zmin[i]=.101110110110101011....
and
zmin[i]=.101110110111011101....
^^^^^^^^^^^

where the underlined bits agree

This means that z[i] is somewhere in this interval. It is thus
safe to produce 10111011011 as completely random bits even though
we don't know what x is!

The following program uses this algorithm to generate random bits.
It rescales each zmin,zmax (called rmin, rmax in the code)
to the interval [0,1) as it generates each bit.
The program uses a more limited model of pone that is dependent
on k previous bits.
The program is invoked

unbias k p000 p001 ... p111

where there must be 2^k piii giving the probabilty of seeing the bit
sequence iii, and the least significant bit appears last in the input.
The input to the program is a sequence of ascii 0 and 1s.
Have fun.

David Lewis
*******cut here***

#include
#include
#include

double atof ();

long random();

#define nprev 10

int more;
int bitsprecision = 30;

{	more = 0;
}

int getbit ()
{	char c;

while (c = getchar (), c == ' ' || c == '\n')
;
if (c == '0' || c == '1')
return (c - '0');
else
{	more = 0;
return (0);
}
}

main (argc, argv)
int argc;
char *argv[];
{	float *pone, *pzero;
int prevbits;
int intpone;
int i;
int b;
int nbitsspat;
int x;
char randstate[64];
float rmin, rmax;
float t;

lognbits = atoi (argv [1]);
nbits = 1 << lognbits;
prevbits = 0;
pone = (float *) malloc (nbits * sizeof (float));
pzero = (float *) malloc (nbits * sizeof (float));
intpone = pone [0] * (1 << bitsprecision);
for (i = 0; i < nbits; i++)
{	pone [i] = atof (argv [i + 2]);
pzero [i] = 1.0 - pone [i];
}
more = 1;
initstate (123456, randstate, 64);
rmin = 0.0;
rmax = 1.0;
for (i = 0; i < lognbits; i++)
{	prevbits = prevbits << 1 | getbit ();
}
while (more)
{	while (rmin < .5 && rmax > .5 && more)
{	b = getbit ();
if (b)
{	rmin += pzero [prevbits] * (rmax - rmin);
}
else
{	rmax -= pone [prevbits] * (rmax - rmin);
}
/*			printf (" %e %e\n", rmin, rmax);		*/
prevbits = (prevbits << 1 | b) & mask;
}
if (more)
{	if (rmin >= .5)
{	putchar ('1');
rmax = 2.0 * rmax - 1.0;
rmin = 2.0 * rmin - 1.0;
}
else
{	putchar ('0');
rmax *= 2.0;
rmin *= 2.0;
}
nbitsspat++;
if (! (nbitsspat & 63))
putchar ('\n');
}
}
fflush (stdout);
}
****end of code****

```