Subject: Re: Redundancy eliminator
From: (Stephen R. Tate @ Duke University Computer Science Dept.; Durham, N.C.)
In article <695@hhb.UUCP> istvan@hhb.UUCP (Istvan Mohos) writes:
>In article <1991Feb1.081237.27790@santra.uucp>, 
> (Tapani Lindgren) writes:

>>There is a simple way to get an unbiased random sequence from a
>>biased one.  It has been described in several recent postings,
>>so I won't repeat it here.
>A number of us have apparently missed these articles.
>Re. article <> (Doug Gwyn):
>>...I would find the existence of such an algorithm most surprising.

From the fact that you posted these two statements together, it
seems that you are assuming the two previous posters are talking
about the same thing --- they're not.  The first posting was about
removing the bias from a truly random sequence --- this is fairly
trivial (see below).  If I remember correctly, Doug Gwyn was talking
about removing redundancy and information content from English
text.  These are very, very different things.

As for removing the bias from a biased (but random) bit stream,
consider the following method (I'm sure there are more sophisticated
ways, but this was what I could come up with off the top of my head).

If you take a sequence of biased bits and XOR the values, the distribution
of the result gets closer and closer to 50/50 (i.e., unbiased) depending
on how many bits are in the sequence.  This removes quite a bit of
the bias --- in fact, you can get as close to unbiased as you want
by simply using large sequences of bits for each resulting bit.  The
point here, I suppose, is that it should be cheaper to generate
the biased bits, so you can generate a lot.  

I would bet that in general, you cannot remove all the bias... maybe
for certain values of bias it's possible, but not in general.  Of course,
you could also do the following, but it is no longer guaranteed to
halt (for all practical purposes it will halt, though):  read
two consecutive bits from the biased stream.  If they're the same,
then start over.  If the bits are 0/1 (in that order), output
0;  otherwise, output 1.  If the probability of a 1 in the biased
bitstream is p, then the probability of outputing 0 is p(1-p), the
probability of outputing 1 is p(1-p), and the probability of
having to start over is p*p+(1-p)*(1-p), so this is a completely
unbiased bitstream.  Notice that in practical situations, the bias
is probably not very high, so this is a very good solution.  However,
the fact that it could continue forever without generating any bits
is a little unsettling to theoreticians... 

Steve Tate               
Dept. of Computer Science
Duke University
Durham, NC  27706