From: (Stefan Lucks )

Subject: XOR of plaintexts   was: Doing Better than XOR in RC4-like Algorithms
Organization: GWDG, Goettingen
References:  <3a7llc$>
Date: Tue, 15 Nov 1994 08:13:21 GMT
Lines: 53 (Steve O'Neill) writes:

>In article  
> (Farid F. El-Wailly) writes:
>>If the same key is used to encrypt two files then cryptanalysis is
>>trivial due to the nature of the exclusive-or. You can cancel out
>>the random sequence by XORing the two cyphertexts together.

>Unless I'm mistaken, this makes the cryptanalysis easier, but not trivial.

I agree: The cryptanalysis of two xored plaintexts is not trivial.

>When you XOR the two ciphertexts together, what you wind up with is the XOR
>of _both_ plaintexts, which, unless you know one of them, doesn't help you
>very much. Known or chosen plaintext attacks can use this effect, but, in
>the general case, it won't do anything to simplify the analysis.

It seems to be difficult to decrypt both plaintexts completely. But if you
know the topics, the plaintexts deal with--or at least the language they
are written in--you are able to discover large fractions of the plaintexts.

You start with some text-fractions, which you expect to find in at least
one of the plaintexts. These are called probable words. In the case of any 
English texts this might include " the ", " this ", " does ", ... (the 
spaces help to increase the size of the probable words). At each character 
position decrypt your text with the given probable words. You can expect to 
find a small number of pairs (P,W)=(character position, probable word), 
where the text at position P decoded with W looks like a fraction F of an 
English text. For F you try to guess some following characters F1 and some 
preceeding characters F0. I.e. if F is "do an", one choice for F0 might be 
"to "; reasonable choices for F1 might be " a", " e", " i", " o", " u", "y ", 
or "ything". Use F0 and F1 to decode your ciphertext at the appropriate 
positions. If what you find does look like English text, you probably have 
guessed right. Then guess again and repeat until you can't make any more
reasonable guesses.

This attack works the better, the longer the probable words are. Therefore
it is good to know the topics of at least one of the plaintexts. I.e. if 
the plaintext(s) deal with cryptography, you can try long probable words
like "cipher", " ciphertext", " plaintext", " cryptanalysis", ...

Natural languages like English (French, German, ...) are so redundant, that
one can find out large fractions of the plaintexts when given only the xor 
of two plaintexts. If the plaintexts are (or one of them is) compressed, 
things become more difficult.


Stefan Lucks  Institut f"ur Numerische und Angewandte Mathematik,
              Lotzestra\3e 16-18,   37083 G"ottingen,   Germany
e-mail: (oder
Wer einem Computer Unsinn erz"ahlt, mu\3 damit rechnen.