Newsgroups: sci.crypt Path: illuminati.io.com!uunet!gatech!howland.reston.ans.net!usc!crash!straits From: straits@crash.cts.com (Stewart Strait) Subject: Re: Doing Better than XOR in RC4-like Algorithms Organization: CTS Network Services (CTSNET), San Diego, CA Date: Sat, 19 Nov 1994 06:48:15 GMT Message-ID:X-Newsreader: TIN [version 1.2 PL2] References: <3a7llc$kej@netaxs.com> <3acu3n$36j@netaxs.com> Sender: news@crash.cts.com (news subsystem) Nntp-Posting-Host: crash.cts.com Lines: 42 Steve O'Neill (soneill@unix3.netaxs.com) wrote: ... : To me, this means that the "plaintext" can be binary, : compressed text, Fijian text, or anything else. If an attacker doesn't know : in advance what the nature of the original messages is, having the XOR of : two messages in front of him doesn't give him _any_ information. If it did, : then every one-time pad ever devised would have been broken, since XORing : one unknown message with another unknown message is equivalent to a one-time : pad. ... A vital quantitative point is missed here. All the attacker needs is for the sum of the redundancies of the two messages to exceed 100% by a large margin. XORing one unknown message with another is _not_ equivalent to a one-time pad unless 'unknown' means 'so unknown that all possible messages are roughly equally likely'. English has about one bit per character of information. You would need compression of about 4:1 to get the redundancy down to 50% in each message. PKZIP only gets about 2.2:1, for example. Removing all non-alpha characters and using all caps compresses by about 2:1 but is lossy. If you want to compress much better that this, your compression algorithm needs to know cryptographic-style detailed statistics of the language used. This is equivalent to designing a code (in the old-fashioned codebook sense) that compresses enough to resist analysis. Similar arguements hold for other human and machine languages. If the attacker knows that the message is in one of a few hundred languages, some of which are human and some machine code, and knows that the message is either ascii, machine binary, or compressed with one of the few commonly used compression programs, he knows enough. Just because amateurs like me can't try a few thousand hobbyist-level attacks, lacking the language knowledge and the time, doesn't mean a whole lot in the real world. A real crypto organization could hardly exist without having statistics on all these possibilities. Not knowing which possibility is actually realized only costs the attacker about 10 to 15 bits of information out of a message length which is most likely thousands of bits.