Hashing and CRC

Terry Ritter

A Ciphers By Ritter Page

Hashing has various cryptographic uses, including the generation of fixed-length keys from arbitrary key phrases, and the accumulation or "distillation" of randomness or "entropy." Here we have a discussion on CRC polynomials, comments on hash127 (said to be faster than CRC), and a discussion resulting from "reverse CRC."
1998-10- 9 Kai Harrekilde-Petersen: "ECC's (and CRC's for that matter) aren't "discovered"; they are _constructed_ or 'designed'."
1998-10- 9 Kai Harrekilde-Petersen: "I cannot tell from looking at the polynomials above whether they are primitive or not - but if they are the official CRC's they should be."
1998-10- 9 Kai Harrekilde-Petersen: "We are talking about polynomials, working in finite fields."
1998-10- 9 Dik T. Winter: "Note that the CCITT when written as a binary number is *not* prime (it is divisable by 5), but as a polynomial over GF(2) it is primitive, i.e. *not* divisible by another polynomial over GF(2)."
1998-10- 9 Terry Ritter: "Alas, I happen to know that 11 is a factor, and we don't have to argue about it, we can show it . . . ."
1998-10-12 Dik T. Winter: "Indeed yes, as it has an even number of non-zero coefficients it is obviously divisible by X+1. Argh."
1998-10-11 Colin Plumb: "Irreducibility isn't enough, you need primitivity as well."
1998-10-11 Terry Ritter: "No, we don't *need* primitivity (or even irreducibility), obviously, since the early CRC polys were composite. Nor, I think, does it even help very much, in the normal low-error-rate situation."
1999-04-06 D. J. Bernstein: "hash127 computes a 16-byte secret-key signature of an arbitrarily long message." "The first release of hash127 is faster than the best asm versions of MD5 on a variety of architectures."
1999-04-08 Paul Rubin: "Care to explain anything about the algorithm?!"
1999-04-09 D. J. Bernstein: "What the code does is compute a polynomial . . . at very high speed . . . ."
1999-04-10 Scott Fluhrer: ". . . anyone who has access to r and x can easily create collisions . . . ."
1999-04-10 D. J. Bernstein: "Secret-key signatures (also known as ``message authentication codes'') are much faster than public-key signatures."
1999-04-14 pierrepuzer@hotmail.com: "How, exactly, do you prove the safeness of hash127?"
1999-04-14 D. J. Bernstein: "Pick any two distinct messages, m and m', and (not necessarily distinct) integers s and s' between 0 and p-1, where p = 2^127-1."
1999-04-15 puzer@my-dejanews.com: "Why not make an ssh replacement using hash127, Snuffle/SURF or SEOC, ucspi-tcp, ptyget and some key-exhange software?"
1999-04-16 gteabo@my-dejanews.com: "I have an idea to reduce redundancy in a large data store."
1999-04-15 John Myre: "Any good hash (e.g. SHA-1) ought to work."
1999-04-16 gteabo@my-dejanews.com: "I wonder whether sufficient randomization can be achieved in just 20 bytes considering the non-random characteristic of my inputted plaintext."
1999-04-16 Geoff Thorpe: "Loads of implementations are available in loads of places."
1999-04-17 Geoffrey Teabo: "If SHA1 and MD5 are well known secure hashes, what's a well known standard hash?"
1999-04-18 Geoff Thorpe: ". . . some others have mentioned a couple in these threads - the Universal Hash Function, and somebody also mentioned a simple (& fast) cute example involving XOR-ing in the blocks of the message one at a time . . . ."
1999-04-16 D. J. Bernstein: "Will the results be kept secret? If so, you can use a ``universal hash function.''"
1999-04-16 gteabo@my-dejanews.com: "My exact application involves archiving."
1999-04-16 Terje Mathisen: "Any of the well-known message digest functions would work, or you could use DES or some other encryption algorithm, xor'ing 64-bit result blocks together."
1999-04-17 D. J. Bernstein: "hash127 on my old Pentium-133 runs at about 238 million bits per second from L2 cache; e.g., it hashes 40000 bytes in about 1.4 milliseconds."
1999-04-17 Geoffrey Teabo: "In light of my specific issues, another post to this thread suggested that a 128 bit CRC would be okay, and that a hash is more trouble than it's worth."
1999-04-17 Vernon Schryver: "On the contrary, CRC-32 and other error detecting and correcting hash functions are expressly designed to detect such small changes."
1999-04-17 D. J. Bernstein: "Actually, a 128-bit CRC _with a random polynomial_ guarantees an extremely low collision probability for any fixed set of inputs." "(Just like hash127, except that hash127 is faster and provides a slightly better bound on the collision probability.)"
1999-04-17 David Wagner: "I'm confused. Isn't it true that CRC(x) = CRC(0||x), for any polynomial? ("||" represents concatenation of bits.)"
1999-04-18 Terje Mathisen: "Only if your CRC function is initialized with zero!"
1999-04-19 D. J. Bernstein: "Not if you initialize the CRC correctly."
1999-04-17 Scott Fluhrer: ". . . you might as well use a 128 bit CRC. This can be computed much faster than any known hash function, and provably will change on small differences, and on random changes, it's maximally good . . . ."
1999-04-17 Geoffrey Teabo: "I really don't care about "attackers" at all."
1999-04-17 Boris Kazak: "This proposed function is based on multiplication (mod 2^32+1) with two "twists"."
1999-04-17 D. J. Bernstein: ". . . in software, arithmetic in large characteristic benefits from CPU support for fast multiplication . . . ."
1999-04-17 Terry Ritter: "For production use, CRC's are table-driven."
1999-04-18 D. J. Bernstein: "Now, if you could magically set up a size-2^32 table instead of a size-2^8 table, you could get away with far fewer operations . . . ." "That's basically the speed achieved by hash127 . . . ."
1999-04-19 Terry Ritter: "I still find this hard to believe. What happened to the multi-precision divide?"
1999-04-19 D. J. Bernstein: "That's why they pay me the big bucks. :-)"
1999-04-19 Terry Ritter: "That's fine, but I'd rather have an explanation in plain English . . . ."
1999-04-19 D. J. Bernstein: "These _are_ multiprecision operations."
1999-04-28 Mok-Kong Shen: "Could you say what speed you can achieve with that on a typical processor . . . ?"
1999-04-28 Jerry Coffin: "I just did a quick check with a version I have lying around and it did around 5 megabytes a second a Pentium II/400."
1999-05-01 John A. Limpert: "I ran benchmarks of various CRC schemes on a Pentium a few years ago. Most ran in the range of 5-10 megabytes/second."
1999-05-01 D. J. Bernstein: "hash127 handles input straight from DRAM at over 200 million bits/second on my old Pentium-133."
1999-06-03 Amit Ghosh: ". . . crc32 is described in ISO 3309."
1999-06-03 Rosco Schock: ". . . if you change to a filesize / CRC pair of numbers then the only way you will get the same checksum is if the files are the same."
1999-06-03 Anwer Bashi: "Not true. You can still get the same CRC with two different files of the same size."
1999-06-03 Bryan Olson: "Just XOR in the CRC polynomial."
1999-06-03 Paul Koning: "For long errors, n-bit CRC will fail to detect one in 2^n errors. For short errors, it does better (in particular, a good CRC will detect *all* errors of length k or less where k is n plus/minus a few)."
1999-06-07 tomstdenis@my-deja.com: "MD5 is probably a better choice as it has 16 fewer round . . . ."
1999-06-09 Jerry Leichter: "A *secret* CRC polynomial provides a primitive related to (but by no means the same as) a secure hash."
1999-06-03 Bryan Olson: "If you're only worried about accidental collisions, a large CRC is fine."
1999-06-04 iLLusIOn: "i was asking for a more secure algorithm (with the same or better speed) then CRC32 as less duplicate checksums would save me time doing byte for byte comparisons."
1999-06-04 Paul Koning: "If you don't have intelligent attackers, any cryptographic hash is serious overkill and bound to take much more CPU time than error checking codes."
1999-06-05 D. J. Bernstein: "http://pobox.com/~djb/hash127.html is almost as fast as an optimized CRC-32, but produces a 127-bit hash, using a 128-bit key."
2000-02-25 Liu, Qingzhu [CAR:CF36:EXCH]: "I try to implement an algorithm for crc-16 reverse . . . ."
2000-02-25 John Savard:
2000-02-25 Doug Stell: "Attached is the table and code fragment for a byte-wide CRC-16."
2000-02-25 Anne & Lynn Wheeler: "one of the interesting things in CRC16 chips in the '70s where computer boxes using it for bit error detection and RF modem boxes using the same chip for eliminating runs of zeros."
2000-02-25 Doug Stell: "CRCs that are initialized to all zeros (CRC-16 is one of these) can not detect the addition or deletion of leading zeros. Likewise, CRCs that are initialized to all ones can not not detect the addition or deletion of leading ones." (ed. note: that is FALSE; see below)
2000-02-26 Marty: "Inserted and/or deleted ones at the start of a ones initialized CRC are indeed detected. It's also a good idea to negate the CRC at the end for detecting appended zeros."
2000-02-26 David A. Wagner: "I thought Doug Stell got it right."
2000-02-26 Marty: "No, I believe Doug's incorrect."
2000-02-27 David A. Wagner: "Perhaps there is some difference between CRCs and LFSRs that I'm missing?"
2000-02-27 Terry Ritter: "In a shift-left CRC, 0's always shift in, and this is independent of the data value. the only way the rightmost bit gets set to a 1 is from the poly add."
2000-02-27 David A. Wagner: "As far as I can see, it remains true that the all-ones state is bad for Fibonacci . . . ."
2000-02-28 Terry Ritter: "We "diverged" when some of us were talking about CRC, and others of us were not."
2000-02-27 Marty: "Terry, You gave a better explanation than I had. David is persistent, maybe at the end of it all he will be more enlightened."
2000-02-28 Terry Ritter: "I am not overly hopeful."
2000-02-27 Terry Ritter: "What a mess!" "Doug is wrong. CRC *does* indeed detect every 1-bit at the start of a message. Missing 1's *ARE* detected. Extra 1's *ARE* detected. Missing and extra 0's are also detected."
2000-02-27 David A. Wagner: "How can this be?"
2000-02-27 lordcow77: ". . . IF the register is initially initialized with all ones, the CRC algorithm will detect a prepended string ones OR zeroes."
2000-02-28 Terry Ritter: "If the CRC register is initialized with zeros, it cannot detect leading zeros in the data. That problem was detected early in CRC use, and from that time it became common to init the CRC register as ones. When this is done, CRC can detect both leading zeros and leading ones."
2000-02-27 David A. Wagner: "I will show a initial state which allows you to prepend as many one bits as you like without changing the final CRC hash."
2000-02-28 Terry Ritter: "Are you now saying that if, mid operations, the CRC register has some particular state, it will not detect extra 1's? While that may be an interesting idea, it is a completely different issue."
2000-02-29 David A. Wagner: ". . . I *suspect* that, no matter which configuration of CRC you use, there will usually be some initial state which doesn't detect leading ones."
2000-02-29 Terry Ritter: "We don't pick just any random initial state for CRC's. In data communications at least, both ends have to pick the same starting state."
2000-02-29 David A. Wagner: "Ok. It sounds like you are saying that noone ever uses any of the CRC configurations that could have a problem with the all-ones initialization state. Good."
2000-03-01 Douglas A. Gwyn: "The simple fact is, there are some garbles that *any* CRC scheme will fail to detect. One wants to design the scheme so that they are not among the *probable* modes of garbling."
2000-03-02 Paul Koning: "As far as I know Doug is right if he meant "for any given CRC polynomial, there are error patterns it will fail to detect"."
2000-03-03 Scott Nelson: "If you feed all possible N bit values into an N bit crc, (regardless of how it is initialized) you will get all possible N bit values."
2000-02-28 Doug Stell: "I am indeed wrong. A CRC initialized to all ones can detect inserted and/or deleted leading ones."
2000-02-27 Scott Nelson: "The "reverse" CRC is more common, since it's so much easier to implement in software."
2000-02-27 klockstone@cix.compulink.co.uk: "*THE* reference site for programmers looking to understand and write CRC algorithms is: http://www.ross.net/crc/crcpaper.html
2000-03-13 Terry Ritter: ""Reverse" means that the poly is intended to operate on bit-reversed data. This was needed for large 60's-style computer reel-to-reel tapes where the data could be read from either direction."
2000-08-02 Mark Wooding: "I remember testing CRC32 and MD4 on large files, and MD4 won rather convincingly."
2000-08-02 Terry Ritter: "The fast implementation uses a pre-computed table of 256 32-bit elements."
2000-08-02 Mark Wooding: "I'm not a *complete* idiot, Terry. I *did* implement the CRC like that."
2000-08-02 stanislav shalunov: "But if the input is short (say 32 bytes on average), CRC would still be faster, correct?"
2000-08-02 Mark Wooding: "Yes."
2000-10-05 Mack: "CRC are faster than HASH functions of comparable size. That is a fact."
2000-10-05 Scott Fluhrer: ". . . there are MACs which can be even faster than CRC."
2000-10-12 Scott Fluhrer: ". . . Note that one property that a MAC does not require is any collision resistance (or preimage resistance) whatsoever against someone with the keys"
2000-10-06 Bryan Olson: "It's not obvious to me how to get a 256 bit CRC from four of each of the 32-bit operations/byte."
2000-10-07 Mack: "There is a method for a 256 bit poly that uses 4 lookup tables and no shifts at all."
2000-10-07 Mack: ". . . if there is no security need then a CRC is 'better'."
2000-10-09 Bryan Olson: "I think any time I need a digest larger than 64-bits, I would always use a cryptographic hash."
2000-10-13 Mack: "Sending 2^32*128bits of data is no longer just hypothetical."
2000-10-13 Bryan Olson: "CRCs are certainly the winner in hardware speed. I don't think the other poster was right that cryptographic hashes can be as fast in hardware, at least not for the popular ones such as SHA-1."
2000-10-15 Mack: "The W buffer in SHA-1 acts very much like a CRC."
2000-10-16 Bryan Olson: "Whether you choose to look at it as being like a CRC is not the issue. It does not introduce collisions and does not produce the same problem."
2000-10-17 Mack: "In a situation where security is needed a CRC will not work. Otherwise a CRC is superior."
2000-10-21 Tim Tyler: ". . . speed issues with hashes could be reduced to be a relatively minor concern in the context of entropy distillation applications . . . ."
2000-10-22 Bryan Olson: "No SHA-1 hardware that anyone makes (or is likely to make) can match a hardware CRC."
2000-10-23 Tim Tyler: ". . . basically my statements boil down to the fact that some hash applications can be parallelised, so any deficiencies in raw speed can be traded for area."
2000-10-11 Tim Tyler: "While hash functions take longer to evaluate in software, they often have a layered structure that allows hardware implementations to accept new inputs before the last inputs have been completely processed . . . ."
2000-10-13 Bryan Olson: "In cryptographic hashes such as SHA-1, there exist opportunities for parallelism but the number of stages that must be sequential is a hardware nightmare compared to a CRC."
2000-10-19 Tim Tyler: ". . . it could make a big difference for instances where the processing of the hash is used to generate confusion sequences - e.g. in a RNG based on a counter-driven hash."


Subject: Re: So THAT's How It Works !!!
Date: 09 Oct 1998 07:46:13 +0200
From: Kai Harrekilde-Petersen <khp@dolphinics.no>
Message-ID: <kk3ogrm1e56.fsf@sciway.dolphinics.no>
References: <handleym-0810982209530001@macip2-149.apple.com>
   <eeeF0JHwF.D8o@netcom.com>
Newsgroups: comp.arch
Lines: 26

handleym@ricochet.net (Maynard Handley) writes:

> Maybe I'm wrong but, at least for CRCs, the impression I have is that all
> the magic numbers and polynomials involved were simply "discovered",
> they're not part of some massively deep theory. One simply uses them
> because they're there, not because of any other remarkable features they
> may have.

Maynard, you're wrong on this one.  CRC are heavily coupled to Error
Correcting Codes (ECC), and both have a large body of math behind them.
The polynomials and the numbers may seem 'magical' to you, but they
aren't.

ECC's (and CRC's for that matter) aren't "discovered"; they are
_constructed_ or 'designed'.

For example, a Reed-Solomon code used in Satellite communications is a
RS(233,32,16) code: It takes 233 bytes of data, adds 32 bytes of
parity, and can correct 16 independent 1-byte errors.  Generally, you
can design an RS code with 255-N bytes of data, adds N bytes of
parity, and can correct int(N/2) independent errors.

The math department of your local University or College, will probably
have a course in Error Correcting Codes.

--Kai


Subject: Re: So THAT's How It Works !!!
Date: 09 Oct 1998 07:50:05 +0200
From: Kai Harrekilde-Petersen <khp@dolphinics.no>
Message-ID: <kk3lnmq1dyq.fsf@sciway.dolphinics.no>
References: <slrn71r7fj.8tq.smithz@kriek.cae.wisc.edu>
   <eeeF0JHwF.D8o@netcom.com>
Newsgroups: comp.arch
Lines: 21

smithz@kriek.cae.wisc.edu (Zak Smith) writes:

> On Fri, 9 Oct 1998 03:05:03 GMT, Mark Thorson <eee@netcom.com> wrote:
> >You divide by a prime number slightly larger than the
> >data.  E.g. if you're sending a 16-bit word, you use
> >the following magic numbers:  11000000000000101 (CRC-16)
> >or 10001000000100001 (CCITT v.24).
> 
> Err, these aren't prime numbers:
> 
> 11000000000000101 = 1 + 4 + 2^15 + 2^16 = 98309 = 37 * 2657
> 10001000000100001 = 1 + 2^5 + 2^12 + 2^16 = 69665 = 5 * 13933
> 
> Is that what you meant?

We're talking polynomials in base 2, and that's a bit different from
plain math.  I cannot tell from looking at the polynomials above
whether they are primitive or not - but if they are the official CRC's
they should be.

--Kai


Subject: Re: So THAT's How It Works !!!
Date: 09 Oct 1998 12:15:50 +0200
From: Kai Harrekilde-Petersen <khp@dolphinics.no>
Message-ID: <kk37lya11nt.fsf@sciway.dolphinics.no>
References: <361de1a6.1117943@news.megsinet.net>
   <kk3lnmq1dyq.fsf@sciway.dolphinics.no>
Newsgroups: comp.arch
Lines: 19

msimon@tefbbs.com writes:

> Are number only prime in a given base?
> 
> I don't think so.

You seem to miss the point.  We are talking about polynomials, working
in finite fields.  In the finite field (2^0) 1+1=0.  In ordinary math
1+1=2.

The basic definitions of math in finitie fields and in N (the set of
integer numbers) are the same, but when you do the matj, the results
are quite different (eg: see above).

As far as I learned at university, this means that you cannot use
"ordinary" math to determine if a certain polynomial is primitive in
certain Finite Field.

--Kai


Subject: Re: So THAT's How It Works !!!
Date: Fri, 9 Oct 1998 12:30:10 GMT
From: dik@cwi.nl (Dik T. Winter)
Message-ID: <F0K82A.2KM@cwi.nl>
References: <361de590.2113019@news.megsinet.net>
   <kk37lya11nt.fsf@sciway.dolphinics.no>
Newsgroups: comp.arch
Lines: 28

In article <361de590.2113019@news.megsinet.net> msimon@tefbbs.com writes:
 > The CRC compares remainders. 
 > 
 > The reason for dividing by a prime is to distribute the remainders.

But the CRC does not divide by a prime.  It calculates the remainder
of a polynomial by a primitive polynomial over the field mod 2.  And
that is something completely different.

Or more precise: consider a message consisting of 0's and 1's.  Now
write the message as a polynomial in a variable X, e.g. the message
  1011011000101
becomes
  1.X^12 + 0.X^11 + 1.X^10 + 1.X^9 + 0.X^8 ...
(where ^ is exponentiation), this can be shortened to
  X^12 + X^10 + X^9 + X^7 + X^6 + X^2 + 1.
The complete message is divided by the CRC generating polynomial, for
CCITT X^16 + X^12 + X^5 + 1.  The remainder is the CRC (which is
a polynomial but trivially can be written in 0's and 1's).
Note that the CCITT when written as a binary number is *not* prime (it
is divisable by 5), but as a polynomial over GF(2) it is primitive, i.e.
*not* divisible by another polynomial over GF(2).

The reason for all this is that it is much simpler to implement in
hardware, only a shift register with feedback is needed.
-- 
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/


Subject: Re: So THAT's How It Works !!!
Date: Fri, 09 Oct 1998 18:48:52 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <361e5a7b.8169985@news.io.com>
References: <F0K82A.2KM@cwi.nl>
Newsgroups: comp.arch
Lines: 74


On Fri, 9 Oct 1998 12:30:10 GMT, in <F0K82A.2KM@cwi.nl>, in comp.arch
dik@cwi.nl (Dik T. Winter) wrote:

>
>The complete message is divided by the CRC generating polynomial, for
>CCITT X^16 + X^12 + X^5 + 1.  The remainder is the CRC (which is
>a polynomial but trivially can be written in 0's and 1's).
>Note that the CCITT when written as a binary number is *not* prime (it
>is divisable by 5), but as a polynomial over GF(2) it is primitive, i.e.
>*not* divisible by another polynomial over GF(2).

Alas, I happen to know that 11 is a factor, and we don't have to argue
about it, we can show it:  (This is mod-2 math.  For a quick
introduction, see my Crypto Glossary.)  


|         1111000000011111
|       ------------------
|   11 ) 10001000000100001   ( bits set = 16+12+5+0 )
|        11
|        --
|         10
|         11
|         --
|          10
|          11
|          --
|           11
|           11
|           --
|            000000010
|                   11
|                   --
|                    10
|                    11
|                    --
|                     10
|                     11
|                     --
|                      10
|                      11
|                      --
|                       11
|                       11
|                       --
|                        0 remainder (!!!)


Check:

|          1111000000011111
|        x               11
|          ----------------
|          1111000000011111
|         1111000000011111
|         -----------------
|         10001000000100001 = 16+12+5+1 bits set


So, no, CRC-CCITT is *not* irreducible.  (Which also means it is
*certainly* not primitive!)

These first-generation standard CRC's (along with CRC-16) were
constructed by taking an irreducible times the poly 11.  The poly 11
effectively provides the equivalent of the parity used in the earlier
systems.  As I recall, the argument was that CRC would thus never be
worse than parity.

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary 1998-08-27: http://www.io.com/~ritter/GLOSSARY.HTM



Subject: Re: So THAT's How It Works !!!
Date: Mon, 12 Oct 1998 09:17:00 GMT
From: dik@cwi.nl (Dik T. Winter)
Message-ID: <F0pJ4D.A5J@cwi.nl>
References: <361e5a7b.8169985@news.io.com>
Newsgroups: comp.arch
Lines: 14

In article <361e5a7b.8169985@news.io.com> ritter@io.com (Terry Ritter) writes:
 > On Fri, 9 Oct 1998 12:30:10 GMT, in <F0K82A.2KM@cwi.nl>, in comp.arch
 > dik@cwi.nl (Dik T. Winter) wrote:
 > >is divisable by 5), but as a polynomial over GF(2) it is primitive, i.e.
 > >*not* divisible by another polynomial over GF(2).
 > 
 > Alas, I happen to know that 11 is a factor, and we don't have to argue
 > about it, we can show it:

Indeed yes, as it has an even number of non-zero coefficients it is
obviously divisible by X+1.  Argh.
-- 
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/

              Nyx is not responsible for the actions of its users.
              Our AUP / Free Speech Policy are at http://www.nyx.net/policies/
              Direct complaints to abuse@nyx.net

Subject: Re: So THAT's How It Works !!!
Date: Sun, 11 Oct 1998 15:43:46 GMT
From: colin@nyx10.nyx.net (Colin Plumb)
Message-ID: <908120626.503631@iris.nyx.net>
References: <eeeF0JHwF.D8o@netcom.com>
Newsgroups: comp.arch
Lines: 35

In article <eeeF0JHwF.D8o@netcom.com>, Mark Thorson <eee@netcom.com> wrote:
> For 20 years I've been hearing about this important
> error checking algorithm called CRC (cyclic redundancy
> checking), and I never really needed to know what it
> was until today.
>
> It turns out to be really simple.  You do an integer
> divide of the word you are sending, and tack on the
> remainder to the message.  The receiver does the same
> integer divide and if the remainders match, there's
> probably no error.
>
> You divide by a prime number slightly larger than the
> data.  E.g. if you're sending a 16-bit word, you use
> the following magic numbers:  11000000000000101 (CRC-16)
> or 10001000000100001 (CCITT v.24).  If the remainders
> match, that catches all single and double bit errors,
> plus lots besides.
>
> Whoa!  How simple something can be when you actually
> look and see how it works!

This is *basically* the idea, except for the fact that:

- These aren't integers.  Indeed, these aren't "numbers" as
  non-mathematicians understand the concept.  (Nonethless, the
  analogies are strong, as in both cases, you're working in a ring.)
- They aren't primes.  The polynomial equivalent is "irreducibility",
  meaning having no non-trivial divisors, aand both
  polynomials are reducible.
- Irreducibility isn't enough, you need primitivity as well.

But still, it *is* basically the same concept.
-- 
	-Colin


Subject: Re: So THAT's How It Works !!!
Date: Sun, 11 Oct 1998 18:30:13 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <3620f88e.5661791@news.io.com>
References: <908120626.503631@iris.nyx.net>
Newsgroups: comp.arch
Lines: 30


On Sun, 11 Oct 1998 15:43:46 GMT, in <908120626.503631@iris.nyx.net>,
in comp.arch colin@nyx10.nyx.net (Colin Plumb) wrote:

>[...]
>- Irreducibility isn't enough, you need primitivity as well.

No, we don't *need* primitivity (or even irreducibility), obviously,
since the early CRC polys were composite.  Nor, I think, does it even
help very much, in the normal low-error-rate situation.  

There have even been proposals for software validation by arbitrary
random poly, thus preventing virus writers from knowing the CRC they
would encounter.  Each user would just take a poly at random. 

I do suspect that primitivity could provide some additional CRC
"guarantees" in particular cases.  But in the long run there are
always vast numbers of error patterns that are undetectable with any
particular CRC poly, primitive or irreducible.  This is acceptable
because most error patterns *are* detectable.  And most of that
performance is delivered simply by having some irreducible factor of
reasonable size.

But, on the other hand, if we can get a primitive, why not?

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary 1998-08-27: http://www.io.com/~ritter/GLOSSARY.HTM



Subject: Guaranteed message authentication faster than MD5
Date: 6 Apr 1999 23:34:22 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr623.34.22.28665@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 18

The first release of my hash127 package is available from

   http://pobox.com/~djb/hash127.html

hash127 computes a 16-byte secret-key signature of an arbitrarily long
message. An attacker cannot guess a signature of a different message
with probability better than b/2^{131}, where b is the total number of
bits in the messages and signatures.

hash127 is the first ``universal hash function'' at this security level
to break the MD5 speed barrier. The first release of hash127 is faster
than the best asm versions of MD5 on a variety of architectures. hash127
can be used as a fast replacement for HMAC-MD5 or HMAC-SHA.

I've set up a mailing list for discussions related to hash127. To join,
send an empty message to hash127-subscribe@list.cr.yp.to.

---Dan


Subject: Re: Guaranteed message authentication faster than MD5
Date: Thu, 8 Apr 1999 06:24:14 GMT
From: phr@netcom.com (Paul Rubin)
Message-ID: <phrF9uxsE.KuA@netcom.com>
References: <1999Apr623.34.22.28665@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 8

In article <1999Apr623.34.22.28665@koobera.math.uic.edu>,
D. J. Bernstein <djb@koobera.math.uic.edu> wrote:
>hash127 computes a 16-byte secret-key signature of an arbitrarily long
>message. An attacker cannot guess a signature of a different message
>with probability better than b/2^{131}, where b is the total number of
>bits in the messages and signatures.

Care to explain anything about the algorithm?!


Subject: Re: Guaranteed message authentication faster than MD5
Date: 9 Apr 1999 04:17:13 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr904.17.13.12567@koobera.math.uic.edu>
References: <phrF9uxsE.KuA@netcom.com>
Newsgroups: sci.crypt
Lines: 29

Paul Rubin <phr@netcom.com> wrote:
> Care to explain anything about the algorithm?!

What the code does is compute a polynomial

   s = (r^{n+1} + m[0] r^n + m[1] r^{n-1} + ... + m[n-1] r + x) mod p

at very high speed, given 32-bit chunks m[0],m[1],...,m[n-1] and big
secret integers r,x. Here p is a public prime, namely 2^127-1. See
http://pobox.com/~djb/papers/hash127.dvi for some computational details.

If you have just one message m[0],m[1],...,m[n-1] to sign, you can use s
directly as a signature of the message. It's easy to prove that an
attacker, even with infinite computer power, has a negligible chance of
correctly guessing a signature of a different message.

If you have many messages to sign, you can either

   * use the same r for each message, always use x = 0, and feed s
     through a secret 128-bit function before sending it; or

   * use the same r for each message, and generate a new x for each
     message by feeding a nonce through a secret function.

Both methods are provably safe against an opponent who can't predict the
secret function. The first method doesn't need nonces; the second method
has lower latency in some applications.

---Dan


Subject: Re: Guaranteed message authentication faster than MD5
Date: Sat, 10 Apr 1999 14:25:30 GMT
From: Scott Fluhrer <sfluhrer@ix.netcom.com>
Message-ID: <7enmg9$44h@dfw-ixnews10.ix.netcom.com>
References: <1999Apr904.17.13.12567@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 29

In article <1999Apr904.17.13.12567@koobera.math.uic.edu>,
	djb@koobera.math.uic.edu (D. J. Bernstein) wrote:

>Paul Rubin <phr@netcom.com> wrote:
>> Care to explain anything about the algorithm?!
>
>What the code does is compute a polynomial
>
>   s = (r^{n+1} + m[0] r^n + m[1] r^{n-1} + ... + m[n-1] r + x) mod p
>
>at very high speed, given 32-bit chunks m[0],m[1],...,m[n-1] and big
>secret integers r,x. Here p is a public prime, namely 2^127-1. See
>http://pobox.com/~djb/papers/hash127.dvi for some computational details.
>
>If you have just one message m[0],m[1],...,m[n-1] to sign, you can use s
>directly as a signature of the message. It's easy to prove that an
>attacker, even with infinite computer power, has a negligible chance of
>correctly guessing a signature of a different message.

The most obvious weakness in this is that anyone who has access to r and x
can easily create collisions (that is, two different messages with the
same signature).  This reduces where this algorithm can be used to ones
where both the sender and the receiver are trusted.  For example, this
cannot be used as part of a signature scheme.


-- 
poncho
 


Subject: Re: Guaranteed message authentication faster than MD5
Date: 10 Apr 1999 22:36:11 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1022.36.11.22467@koobera.math.uic.edu>
References: <7enmg9$44h@dfw-ixnews10.ix.netcom.com>
Newsgroups: sci.crypt
Lines: 32

Secret-key signatures (also known as ``message authentication codes'')
are much faster than public-key signatures. That's why they're used in
ssh and many other systems. There's no need for the public to be able to
check the signatures on ssh packets.

Today most people use secret-key signatures based on MD5 or SHA, such as
HMAC-MD5:

   The signature of m under secret key k is MD5(k,MD5(k+blah,m)).

hash127 can be used for the same applications. hash127 is faster than
HMAC-MD5 for messages of any size, and the underlying function includes
a mathematical guarantee that it will never need to be replaced.

Scott Fluhrer  <sfluhrer@ix.netcom.com> wrote:
> This reduces where this algorithm can be used to ones
> where both the sender and the receiver are trusted.

Actually, secret-key signatures can be used for untrusted communication.
For example, one natural structure for signed person-to-person email is

   g^a, g^b, t, s, message

where g^a is the sender's public key, g^b is the receiver's public key,
t is a nonce chosen by the sender, and s is a signature of the message
under a shared secret key derived from g^a, g^b, and t.

The security properties here are just like face-to-face communication:
the receiver knows that the message is authentic, but he can't prove it
to someone else later.

---Dan


Subject: Re: Guaranteed message authentication faster than MD5
Date: Wed, 14 Apr 1999 18:16:20 GMT
From: pierrepuzer@hotmail.com
Message-ID: <7f2m1b$lm7$1@nnrp1.dejanews.com>
References: <1999Apr623.34.22.28665@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 12

D. J. Bernstein wrote:
> hash127 is the first ``universal hash function'' at this security level
> to break the MD5 speed barrier.

Is it possible to use hash127 or ideas from it to construct a provably
unpredictable random function?

How, exactly, do you prove the safeness of hash127?
I read the abstract but it has no proof.

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own    


Subject: Re: Guaranteed message authentication faster than MD5
Date: 14 Apr 1999 23:09:39 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1423.09.39.15909@koobera.math.uic.edu>
References: <7f2m1b$lm7$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 50

> How, exactly, do you prove the safeness of hash127?

Pick any two distinct messages, m and m', and (not necessarily distinct)
integers s and s' between 0 and p-1, where p = 2^127-1.

Say (r,x) is a uniform random 256-bit secret key. The chance of s being
a signature of m is at least 1/2^127. The chance of s being a signature
of m _and_ s' being a signature of m' is at most (3 max{n,n'} + 6)/2^255
where n and n' are the lengths of m and m' respectively.

Thus the conditional probability of s' being a signature of m', given
that s is a signature of m, is at most (3 max{n,n'} + 6)/2^128. It
doesn't matter how much time the attacker spends after he sees m and s;
any guess he comes up with for m' and s' will almost certainly be wrong.

Proof that s is a signature of m with probability at least 2^129/2^256:
There are 2^128 choices of r. For each choice of r, there are at least
two choices of x satisfying

   s = (r^(n+1) + r^n m[0] + r^(n-1) m[1] + ... + r m[n-1] + x) mod p.

So there are at least 2^129 choices of (r,x).

Proof that s is a signature of m and s' is a signature of m' with
probability at most (6 max{n,n'} + 12)/2^256: If

   s = (r^(n+1) + r^n m[0] + r^(n-1) m[1] + ... + r m[n-1] + x) mod p and
   s' = (r^(n'+1) + r^n' m'[0] + r^(n'-1) m'[1] + ... + r m'[n'-1] + x) mod p

then r is a root mod p of the polynomial

   t^(n+1) + ... + t m[n-1] - s - t^(n'+1) - ... - t m'[n'-1] - s'.

That polynomial is nonzero since m and m' are different, so it has at
most max{n,n'} + 1 roots mod p, and therefore at most 2 max{n,n'} + 2
roots between 0 and 2p-1, as well as possibly 2p and 2p+1. For each of
those 2 max{n,n'} + 4 choices of r there are at most 3 choices of x.

> Is it possible to use hash127 or ideas from it to construct a provably
> unpredictable random function?

Nobody knows how to prove that one can efficiently stretch (say) 256
uniform random bits into (say) a string of 512 unpredictable random
bits. This seems substantially more difficult than proving P<NP.

What's interesting about authentication is that you only need to
generate a few random bits for each message, no matter how long the
messages are.

---Dan


Subject: Re: Guaranteed message authentication faster than MD5
Date: Thu, 15 Apr 1999 10:01:52 GMT
From: puzer@my-dejanews.com
Message-ID: <7f4dec$4ut$1@nnrp1.dejanews.com>
References: <1999Apr1423.09.39.15909@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 20


> Nobody knows how to prove that one can efficiently stretch (say) 256
> uniform random bits into (say) a string of 512 unpredictable random
> bits.

However, is it possible to construct a random function from hash127 that
is unpredictable enough in practice although one cannot prove it?

SURF probably does the job but a single polynomial would look nicer than
a generalized Feistel sequence.

> Secret-key signatures are used in ssh and many other systems. There's no
> need for the public to be able to check the signatures on ssh packets.

Why not make an ssh replacement using hash127, Snuffle/SURF or SEOC,
ucspi-tcp, ptyget and some key-exhange software? Is key-exhange the
only missing piece?

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own    


Subject: Extreme lossy text compression
Date: Fri, 16 Apr 1999 00:20:22 GMT
From: gteabo@my-dejanews.com
Message-ID: <7f5vnu$hj1$1@nnrp1.dejanews.com>
References: <1999Apr1423.09.39.15909@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 28

I have an idea to reduce redundancy in a large data store.  When a plaintext
is submitted to storage, I want to be able to easily determine whether that
exact plaintext already exists in my storage.

Briefly, my idea is to process some plaintext (which might be from 1KB to 10KB
in length), and create, say, a 256 byte "key" that is derived from that
plaintext that will be unique with a high degree of certainty.

It would be kind of like an extreme "lossy" compression scheme.   We'd be
taking up to 10000 bytes of plaintext and creating a 256 byte "key" useful for
matching identical plain texts.

Then when a new plaintext is submitted, I can generate my 256 byte "key" and
quickly search my database to determine if that 256 byte "key" already exists.

Unlike most things discussed in sci.crypt:
1. I don't need to be able to recreate the plaintext from the key.
2. I don't care about security, so headers and things are perfectly fine.

Let's assume that I won't be able to compare the original plaintexts to verify
that they are actually identical.  I need a high degree of confidence based
upon the keys alone.

Has anyone ever heard or seen anything like this before?


-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own    


Subject: Re: Extreme lossy text compression
Date: Thu, 15 Apr 1999 19:11:05 -0600
From: John Myre <jmyre@sandia.gov>
Message-ID: <37168E29.3A73F371@sandia.gov>
References: <7f5vnu$hj1$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 64


(see original question below)

Any good hash (e.g. SHA-1) ought to work.  The output is even
shorter (20 bytes in the case of SHA-1) than requested, and so
more efficient.  The theory is that a good hash seems like a
"random function" of the input and so the likelyhood of a false
match (a "collision") is the same as two random values of the
hash length being the same.  For SHA-1 this means a 1 in 2^160
chance of error (*really* small).

Intuitively, I think of it this way: we say a hash function is
cryptographically secure when we don't think an adversary could
generate a false match on purpose even with extraordinary amounts
of compute power.  If that is so, then an accidental false match
must also be extremely unlikely, or else the adversary could
generate a false match merely by trying random inputs.

The way this is different from more ordinary check value functions
(CRC's and the like) is that we (essentially) remove the chance
that a few simple changes to a document would result in the
same check value.  An ordinary check value function has some
structure which an adversary could use to figure out how to
get a false match; given such a structure a "few" changes might
work, and a "few" changes can happen by accident.

If a 20 byte hash doesn't seem long enough, you could concatenate
two or more different hashes (e.g., SHA-1 plus MD5, or a hash of
the document plus a hash of the document modified in some simple
way, like adding a blank at the end).

(Of course, I am taking for granted that in fact you want even a
tiny (e.g., 1 byte) change to show up as a difference; that you
don't care if you store umpty-ump versions of the same document.)

gteabo@my-dejanews.com wrote:
> 
> I have an idea to reduce redundancy in a large data store.  When a plaintext
> is submitted to storage, I want to be able to easily determine whether that
> exact plaintext already exists in my storage.
> 
> Briefly, my idea is to process some plaintext (which might be from 1KB to 10KB
> in length), and create, say, a 256 byte "key" that is derived from that
> plaintext that will be unique with a high degree of certainty.
> 
> It would be kind of like an extreme "lossy" compression scheme.   We'd be
> taking up to 10000 bytes of plaintext and creating a 256 byte "key" useful for
> matching identical plain texts.
> 
> Then when a new plaintext is submitted, I can generate my 256 byte "key" and
> quickly search my database to determine if that 256 byte "key" already exists.
> 
> Unlike most things discussed in sci.crypt:
> 1. I don't need to be able to recreate the plaintext from the key.
> 2. I don't care about security, so headers and things are perfectly fine.
> 
> Let's assume that I won't be able to compare the original plaintexts to verify
> that they are actually identical.  I need a high degree of confidence based
> upon the keys alone.
> 
> Has anyone ever heard or seen anything like this before?
> 
> -----------== Posted via Deja News, The Discussion Network ==----------
> http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own


Subject: Re: Extreme lossy text compression
Date: Fri, 16 Apr 1999 18:12:45 GMT
From: gteabo@my-dejanews.com
Message-ID: <7f7uil$8e8$1@nnrp1.dejanews.com>
References: <37168E29.3A73F371@sandia.gov>
Newsgroups: sci.crypt
Lines: 68



John,

I am very impressed and grateful for your reply.

I think I understand what you mean by a "good hash" based upon your
description of its characteristics.  You mentioned two flavors:  SHA-1 and
MD5.  While I have seen C source code posted for SHA-1 by an individual, I
wondered whether there is any authoritative site where proven source can be
found?	Is there a web site that is considered the "Hash Authority".

What about licensing these hashes?  Who owns them?  Is it considered
freeware, shareware, or what?  Or is it more just a programming structure like
a "nested loop" which isn't really owned by anybody?

To answer your question, in actual fact, I DO want to count even a 1 byte
change as a change, so a good hash sounds BETTER than perfect for my needs.

I wonder whether sufficient randomization can be achieved in just 20 bytes
considering the non-random characteristic of my inputted plaintext. 
Generally the plaintext is going to be ordinary English language text (like
that found in Usenet or in ordinary Email), so there will be a high
concentration of the 26 letters A-Z upper and lower case, plus spaces and
some punctuation.  Can a "good hash" still output sufficiently random results
from that input?  Do hash developers run any statistical tests to back up
their theory?

Thanks,
Geoff

In article <37168E29.3A73F371@sandia.gov>,
  John Myre <jmyre@sandia.gov> wrote:
>
> Any good hash (e.g. SHA-1) ought to work.  The output is even
> shorter (20 bytes in the case of SHA-1) than requested, and so
> more efficient.  The theory is that a good hash seems like a
> "random function" of the input and so the likelyhood of a false
> match (a "collision") is the same as two random values of the
> hash length being the same.
>
> I am taking for granted that in fact you want even a
> tiny (e.g., 1 byte) change to show up as a difference; that you
> don't care if you store umpty-ump versions of the same document.)
>
> gteabo@my-dejanews.com wrote:
> >
> > I have an idea to reduce redundancy in a large data store.  When a plaintext
> > is submitted to storage, I want to be able to easily determine whether that
> > exact plaintext already exists in my storage.
> >
> > It would be kind of like an extreme "lossy" compression scheme.   We'd be
> > taking up to 10000 bytes of plaintext and creating a 256 byte "key" for
> > matching identical plaintexts.
> >
> > Then when a new plaintext is submitted, I can generate my 256 byte "key" and
> > quickly search my database to determine if that 256 byte "key" exists.
> >
> > Unlike most things discussed in sci.crypt:
> > 1. I don't need to be able to recreate the plaintext from the key.
> > 2. I don't care about security, so headers and things are perfectly fine.
> >
> > Assume I won't be able to compare the original plaintexts to verify
> > that they are actually identical.  I need a high degree of confidence based
> > upon the keys alone.

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own    


Subject: Re: Extreme lossy text compression
Date: Fri, 16 Apr 1999 22:43:57 -0400
From: Geoff Thorpe <geoff@raas.co.nz>
Message-ID: <3717F56D.2B1DA355@raas.co.nz>
References: <7f7uil$8e8$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 133

Hi there,

> I think I understand what you mean by a "good hash" based upon your
> description of its characteristics.  You mentioned two flavors:  SHA-1 and
> MD5.  While I have seen C source code posted for SHA-1 by an individual, I
> wondered whether there is any authoritative site where proven source can be
> found?  Is there a web site that is considered the "Hash Authority".

Loads of implementations are available in loads of places. I use
cryptlib (http://www.cs.auckland.ac.nz/~pgut001/cryptlib/) and mostly
through my Java wrapper for it (http://www.hoopal.com/geoff/jniCryptlib)
... if ALL you need is a hash function this would be somewhat like
swatting flies with a large bus. I've just checked and cryptlib is using
Eric Young's code (from SSLeay) for the hashing - that's under a GNU
license I think ... anyway, you can probably just suck that code out
(for SHA1) and use it, bearing in mind you can't sell or redistribute
the result without taking a close look at the license. (BTW: For "proven
source" - The various FIPS standards demostrate test vectors to check
your implementation against, Peter has included these in cryptlib for
start-up self-testing and you could probably consult his code to see the
test vectors).

> What about licensing these hashes?  Who owns them?  Is it considered
> freeware, shareware, or what?  Or is it more just a programming structure like
> a "nested loop" which isn't really owned by anybody?

Not sure about the licensing - but for sure it's not just a "nested
loop" because this is more than a hash-function we're talking, it's a
SECURE hash function which may be more than you actually need.

> To answer your question, in actual fact, I DO want to count even a 1 byte
> change as a change, so a good hash sounds BETTER than perfect for my needs.

Yeah - but do you need a secure hash? The strength(s) of SHA1 (and MD5
and the other crypto-hash functions) is not just that they provide a
good "ID" of a given piece of data (with your required property of any
change in the input generating, with all probability, a corresponding
change in output), but also that they are collision-resistant (as John
said) ... that is, a hacker has a very tough job to find a piece of
input data that would generate a chosen hash-value ... if you're just
looking for a good index into your data-store, this is not an issue ...
because you're not going to be deliberately trying to create texts that
ARE different to anything there but have a matching hash-value just to
force your application run an extra lookup to check.

Maybe all you need is to use a standard hash function (or even cut your
own if you feel daring) - lots of XORs and you're away [;-)

Basically a standard hash (should) give you;
--------------------------------------------
- If A and B are the same data, their hashes match (ie it's totally
deterministic - the output is a function of the data only, no external
or random information is used in the function).
- If A and B are different (and independant/unrelated), then the
probability their hashes match is pretty close to one in "m" (where "m"
is the number of possible hash values, hopefully this would be equal to
2^n where n is the number of bits in the hash-value). This assumes
however that A and B are independant - in most cases this probability
REALLY isn't that microscopic if somebody is trying to intentionally
construct a B for the purpose of matching A's output.

If your texts are likely to be similar quite often - you might want to
deterministically "munge" your input before applying a "standard" hash
... in other words choose a method for mashing up the contents to spread
any differences in the text around ... then compression is always handy
for hiding redundancy.

A "secure hash" gives you;
--------------------------
- the same as a standard hash function, EXCEPT;
- even if somebody tries to construct a B for the purpose of matching
A's output, they will (hopefully) fail to improve the probability much
beyond 1 in "m" - this is why secure hashes are used in cryptographic
"signature" techniques. The collision probabilty stays astronomically
small even when you're TRYING to get them to collide.

So a secure hash will solve your problems and by design, shouldn't
require you to take the additional steps I suggested (if you had to then
by definition, it's vulnerable to the problems those steps try to
avoid). You just have to decide on convenience and/or performance. In C,
on most platforms (OS and hardware), it's possible to get some pretty
quick hashes going and from my experience; in network systems,
particularly involving large databases, the time taken in a secure hash
will be the last of your worries [;-)

> I wonder whether sufficient randomization can be achieved in just 20 bytes
> considering the non-random characteristic of my inputted plaintext.
> Generally the plaintext is going to be ordinary English language text (like
> that found in Usenet or in ordinary Email), so there will be a high
> concentration of the 26 letters A-Z upper and lower case, plus spaces and
> some punctuation.  Can a "good hash" still output sufficiently random results
> from that input?  Do hash developers run any statistical tests to back up
> their theory?

See my above comment - making any change in the output data (be it
appending something to it, transforming it all in some rational way, or
just changing one byte a little bit) should make radical changes in the
output of a secure hash function (and probably most good regular hash
functions). Compressing data is a particularly good idea on language
text (and should be considered when storing masses of it - especially if
it's for archiving and not really high-load processing), but is not
necessary at all in a secure hash. However, it may be handy for
performance properties. English language text usually compresses a lot
with typical algorithms (ZIP, etc) - you might want to try benchmarking
the speed trade-offs on using or not using compression first (the
compression takes a little time, but reduces the time required to hash
because you're hashing less data).

So a secure hash (unless it's been broken) is as good as the size of the
hash-values it maps data to. And 20 bytes can hold a lot of different
hash values. If the number of records in your database tables got close
to even a 5-byte number I would be staggered. Some quick (and probably
way-off) probability calculations; If you had 256^5 (maximum 5-byte
number - just over a thousand billion) records in your database, which
is a little optimistic - the probability that ANY two records had the
same SHA1 hash would be less than 1 in 256^10 (that last huge number
squared). Some realists would say you don't even have to run a check if
two texts have matching hash-values - the chance that they do but aren't
the same text is ... well ... unlikely (to grossly understate things).

Another comment (although you probably don't want any more); if each
record is storing a 20-byte hash in a seperate field (as a fixed-length
ASCII field - 27 chars using BASE64, or 40 chars using
hex-representation), you can optimize your database querying
dramatically either by using it as your primary key (if you're prepared
to take that astronomically small chance of a collision), or by putting
an index on it (again, if you're prepared to take the non-risk then you
could put a UNIQUE constraint index on it). The hash will provide a good
serial number for each record, and with the index it would speed up your
checks for duplicates.

Cheers,
Geoff


Subject: Re: Extreme lossy text compression
Date: Sat, 17 Apr 1999 19:51:31 GMT
From: Geoffrey Teabo <gteabo@my-dejanews.com>
Message-ID: <7faoo0$iiv$1@nnrp1.dejanews.com>
References: <3717F56D.2B1DA355@raas.co.nz>
Newsgroups: sci.crypt
Lines: 27



Geoff,

Thank you for your TIME to write such a lengthy and VERY CLEAR post.

> Maybe all you need is to use a standard hash function <

If SHA1 and MD5 are well known secure hashes, what's a well known standard
hash?

> So a secure hash will solve your problems and by design, shouldn't
> require you to take the additional steps I suggested

By "additional steps" you mean the pre-hash "munge" you suggested for a
standard hash, right?   Maybe a secure hash is better for me just for its
"munge"-free quality!

FINALLY: I just want to re-ask one other question, which no one responded
to... Do hash developers run any statistical TESTS to back up their theory of
random output, particularly for secure hashes?

Thanks, again,
Geoff Teabo

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own    


Subject: Re: Extreme lossy text compression
Date: Sun, 18 Apr 1999 14:26:23 -0400
From: Geoff Thorpe <geoff@raas.co.nz>
Message-ID: <371A23CF.650ACC8B@raas.co.nz>
References: <7faoo0$iiv$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 79

Hi there,

> Thank you for your TIME to write such a lengthy and VERY CLEAR post.

My pleasure - I've had to play with essentially the same issues you have
a few times so if I can save you a little time as a result, great.
  
> If SHA1 and MD5 are well known secure hashes, what's a well known standard
> hash?

Well here you've got me a little - I'm a mathematician interested in
crypto who happens to make his $$ undercover as a programmer, not a
computer scientist well versed in all Knuth's volumes of essential
reading [;-) However, some others have mentioned a couple in these
threads - the Universal Hash Function, and somebody also mentioned a
simple (& fast) cute example involving XOR-ing in the blocks of the
message one at a time (after seeding it with some constant key-value and
padding incomplete blocks with zeroes I think?). This could suffice,
although if you expect a LOT of your texts to be very similar but not
quite the same this method would probably have a statistically better
chance of collisions - just a hunch though, I'm not going to even try to
justify that feeling though.

> By "additional steps" you mean the pre-hash "munge" you suggested for a
> standard hash, right?   Maybe a secure hash is better for me just for its
> "munge"-free quality!

The munge would just be an idea to avoid what I was mentioning just
before about my "suspected" weakness of the XOR-based hasher ... if the
differences in many texts are isloted, possibly positioned on
unfortunate block boundaries or some such thing, then maybe some of the
hash values will come out similar - and worse, the same. A munge step
would just be an attempt to make sure that any differences are given
their best chance of influencing the hash value (or more precisely,
influencing all of the hash value). A paranoid consideration, that's
all.

> FINALLY: I just want to re-ask one other question, which no one responded
> to... Do hash developers run any statistical TESTS to back up their theory of
> random output, particularly for secure hashes?

Absolutely ... and even if they don't, there's a veritable army of
cryptanalysts out there just waiting to puncture even the smallest hole
in a "secure" hash algorithm (or any other class of crypto algorithm for
that matter). If someone even found a highly contrived way to choose
desirable texts that would cause the output hash values to exhibit ANY
kind of pattern, other than random noise, then that hash algorithm would
be discarded as a "cryptographic hash algorithm" and would turn
overnight into a "standard hash" (and a slow one moreover).

Many PRNG (psuedo-random number generators) take "random-info" from
hardware, eg. hard-drive timings, kernel stats, mouse activity, blah
blah - but although that info may be random, it may also be predictable
structured meaning that if you just stick the numbers end on end, and
XOR-hash the result you would have a VERY weak PRNG by cryptographic
standards. What the PRNG's usually (or should) do is just keep throwing
such "random-info" into a secure hash function - so as you can see, by
design these hashes need to have the property that subtle changes in
data, or data that is structurally similar every time, needs to still
generate seemingly unpredictable output unless the data matches exactly.

SHA1 and MD5 are dedicated secure hash algorithms, but there are others
that are derived from symmetric ciphers - that gives you an indication
of how seriously these algorithms are taken. They not only want a good
statistical "spread" for input values - they want it to stay
statistically uniform in the face of a determined attack to bias the
outputs. In short, the answer to your question (for secure hashes) is
yes.

For standard hashes, I believe there is some basic theoretical results
to the effect that given random (unknown) data A, the output value has
equal probability of landing on all possible hash values etc, and
similar probabilistic results about changes in input causing (in all
probability) changes in output. I find that it's all very easy if you're
prepared to accept hand-waving arguments without any mathematical
details [;-)

Well, best of luck,
Geoff


Subject: Re: Extreme lossy text compression
Date: 16 Apr 1999 16:01:06 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1616.01.06.26203@koobera.math.uic.edu>
References: <7f5vnu$hj1$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 17

> Briefly, my idea is to process some plaintext (which might be from 1KB to 10KB
> in length), and create, say, a 256 byte "key" that is derived from that
> plaintext that will be unique with a high degree of certainty.

Will the results be kept secret? If so, you can use a ``universal hash
function.'' Universal hash functions are fast and guaranteed secure. See
http://pobox.com/~djb/hash127.html for sample code.

Even if the results are public, can you keep one short secret? If so,
you can feed a universal hash function through a function such as
Rijndael. This is still fast, and it's secure if Rijndael is.

If nothing is secret, you'll have to resort to a ``collision-resistant
hash function'' such as SHA-1. This is not as fast as a universal hash
function.

---Dan


Subject: Re: Extreme lossy text compression
Date: Fri, 16 Apr 1999 18:58:31 GMT
From: gteabo@my-dejanews.com
Message-ID: <7f818h$b5n$1@nnrp1.dejanews.com>
References: <1999Apr1616.01.06.26203@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 66

Dan,

Thank you for your reply.

I'm not clear why secrecy is an issue.  Maybe you're using the term "secret"
in a technical sense that I don't understand.

My exact application involves archiving.  Imagine a program that would monitor
news articles coming over the wire from AP or Reuters.   I want to create a
hash so that I can quickly search a database to determine whether that
particular IDENTICAL newsarticle was ever received previously.

So the whole system is secret, in a sense.  The hash results will be created
and stored on a backed closed system without any user interface.  Keeping "one
short secret" isn't a problem.

Collision avoidance is my TOP priority.  Security isn't.  After all, the
writers at AP aren't exactly composing their news stories to try to hack my
system!  They're just reporting the news.

The number of articles that might be held in this system could be as many as
100,000 per year at first, growing to 200,000 per year within 5 years.	Even
if we purge every 20 years, that pace of growth will create a 1,000,000
article database within the likely life of this system.

I need to know that with 1,000,000 different plaintexts in storage, that the
next new unique article that comes down the pike has the near impossibility of
creating hash identical to a NON-IDENTICAL article that was stored previously.

The probability of two plaintext articles creating the same hash is what I
need to minimize.

It would also be nice if the hash were fast, of course.

Is there some website or resource that you regard as the "Hash Authority"
where I could learn about the different things you mention in your Email.  I
need to learn a little about all this, and I don't want to BUG you or the
newsgroup for every last detail.

Thanks for your ideas, and the link to your sample code.  I'll check it out!

Geoff


In article <1999Apr1616.01.06.26203@koobera.math.uic.edu>,
djb@koobera.math.uic.edu (D. J. Bernstein) wrote:
>
> Will the results be kept secret? If so, you can use a ``universal hash
> function.'' Universal hash functions are fast and guaranteed secure. See
> http://pobox.com/~djb/hash127.html for sample code.
>
> Even if the results are public, can you keep one short secret? If so,
> you can feed a universal hash function through a function such as
> Rijndael. This is still fast, and it's secure if Rijndael is.
>
> If nothing is secret, you'll have to resort to a ``collision-resistant
> hash function'' such as SHA-1. This is not as fast as a universal hash
> function.
>
Geoffrey Teabo wrote....
> > Briefly, my idea is to process plaintext (from 1KB to 10KB
> > in length), and create, say, a 256 byte "key" that is derived from that
> > plaintext that will be unique with a high degree of certainty.

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own    


Subject: Re: Extreme lossy text compression
Date: Fri, 16 Apr 1999 22:25:00 +0200
From: Terje Mathisen <Terje.Mathisen@hda.hydro.com>
Message-ID: <37179C9C.569E@hda.hydro.com>
References: <7f818h$b5n$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 90

gteabo@my-dejanews.com wrote:
> 
> Dan,
> 
> Thank you for your reply.
> 
> I'm not clear why secrecy is an issue.  Maybe you're using the term "secret"
> in a technical sense that I don't understand.
> 
> My exact application involves archiving.  Imagine a program that would monitor
> news articles coming over the wire from AP or Reuters.   I want to create a
> hash so that I can quickly search a database to determine whether that
> particular IDENTICAL newsarticle was ever received previously.
> 
> So the whole system is secret, in a sense.  The hash results will be created
> and stored on a backed closed system without any user interface.  Keeping "one
> short secret" isn't a problem.
> 
> Collision avoidance is my TOP priority.  Security isn't.  After all, the
> writers at AP aren't exactly composing their news stories to try to hack my
> system!  They're just reporting the news.
> 
> The number of articles that might be held in this system could be as many as
> 100,000 per year at first, growing to 200,000 per year within 5 years.  Even
> if we purge every 20 years, that pace of growth will create a 1,000,000
> article database within the likely life of this system.
> 
> I need to know that with 1,000,000 different plaintexts in storage, that the
> next new unique article that comes down the pike has the near impossibility of
> creating hash identical to a NON-IDENTICAL article that was stored previously.
> 
> The probability of two plaintext articles creating the same hash is what I
> need to minimize.
> 
> It would also be nice if the hash were fast, of course.

OK, in this case you have a few options:

1) (The simplest:) Calculate the total number of articles your system
needs to handle over it's lifetime, and you can then determine the
minimum number of bits needed in a hash to keep the chance of any
collisions below an arbitrary limit.

BTW, I believe this is the 'Birthday problem', i.e. in a group of 24
people, there is a more than 50% chance of at least two people having
the same birthday.

I don't remember the approximate formula for this when working with big
numbers, but I'm willing to bet that with just 1E6 articles, a 128-bit
hash would be more than enough.

In fact, it was initially suggested that network MAC addresses (48 bits)
should be determined by random, since this would make the likelyhood of
any collisions low enough to be disregarded. (This suggestion was
vetoed, with the result that manufacturing snafu's have produced many
cards with unintentionally duplicated MACs.)

With a 64-bit hash, you have approximately 1.6E19 different hash values,
so the likelyhood of a collision (assuming uniform random distribution
of hash values) should be 

 1 - ((1.6E19-1) / 1.6E19 * (1.6E19-2) / 1.6E19 * ... *
(1.6E19-999999)/1.6E19)

As a first (worst-case) approximation, I'll take the smallest value of
those ratios

 (1.6E19-1E7) / 1.6E19 = 1 - 6.22E14

  exp(ln((1.6E19-1E7)/1.6E19)*1E7) = 0.9999937494639061

So, the final value would be

  1 - 0.9999937494639061 = 6.25E-6

which means that since the true value will be quite a bit smaller, your
real odds against ever seeing a collision should be small enough to be
disregarded.

Back to your original question, which hash algorithm to select? Any of
the well-known message digest functions would work, or you could use DES
or some other encryption algorithm, xor'ing 64-bit result blocks
together.

Terje

-- 
- <Terje.Mathisen@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"


Subject: Re: Extreme lossy text compression
Date: 17 Apr 1999 06:32:54 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1706.32.54.29838@koobera.math.uic.edu>
References: <7f818h$b5n$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 30

> The number of articles that might be held in this system could be as many as
> 100,000 per year at first, growing to 200,000 per year within 5 years.

Let's say you receive one billion different articles, each up to a
megabyte long. You then choose a 38-digit number r and feed each
article, along with your number r, to hash127.

There's at least a 99.9999999999995% chance that the results will all be
different: for any possible set of articles, at most 0.0000000000005% of
the r's will produce a collision.

> It would also be nice if the hash were fast, of course.

hash127 on my old Pentium-133 runs at about 238 million bits per second
from L2 cache; e.g., it hashes 40000 bytes in about 1.4 milliseconds.

> I'm not clear why secrecy is an issue.

If you reveal the hash for more than one article then a criminal can
create new articles with any desired hash. Even if you don't think this
is a problem now, there's no reason for you to allow it. You can, for
example, feed the hash127 result through Rijndael_k with a secret key k.
This is very fast---the hash127 result is only 16 bytes long.

> Is there some website or resource that you regard as the "Hash Authority"
> where I could learn about the different things you mention in your Email.

No. The literature on universal hash functions is a mess.

---Dan


Subject: Re: Extreme lossy text compression
Date: Sat, 17 Apr 1999 19:36:34 GMT
From: Geoffrey Teabo <gteabo@my-dejanews.com>
Message-ID: <7fans2$hot$1@nnrp1.dejanews.com>
References: <1999Apr1706.32.54.29838@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 46



Dan,

Criminals are just soooo NOT an issue for this application.

Suppose I'm monitoring stories coming in off a newswire service (like
Reuters), I'm just trying to make sure that I don't capture the identical
news story TWICE in my database.

A so-called attacker, or criminal, in this case would be trying to compose a
news article to intentionally collide with an old news article, and supposedly
to what purpose?  The only effect would be that my system would IGNORE the new
phony story anyway!!!!!  How ironic!

So clearly my ONLY concern is that a new VALID and REAL news story would
appear with the same hash result as the hash result of ANOTHER DIFFERENT and
OLDER news story.  That would be really BAD because I'd be ignoring a VALID
story.

In light of my specific issues, another post to this thread suggested that a
128 bit CRC would be okay, and that a hash is more trouble than it's worth.

What do you think?

I have a hunch that a CRC isn't as good because it's not as random, and I
might have to worry that similar plaintexts like "BBBBBBBB" and "BBBBBBDA"
might have the same CRC.

Geoff

===
Dan wrote:
>
> If you reveal the hash for more than one article then a criminal can
> create new articles with any desired hash. Even if you don't think this
> is a problem now, there's no reason for you to allow it. You can, for
> example, feed the hash127 result through Rijndael_k with a secret key k.
> This is very fast---the hash127 result is only 16 bytes long.
>
===
Geoff wrote:
> > I'm not clear why secrecy is an issue.

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own    


Subject: Re: Extreme lossy text compression
Date: 17 Apr 1999 15:20:49 -0600
From: vjs@calcite.rhyolite.com (Vernon Schryver)
Message-ID: <7fatvh$1k9$1@calcite.rhyolite.com>
References: <7fans2$hot$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 73

In article <7fans2$hot$1@nnrp1.dejanews.com>,
Geoffrey Teabo  <gteabo@my-dejanews.com> wrote:

> ...
>Suppose I'm monitoring stories coming in off a newswire service (like
>Reuters), I'm just trying to make sure that I don't capture the identical
>news story TWICE in my database.
>
>A so-called attacker, or criminal, in this case would be trying to compose a
>news article to intentionally collide with an old news article, and supposedly
>to what purpose?  The only effect would be that my system would IGNORE the new
>phony story anyway!!!!!  How ironic!
>
>So clearly my ONLY concern is that a new VALID and REAL news story would
>appear with the same hash result as the hash result of ANOTHER DIFFERENT and
>OLDER news story.  That would be really BAD because I'd be ignoring a VALID
>story.

In that case, why not do the standard, ancient, obvious thing?  Pick a
fast, reasonable hash function, and keep an online database of all messages
keyed by their hash values.  Then as each message arrives, compute its
hash value, and look up the hash value in your database.  If there is a
record with that hash value, then compare the new message with the record.
If appropriate, let the comparison be "fuzzy' (and if so, remember to make
the hash function use the fuzzy value of the message).  If the new message
matches the existing record, you have a duplicate, so do whatever is
appropriate.  Use whatever database is reasonable give the number of
messages you need to deal with.  Given modern disks, you can easily keep
everything that a newswire carries on line for long enough.

That worked great for a spam filter on a large corporate firewall.  The
system received about 200,000 email messages/day, of which between 10,000
and 14,000 were copies of 50 to 200 different streams of spam, some with
the usual spammer customizing such as "Dear sucker" or random junk
prepended or appended.  Every email message passing through the system
was hashed and checked against an automatically maintained "database" of
previously seen spam.  The hash function I chose was a simple, 32-bit
byte-sum of the alphabetic characters in the message, excluding whitespace,
numbers, punctuation, etc.  In practice, the hash function had a collision
rate of much better than 10e-6 messages.  The "database" was a single UNIX
field directory, with the "key" consisting of the hash value used as a
file name containing the message.  Hash collisions were handled by
appending a "-%d" string to the name.


>In light of my specific issues, another post to this thread suggested that a
>128 bit CRC would be okay, and that a hash is more trouble than it's worth.
>
>What do you think?

I think it's sad that people are unwilling to think about hashes.  There
is as much snake oil for hash functions as for encryption.  An early
response in this thread was typical of the common, silly notion that
'secure' hash functions are somehow less subject to hash collisions than
hash functions.  That secure hash functions are intended to be hard to
invert does not let them magically map domains of 2^10000 or more messages
1-to-1 onto ranges of 2^128 hash values.

>I have a hunch that a CRC isn't as good because it's not as random, and I
>might have to worry that similar plaintexts like "BBBBBBBB" and "BBBBBBDA"
>might have the same CRC.

On the contrary, CRC-32 and other error detecting and correcting hash
functions are expressly designed to detect such small changes.  There are
good reasons why your disk drive as well as your network links do not use
SHA or MD5 to detect bit rot.  If you care more about detecting small
changes than making it hard for bad guys to compute naughty messages that
have the same hash value as a good message, then your only rational choice
is one of the many error detecting hash functions.
-- 


Vernon Schryver    vjs@rhyolite.com


Subject: Re: Extreme lossy text compression
Date: 17 Apr 1999 23:26:16 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1723.26.16.3111@koobera.math.uic.edu>
References: <7fans2$hot$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 24

Geoffrey Teabo  <gteabo@my-dejanews.com> wrote:
> I have a hunch that a CRC isn't as good because it's not as random,

Actually, a 128-bit CRC _with a random polynomial_ guarantees an
extremely low collision probability for any fixed set of inputs.

(Just like hash127, except that hash127 is faster and provides a
slightly better bound on the collision probability.)

> A so-called attacker, or criminal, in this case would be trying to compose a
> news article to intentionally collide with an old news article, and supposedly
> to what purpose?

What if the criminal spots a legitimate article before you do, and
quickly feeds you a fake article with the same hash, preventing the
legitimate article from entering your database?

``We're in sci.crypt. We're supposed to be paranoid.''

An attacker can easily do this if he knows your CRC polynomial; and he
can easily figure out the polynomial given a few CRC results. That's why
secrecy is an issue.

---Dan


Subject: Re: Extreme lossy text compression
Date: 17 Apr 1999 19:24:38 -0700
From: daw@blowfish.isaac.cs.berkeley.edu (David Wagner)
Message-ID: <7fbfp6$hi9$1@blowfish.isaac.cs.berkeley.edu>
References: <1999Apr1723.26.16.3111@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 7

In article <1999Apr1723.26.16.3111@koobera.math.uic.edu>,
D. J. Bernstein <djb@koobera.math.uic.edu> wrote:
> Actually, a 128-bit CRC _with a random polynomial_ guarantees an
> extremely low collision probability for any fixed set of inputs.

I'm confused.  Isn't it true that CRC(x) = CRC(0||x), for
any polynomial?  ("||" represents concatenation of bits.)


Subject: Re: Extreme lossy text compression
Date: Sun, 18 Apr 1999 20:45:27 +0200
From: Terje Mathisen <Terje.Mathisen@hda.hydro.com>
Message-ID: <371A2847.FAE@hda.hydro.com>
References: <7fbfp6$hi9$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 21

David Wagner wrote:
> 
> In article <1999Apr1723.26.16.3111@koobera.math.uic.edu>,
> D. J. Bernstein <djb@koobera.math.uic.edu> wrote:
> > Actually, a 128-bit CRC _with a random polynomial_ guarantees an
> > extremely low collision probability for any fixed set of inputs.
> 
> I'm confused.  Isn't it true that CRC(x) = CRC(0||x), for
> any polynomial?  ("||" represents concatenation of bits.)

Only if your CRC function is initialized with zero!

That's why you often see CRC functions that start with (unsigned) (-1)
instead: This way all leading zero bits are significant.

Terje

-- 
- <Terje.Mathisen@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"


Subject: Re: Extreme lossy text compression
Date: 19 Apr 1999 04:45:47 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1904.45.47.8690@koobera.math.uic.edu>
References: <7fbfp6$hi9$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 35

David Wagner <daw@blowfish.isaac.cs.berkeley.edu> wrote:
> I'm confused.  Isn't it true that CRC(x) = CRC(0||x), for
> any polynomial?

Not if you initialize the CRC correctly. Given input bits m[1], m[2],
..., m[n-128], compute the polynomial

   x^n + m[1] x^(n-1) + m[2] x^(n-2) + ... + m[n-128] x^128

and divide by a secret irreducible degree-128 polynomial p. Then add 128
secret bits to the remainder.

The result is is what people call ``almost strongly universal,'' i.e.,
almost uniformly distributed on any pair of distinct inputs. It's a
secure single-message authenticator.

Common modifications:

   * Use a public value for the 128 bits added at the end: typically 0
     or all-1s. The result is ``almost xor universal,'' i.e., has an
     almost uniform output difference for any two distinct inputs.

   * Omit the factor x^128 from the polynomial. The result is ``almost
     universal,'' i.e., almost never collides on any two distinct
     inputs. This is good enough for many applications.

   * Omit the leading x^n term from the polynomial. This only works for
     fixed-length inputs. (It's a common mistake in the literature.)

   * Use a public value for p. This doesn't pose any risk if the input
     is chosen independently of p.

All of these variants, and many more, are called CRCs in practice.

---Dan


Subject: Re: Extreme lossy text compression
Date: Sat, 17 Apr 1999 12:33:02 GMT
From: Scott Fluhrer <sfluhrer@ix.netcom.com>
Message-ID: <7f9uhk$91k@sjx-ixn9.ix.netcom.com>
References: <7f5vnu$hj1$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 40

In article <7f5vnu$hj1$1@nnrp1.dejanews.com>,
	gteabo@my-dejanews.com wrote:

>I have an idea to reduce redundancy in a large data store.  When a plaintext
>is submitted to storage, I want to be able to easily determine whether that
>exact plaintext already exists in my storage.
>
>Briefly, my idea is to process some plaintext (which might be from 1KB to 10KB
>in length), and create, say, a 256 byte "key" that is derived from that
>plaintext that will be unique with a high degree of certainty.
>
>It would be kind of like an extreme "lossy" compression scheme.   We'd be
>taking up to 10000 bytes of plaintext and creating a 256 byte "key" useful for
>matching identical plain texts.
>
>Then when a new plaintext is submitted, I can generate my 256 byte "key" and
>quickly search my database to determine if that 256 byte "key" already exists.
>
>Unlike most things discussed in sci.crypt:
>1. I don't need to be able to recreate the plaintext from the key.
>2. I don't care about security, so headers and things are perfectly fine.
>
>Let's assume that I won't be able to compare the original plaintexts to verify
>that they are actually identical.  I need a high degree of confidence based
>upon the keys alone.
>
>Has anyone ever heard or seen anything like this before?

Several people have already given you responses that work quite nicely if you
have to worry about attackers trying to generate collisions.  If you really
don't care about that, then you might as well use a 128 bit CRC.  This can be
computed much faster than any known hash function, and provably will change on
small differences, and on random changes, it's maximally good (any 128 bit
result is equally likely, which is more than we know about most hash functions).
Its only drawback is that it's pretty trivial for an attacker to generate
collisions, but you said you didn't have to worry about that.

-- 
poncho
 


Subject: Re: Extreme lossy text compression
Date: Sat, 17 Apr 1999 19:22:24 GMT
From: Geoffrey Teabo <gteabo@my-dejanews.com>
Message-ID: <7fan19$h59$1@nnrp1.dejanews.com>
References: <7f9uhk$91k@sjx-ixn9.ix.netcom.com>
Newsgroups: sci.crypt
Lines: 36



Poncho...

That's right.  I really don't care about "attackers" at all.  I'm monitoring
an independent system that doesn't even know it's being monitored.

Imagine if my program were collecting news articles over the wire services.
I just want to be able to use a hash (or CRC) of each incoming plaintext to
search a historical "list" of past hashes (or CRCs) to determine whether that
particular IDENTICAL newsarticle was ever received previously.

As I pointed out in another post, news writers won't compose their news
stories to try to hack my system!  They're just reporting the news.  They
don't even know I have a system!!!

Can you point me toward source code for 128 bit CRC ???

Thanks a huge bunch.  I don't want overkill.


Scott Fluhrer <sfluhrer@ix.netcom.com> wrote:
>
> Several people have already given you responses that work quite nicely if you
> have to worry about attackers trying to generate collisions.  If you really
> don't care about that, then you might as well use a 128 bit CRC.  This can be
> computed much faster than any known hash function, and provably will change on
> small differences, and on random changes, it's maximally good (any 128 bit
> result is equally likely, which is more than we know about most hash
functions).
> Its only drawback is that it's pretty trivial for an attacker to generate
> collisions, but you said you didn't have to worry about that.
>

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own    


Subject: Re: Extreme lossy text compression
Date: Sat, 17 Apr 1999 22:26:52 -0400
From: Boris Kazak <bkazak@worldnet.att.net>
Message-ID: <371942EC.19CB@worldnet.att.net>
References: <7fan19$h59$1@nnrp1.dejanews.com>
Newsgroups: sci.crypt
Lines: 450

Geoffrey Teabo wrote:
 
> That's right.  I really don't care about "attackers" at all.  I'm monitoring
> an independent system that doesn't even know it's being monitored.
> 
> Imagine if my program were collecting news articles over the wire services.
> I just want to be able to use a hash (or CRC) of each incoming plaintext to
> search a historical "list" of past hashes (or CRCs) to determine whether that
> particular IDENTICAL newsarticle was ever received previously.
> 
> As I pointed out in another post, news writers won't compose their news
> stories to try to hack my system!  They're just reporting the news.  They
> don't even know I have a system!!!
> 
> Can you point me toward source code for 128 bit CRC ???
> 
---------------------------
This should be adequate for your purposes, you can use it 
or throw it away, as you wish...
     I am not sure that you need the technical information,
but included it for completeness. For your application 
I would reduce the ROUNDS constant in the header to 2,
just to gain speed.
     BTW, your greatest concern will be in making the 
probability of "birthday paradox" as little as possible.
For rough estimate, the probability of 2 different messages 
hashing to the same value rises to 0.5 when the number 
of messages rises to 2^(n/2), where "n" is the number 
of bits in the hash. Thus, with a 128-bit hash you must
accumulate about 2^64 messages before you will start 
really worrying about hash collisions. With 10 million
messages a year it will take about 16 trillion years...
and you can always doublecheck the messages, should 
such an unlikely event happen once in a decade.
     The statistical tests indicate a uniform distribution 
of hex digits in the final hash with no noticeable biases
(6 runs by 1 million hashes each).

     This proposed function is based on multiplication 
(mod 2^32+1) with two "twists". First, however, some 
background information.


*************************
     The program takes the filename from the simple dialog 
and writes the hash into a new file with the name of the 
original file and extension "h32".

     Original message is divided into segments of the size
twice that of the final hash, e.g. if the final hash will 
be 160 bit, the segments will be 320 bit each. The size of 
the hash and segments is #defined by the HASHBLOCKSIZE 
constant and can be altered according to need.

    For each block, there are 3 rounds (for those 
paranoid among us, there is a #define, where one can 
change the ROUNDS constant to whatever desired).
*************************

     The function uses 256 different modular multipliers 
derived from one initial number "SEED". This number is 
0x6A09E667, which happens to be the first 8 hex digits of 
SQRT(2), less initial 1. The use of this number should 
dispel any suspicions about a "trap door" built into SEED.
The multipliers themselves are simply the powers of SEED, 
so that Mult[k] = SEED^(k+1), this way SEED^0 = 1 is 
avoided, we don't want any trivial multipliers. Also, this 
number produces a 33,502,080 long cycle of its powers 
until it comes back to 1, so there are no duplicate 
multipliers in the array.

     The multiplier array can be precomputed in advance or 
can be filled at the program start. The latter case saves 
some code (no need for a big data table), but implies a 
time penalty of 255 modular multiplications.

     Now about the "twists". The first twist boils down to 
the plaintext-dependent choice of the multiplier. If a 
certain 4-byte number is going to be hashed, the 
corresponding multiplier is chosen from the table with the 
index I = XOR(all 4 bytes of the number). 

     Second twist regards the progression along the block.
After the multiplication, when the bytes of the product 
are returned to their places and it is time to proceed 
with the subsequent multiplication, the two LS bytes of 
the previous product become the two MS bytes of the new 
number to be multiplied, and two adjacent bytes of 
the plaintext are appended to them in order to make this 
new 32-bit number. When the end of the block is reached,
the index wraps cyclically over the BlockLength.

       Graphically a round can be visualized in the 
following sketch:
          ( the index is circular mod "BlockLength" )
___________________________________________________________
                  1-st multiplication

        XXXXxxxxxxxxxxxxxxxxxxxxxxxxxxxx

        ( XXXX - 32-bit number to multiply, multiplier 
index is computed as X^X^X^X)
___________________________________________________________

                  2-nd multiplication

        ppPPXXxxxxxxxxxxxxxxxxxxxxxxxxxx

        (ppPP - product of the first multiplication,
     PPXX - 32-bit number to multiply, multiplier 
     index is computed as P^P^X^X)
___________________________________________________________

                  3-d multiplication

        ppppPPXXxxxxxxxxxxxxxxxxxxxxxxxx

        ( PPXX - 32-bit number to multiply, multiplier 
     index is computed as P^P^X^X)
...........................................................

                  Last multiplication

        XXppppppppppppppppppppppppppppPP

        ( The index is cyclically wrapped over -
     PPXX - 32-bit number to multiply, multiplier 
     index is computed as P^P^X^X)
___________________________________________________________

        This arrangement allows an ultra-rapid propagation 
of all changes across the entire block - there is complete 
avalanche after the 2-nd round and then after each 
subsequent round.

        It should be also noted that this arrangement 
effectively molds the entire hash into an inseparable 
entity, where the junctions between the adjacent bytes and 
words don't exist any more, they are all blurred by 
overlapping and by cyclical wrapover of the index. This 
strongly discourages any attacks based on different 
behavior of the distinct bytes and words during hashing. 
In this function the entire hash is just a circular 
sequence of bits without beginning or end, without the 
junctions between bytes or words, without any realistic 
possibility to trace the output changes back to the changes 
in any particular word or byte. The cryptanalyst would 
either be forced to attack the hash as a whole (128 or 
even 256 bits), or just give up, which I hope will 
precisely happen.

     After 3 rounds, the new subsequent block is read from 
the message and XOR-ed with the hash obtained in the previous 
cycles. Then the hashing procedure repeats again for 3 
rounds, and so on until the last message block is read. 
If the last message block is shorter than needed, it is 
padded with 0-s. Thereafter one more block is added to the 
hash, containing the full message length as a 64-bit binary 
number padded with 0-s. The final operation XOR-s the two 
halves of the hash buffer, producing the output hash value. 

     Since in each multiplication step the multiplicand is 
replaced with the product, there is no practical way of 
reconstructing the multiplier index (other than computing 
multiplicative inverses for all 256 multipliers and testing 
the product to find out the match for the index used, 
however, the final XOR-ing defeats this option).

     Initially the hash buffer contains either 0's or an 
appropriate IV. No hashing is done on this value, the first 
message block is just XOR-ed into the hash buffer. 

     If the secret key will be used as the IV, the function 
will produce a MAC.

     The source code was compiled under Borland C++ 4.52 
and should compile under any ANSI-compliant C compiler. 
The routines for breaking and making the 32-bit integers 
from 4 bytes should assure identical results on both big-
endian and little-endian platforms (not tested).

     No attempt has been made in order to optimize the 
program in any way. After all, it is supposed to be just 
a working model, not a racing car. One simple possible 
optimization might reverse the direction of scanning 
over the "text" block for little-endian Intel processors.

     Any comments would be appreciated.

/*********************   Start of code     ************/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define byte unsigned char
#define word16 unsigned int
#define word32 unsigned long int

#define LO 0x0000FFFFUL
#define HI 0xFFFF0000UL
#define HASHBLOCKSIZE 16
#define ROUNDS	3
#define SEED  0x6A09E667UL  /* Modular Multiplier seed */
		/* First 8 hex digits of SQRT(2), less initial 1 */

/* variables */
word32 mult[256];                     /* multiplier array */
word32 message_length;                /* size of input file */
int end_of_file;
byte inbuffer[2*HASHBLOCKSIZE], outbuffer[HASHBLOCKSIZE];
byte text[2*HASHBLOCKSIZE];      /* the working array */

/*file handling functions*/
char read_char_from_file(FILE *fp);
void read_block_from_file(FILE *fp);
void write_char_to_file(char data,FILE *fp);

/* Functions */
word32 Mul (word32 x, word32 y)            /* Returns (x * y) mod
(2^32+1) */
                            /*  For the purposes of this algorithm 0 =
2^32 */

	/* Modular Multiplication "CORRECT BUT SLOW" (mod 2^32+1)
		If your compiler does not handle the "double long"
		integers (64 bit), you have to split the 32-bit words
		into 16-bit halves, multiply them, then juggle the
		intermediate products until you get the correct result.
	*/
{
	word32 t[4]; word16 u[2], v[2];
		if (x == 0) return (1 - y);
		if (y == 0) return (1 - x);

	u[0] = (word16)(x & LO);
	v[0] = (word16)(y & LO);
	u[1] = (word16)((x >> 16) & LO);
	v[1] = (word16)((y >> 16) & LO);

	t[0] = (word32) u[0] *  (word32) v[0] ;
	t[1] = (word32) u[1] *  (word32) v[0] ;
	t[2] = (word32) u[0] *  (word32) v[1] ;
	t[3] = (word32) u[1] *  (word32) v[1] ;
			  /* intermediate 32-bit products */

	t[3] = t[3] + ((t[1] >> 16) & LO)
					+ ((t[2] >> 16) & LO);
					/* no overflow possible here */

	t[1] = (t[1] & LO) + (t[2] & LO)
					+ ((t[0] >> 16) & LO);
					/* collect them all in t[1]. Overflow into the
						17-th bit possible here */

	t[0] = (t[0] & LO) + ((t[1] << 16) & HI);
	t[3] = t[3] + ((t[1] >> 16) & LO); /* add eventual overflow */

	return (t[0] - t[3] + (t[3] > t[0]));

} /* Mul */
#define MUL(x,y) (x=Mul(x,y))

void MultSetup (word32 *mul)
{
  int k;
	*mul = SEED;   /* Initialize multiplier array */
	for (k = 1; k < 256; k++)	 
	  { 
	    *(mul+k) = *(mul+k-1); /* Copy next <- previous */
	    MUL (*(mul+k),SEED); /* Subsequent power */
	  }
} /* MultSetup */

void DissH1( word32 H, byte *D )
/*
	  Disassemble the given word32 into 4 bytes.
	  We want it to work on all kinds of machines,
	  Little-Endian or Big-Endian.
*/
{
  word32 T ;
	 T = H ;
	 *D++ = (byte)((T & 0xff000000UL) >> 24) ;
	 *D++ = (byte)((T & 0x00ff0000UL) >> 16) ;
	 *D++ = (byte)((T & 0x0000ff00UL) >>  8) ;
	 *D   = (byte) (T & 0x000000ffUL)        ;
} /* DissH1 */

word32 MakeH1( byte *B )
/*
	  Assemble a word32 from the four bytes provided.
	  We want it to work on all kinds of machines,
	  Little-Endian or Big-Endian.
*/
{
  word32 RetVal, temp ;
	 temp = *B++ ;
	 RetVal = (temp << 24) ;
	 temp = *B++ ;
	 RetVal += (temp << 16) ;
	 temp = *B++ ;
	 RetVal += (temp <<  8) ;
	 RetVal += *B ;
	 return RetVal ;
} /* MakeH1 */

void Scramble (void) /* This subroutine scrambles the "text" block */
				/*  It uses the global variables */
				/*  mult[], text[] and inbuffer[] */

{
int i,k,m;
word32 temp1, temp2;

		/* XOR-ing the input into the text array */
  for (m=0; m < 2*HASHBLOCKSIZE; m++) text[m] ^= inbuffer[m];

		/* Now we can start squashing the block */
  for (m=0; m<ROUNDS; m++)
  {
   for (k=0; k<2*HASHBLOCKSIZE; k+= 2)
    {
				/* Build a 32-bit multiplicand and multiplier index */
				/* We want this to produce same results on big-endian */
				/* and little-endian machines alike */
     temp2 = text[k % (2*HASHBLOCKSIZE)] ;     /* Essentially this
procedure */
     i = (int)temp2;                         /* is identical to MakeH2
function */
     temp1 = (temp2 << 24) ;                 /* However, it is
impossible to use */
     temp2 = text[(k+1)%(2*HASHBLOCKSIZE)] ; /* MakeH2 function
directly, because */
     i ^= (int)temp2;                        /* the loop index must be
wrapped */
     temp1 += (temp2 << 16) ;                /* mod 2*HASHBLOCKSIZE, and
also the */
     temp2 = text[(k+2) % (2*HASHBLOCKSIZE)] ; /* multiplier index is
computed, */
     i ^= (int)temp2;                        /* which is not provided in
MakeH2 */
     temp1 += (temp2 <<  8) ;
     temp1 += text[(k+3) % (2*HASHBLOCKSIZE)] ;
     i ^= (int)temp2;

		  /* Get the modular product into "temp1" */
			MUL(temp1, mult[i]);

		 /* Return the bytes where they belong */
	  text[k     % (2*HASHBLOCKSIZE)] = (byte)((temp1 & 0xff000000UL) >>
24) ;
	  text[(k+1) % (2*HASHBLOCKSIZE)] = (byte)((temp1 & 0x00ff0000UL) >>
16) ;
	  text[(k+2) % (2*HASHBLOCKSIZE)] = (byte)((temp1 & 0x0000ff00UL) >> 
8) ;
	  text[(k+3) % (2*HASHBLOCKSIZE)] = (byte) (temp1 & 0x000000ffUL) ;
    }
  }
}    /* Scramble */

char read_char_from_file(FILE *fp)
{
  char temp=0;
    message_length++;
	if ((fread(&temp,sizeof(char),1,fp))!=1)
	  {
		  end_of_file=1;
		  message_length--;
	  }
    return (temp);
} /* read_char_from_file */

void read_block_from_file(FILE *fp)
{
  int i;
	for (i=0; i<2*HASHBLOCKSIZE; i++)
	 {
		 inbuffer[i]=read_char_from_file(fp);
	 }
} /* read_block_from_file */


void write_char_to_file(char data,FILE *fp)
{
	if ((fwrite(&data,sizeof(char),1,fp))!=1)
	{
		printf("Fatal Error writing output file!!!\n");
		exit(-1);
	}
} /* write_char_to_file */


void main()
{
	FILE *in,*out;
	char infile[100], outfile[100], *dot;
	int k;

			/* Opening files */
	printf("\nEnter input filename:");
	gets(infile);

	if ((in=fopen(infile,"r+b"))==NULL)
	{
		printf("\nError opening input file: %s\n",infile);
		exit (-1);
	}

	strcpy(outfile,infile);
	dot=strrchr(outfile,'.'); dot++;
	*dot++='h'; *dot++='3';   /* Changing file extension */
	*dot++='2'; *dot='\0';    /* to .h32   */

	if ((out=fopen(outfile,"w+b"))==NULL)
	{
		printf("\nError opening output file: %s\n",outfile);
		exit(-1);
	}
	printf("\nOutput file is: %s\n",outfile);

					 /* Setting up the multipliers */
	MultSetup(mult);

					 /* Clearing the text buffer */
	for (k=0; k<2*HASHBLOCKSIZE; k++)  text[k]=0;

		/* Reading and squashing the input file */
	while(end_of_file != 1)
	 {
		read_block_from_file(in);
		Scramble();
	 }

		 /* Building up and squashing the final block */
	for (k=0; k<4; k++) inbuffer[k]=0;
	DissH1(message_length, (inbuffer+4));
	for (k=8; k<2*HASHBLOCKSIZE; k++) inbuffer[k]=0;
			Scramble();

					 /* Building and writing the final hash */
	for (k=0; k<HASHBLOCKSIZE; k++)
				outbuffer[k] = text[k]^text[k+HASHBLOCKSIZE];

	for (k=0; k<HASHBLOCKSIZE; k++)
	 {
		 write_char_to_file(outbuffer[k],out);
	 }

	fclose(in); fclose(out);
}


Subject: Re: Extreme lossy text compression
Date: 17 Apr 1999 22:20:41 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1722.20.41.2875@koobera.math.uic.edu>
References: <7f9uhk$91k@sjx-ixn9.ix.netcom.com>
Newsgroups: sci.crypt
Lines: 27

> you might as well use a 128 bit CRC.

hash127 is much faster than a 128-bit CRC. Try it for yourself:

   http://pobox.com/~djb/hash127.html

In spirit these functions are really all the same:

   * reduce (Z/2)[x] modulo some prime x^128-... (CRC/``division'');
   * reduce Z modulo some prime 2^128-... (basically Karp-Rabin);
   * reduce F_(2^32)[x] modulo some prime x^4-... (Shoup);
   * reduce (Z/(2^31-1))[x] modulo some prime x^4-...;
   * reduce F_(2^128)[x] modulo some prime x-... (``evaluation'');
   * reduce (Z/(2^127-1))[x] modulo some prime x-... (hash127);
   * etc.

All of them have about the same natural hardware speed. However, in
software, arithmetic in large characteristic benefits from CPU support
for fast multiplication---often buried inside the floating-point unit,
but still accessible---whereas arithmetic in characteristic 2 doesn't.

I hope that CPUs will someday provide fast polynomial arithmetic mod 2;
but there's no reason that it'll ever be faster than integer arithmetic.
I haven't found any cryptographic applications where finite fields of
small characteristic turned out to be a good idea.

---Dan


Subject: Re: Extreme lossy text compression
Date: Sat, 17 Apr 1999 23:39:51 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <37191bb5.2504101@news.io.com>
References: <1999Apr1722.20.41.2875@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 53


On 17 Apr 1999 22:20:41 GMT, in
<1999Apr1722.20.41.2875@koobera.math.uic.edu>, in sci.crypt
djb@koobera.math.uic.edu (D. J. Bernstein) wrote:

>> you might as well use a 128 bit CRC.
>
>hash127 is much faster than a 128-bit CRC. Try it for yourself:
>
>   http://pobox.com/~djb/hash127.html
>
>In spirit these functions are really all the same:
>
>   * reduce (Z/2)[x] modulo some prime x^128-... (CRC/``division'');
>   * reduce Z modulo some prime 2^128-... (basically Karp-Rabin);
>   * reduce F_(2^32)[x] modulo some prime x^4-... (Shoup);
>   * reduce (Z/(2^31-1))[x] modulo some prime x^4-...;
>   * reduce F_(2^128)[x] modulo some prime x-... (``evaluation'');
>   * reduce (Z/(2^127-1))[x] modulo some prime x-... (hash127);
>   * etc.
>
>All of them have about the same natural hardware speed. However, in
>software, arithmetic in large characteristic benefits from CPU support
>for fast multiplication---often buried inside the floating-point unit,
>but still accessible---whereas arithmetic in characteristic 2 doesn't.
>
>I hope that CPUs will someday provide fast polynomial arithmetic mod 2;
>but there's no reason that it'll ever be faster than integer arithmetic.
>I haven't found any cryptographic applications where finite fields of
>small characteristic turned out to be a good idea.

Surely we don't implement CRC's like this.  Addition mod 2 is
exclusive-OR which is well supported in CPU's.  "Division" mod an
irreducible mod 2 poly is just a test and conditional exclusive-OR,
which is, again, well supported.

For production use, CRC's are table-driven.  Typically, we generate a
256-entry table, then process data byte-by-byte.  Each step typically
requires a byte exclusive-OR, an 8-bit "shift" of the CRC (which might
involve simply changing a pointer to the head), and an exclusive-OR
from table into the CRC register.  

That said, 128 bit (16-byte) values are not usually supported in CPU
registers or instruction sets.  So I would instead probably use 4
CRC's using different deg-32 primitives (or perhaps 5 deg-31's) to
produce a similar effect.  See, for example:

   http://www.io.com/~ritter/KEYSHUF.HTM

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Extreme lossy text compression
Date: 18 Apr 1999 08:27:02 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1808.27.02.4647@koobera.math.uic.edu>
References: <37191bb5.2504101@news.io.com>
Newsgroups: sci.crypt
Lines: 43

Terry Ritter <ritter@io.com> wrote:
> For production use, CRC's are table-driven.

Indeed. The standard CRC-32 implementation reduces an n-bit polynomial
modulo a degree-32 polynomial p using

   * n/8 additions of 32-bit polynomials,
   * n/8 multiplications of 24-bit polynomials by x^8, and
   * n/8 table lookups of f |-> (x^32 f) mod p, where f has 8 bits.

The analogous CRC-128 implementation would use

   * n/8 additions of 128-bit polynomials,
   * n/8 multiplications of 120-bit polynomials by x^8, and
   * n/8 table lookups of f -> (x^128 f) mod p, where f has 8 bits.

Now, if you could magically set up a size-2^32 table instead of a
size-2^8 table, you could get away with far fewer operations:

   * n/32 additions of 128-bit polynomials,
   * n/32 multiplications of 96-bit polynomials by x^32, and
   * n/32 table lookups of f -> (x^128 f) mod p, where f has 32 bits.

That's basically the speed achieved by hash127, with integers replacing
polynomials mod 2, and the CPU's fast multiplication (inside the FPU)
replacing table lookups.

> "Division" mod an
> irreducible mod 2 poly is just a test and conditional exclusive-OR,
> which is, again, well supported.

I said ``support for fast multiplication.'' Note the word ``fast.''
Bit-by-bit multiplication, doing a conditional 128-bit addition for each
bit of input, is not fast.

> So I would instead probably use 4
> CRC's using different deg-32 primitives

That's a very bad idea for authentication of long messages. The maximum
collision probability is around b^4/2^128 where b is the message length.
Anyway, hash127 is much faster.

---Dan


Subject: Re: Extreme lossy text compression
Date: Mon, 19 Apr 1999 07:00:09 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <371ad344.43382112@news.io.com>
References: <1999Apr1808.27.02.4647@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 69


On 18 Apr 1999 08:27:02 GMT, in
<1999Apr1808.27.02.4647@koobera.math.uic.edu>, in sci.crypt
djb@koobera.math.uic.edu (D. J. Bernstein) wrote:

>Terry Ritter <ritter@io.com> wrote:
>> For production use, CRC's are table-driven.
>
>Indeed. The standard CRC-32 implementation reduces an n-bit polynomial
>modulo a degree-32 polynomial p using
>
>   * n/8 additions of 32-bit polynomials,

Yes, this is the add of the table value.

>   * n/8 multiplications of 24-bit polynomials by x^8, and

Note that this is just a single shift, and not a bit-by-bit
multiplication, as one might think from the comment below.  

>   * n/8 table lookups of f |-> (x^32 f) mod p, where f has 8 bits.


>[...]
>> "Division" mod an
>> irreducible mod 2 poly is just a test and conditional exclusive-OR,
>> which is, again, well supported.
>
>I said ``support for fast multiplication.'' Note the word ``fast.''
>Bit-by-bit multiplication, doing a conditional 128-bit addition for each
>bit of input, is not fast.

The only multiplication we have in CRC is a shift, but I expect an
integer multiply may be just as fast, in the registers.  Outside the
registers, it looks like a multiple-precision operation, and 128 bits
sounds outside the registers.  But my comment was made in the context
of the equations you showed:

>>>   * reduce (Z/2)[x] modulo some prime x^128-... (CRC/``division'');
>>>   * reduce Z modulo some prime 2^128-... (basically Karp-Rabin);
>>>   * reduce F_(2^32)[x] modulo some prime x^4-... (Shoup);
>>>   * reduce (Z/(2^31-1))[x] modulo some prime x^4-...;
>>>   * reduce F_(2^128)[x] modulo some prime x-... (``evaluation'');
>>>   * reduce (Z/(2^127-1))[x] modulo some prime x-... (hash127);
>>>   * etc.

Everywhere here I see "modulo some prime."  Now, that is hardly an
issue with CRC, but it sure can be an issue when dividing big integers
which here seem to be 128 bits.  How do you deal with this?  

>> So I would instead probably use 4
>> CRC's using different deg-32 primitives
>
>That's a very bad idea for authentication of long messages. 

You're right.


>The maximum
>collision probability is around b^4/2^128 where b is the message length.
>Anyway, hash127 is much faster.

I still find this hard to believe.  What happened to the
multi-precision divide?

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Extreme lossy text compression
Date: 19 Apr 1999 08:38:36 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1908.38.36.9722@koobera.math.uic.edu>
References: <371ad344.43382112@news.io.com>
Newsgroups: sci.crypt
Lines: 9

Terry Ritter <ritter@io.com> wrote:
> I still find this hard to believe.

That's why they pay me the big bucks. :-)

Get the code from http://pobox.com/~djb/hash127.html and try it for
yourself. On a Pentium it takes about 4.5 clock cycles per input byte.

---Dan


Subject: Re: Extreme lossy text compression
Date: Mon, 19 Apr 1999 17:31:51 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <371b6854.1313556@news.io.com>
References: <1999Apr1908.38.36.9722@koobera.math.uic.edu>
Newsgroups: sci.crypt
Lines: 42


On 19 Apr 1999 08:38:36 GMT, in
<1999Apr1908.38.36.9722@koobera.math.uic.edu>, in sci.crypt
djb@koobera.math.uic.edu (D. J. Bernstein) wrote:

>Terry Ritter <ritter@io.com> wrote:
>> I still find this hard to believe.
>
>That's why they pay me the big bucks. :-)
>
>Get the code from http://pobox.com/~djb/hash127.html and try it for
>yourself. On a Pentium it takes about 4.5 clock cycles per input byte.

That's fine, but I'd rather have an explanation in plain English:  How
do you avoid multi-precision operations, multiply and divide, when
dealing with a 128-bit integer hash?


And, after thinking about it for a while, I would like to add to my
response in the previous message:

>>> So I would instead probably use 4
>>> CRC's using different deg-32 primitives
>>
>>That's a very bad idea for authentication of long messages. 
>
>You're right.

Indeed, *any* linear system is normally a bad idea for authentication
(unless, perhaps, it is inside the cryptographic envelope).  But that
would seem to rule out the whole list of hash methods which you
posted.  Does your hash fit in there somewhere?  Do you thus recommend
a linear hash for authentication?

When a linear system is acceptable, I see no serious theoretical
consequences from splitting it into smaller subsystems with about the
same total state.  

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Extreme lossy text compression
Date: 19 Apr 1999 20:48:18 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Apr1920.48.18.12961@koobera.math.uic.edu>
References: <371b6854.1313556@news.io.com>
Newsgroups: sci.crypt
Lines: 46

Here's an analogy in base 10. Suppose you want to compute

   2718281828 r^4 + 4590452353 r^3 + 6028747135 r^2 + 2662497757 r

modulo p, where p = 10^30-1 and r = 314159265358979323846264338327. You
do some precomputations with r:

   r^1 mod p = 3141592653 10^20 + 5897932384 10^10 + 6264338327
   r^2 mod p = 2631235735 10^20 + 1304915222 10^10 + 3466068927
   r^3 mod p = 7883992070 10^20 + 5909899612 10^10 + 2554294263
   r^4 mod p = 9595247344 10^20 + 1716075106 10^10 + 5252936926

Now you can compute the (approximately 40-digit) number

   2718281828 (r^4 mod p) + 4590452353 (r^3 mod p)
      + 6028747135 (r^2 mod p) + 2662497757 (r mod p)

with 3 dot products of ten-digit numbers, totalling 12 multiplications;
and then you can easily reduce the result modulo p.

Terry Ritter <ritter@io.com> wrote:
> How do you avoid multi-precision operations, multiply and divide, when
> dealing with a 128-bit integer hash?

I don't know what distinction you're trying to draw. These _are_
multiprecision operations. The same techniques can be used in many other
multiprecision arithmetic problems.

> Indeed, *any* linear system is normally a bad idea for authentication

On the contrary. See the hash127 man pages, or my previous postings, for
several safe authentication methods.

> When a linear system is acceptable, I see no serious theoretical
> consequences from splitting it into smaller subsystems with about the
> same total state.  

Again: The maximum collision probability for four independent CRC-32s on
a b-bit message is roughly b^4/2^128. That grows disastrously with b.
With today's networking technology an attacker can break that system in
a matter of minutes.

In contrast, CRC-128 has maximum collision probability around b/2^128,
and hash127 has maximum collision probability around 3b/2^133.

---Dan


Subject: Re: Extreme lossy text compression
Date: Wed, 28 Apr 1999 14:25:42 +0200
From: Mok-Kong Shen <mok-kong.shen@stud.uni-muenchen.de>
Message-ID: <3726FE46.7BD6044A@stud.uni-muenchen.de>
References: <37191bb5.2504101@news.io.com>
Newsgroups: sci.crypt
Lines: 14

Terry Ritter wrote:

> 
> For production use, CRC's are table-driven.  Typically, we generate a
> 256-entry table, then process data byte-by-byte.  Each step typically
> requires a byte exclusive-OR, an 8-bit "shift" of the CRC (which might
> involve simply changing a pointer to the head), and an exclusive-OR
> from table into the CRC register.

Could you say what speed you can achieve with that on a typical
processor (in assembler and in C or other languages)? Thanks 
in advance.

M. K. Shen


Subject: Re: Extreme lossy text compression
Date: Wed, 28 Apr 1999 22:48:08 -0600
From: jcoffin@taeus.com (Jerry Coffin)
Message-ID: <MPG.1191686f9d841f78989a3b@news.rmi.net>
References: <3726FE46.7BD6044A@stud.uni-muenchen.de>
Newsgroups: sci.crypt
Lines: 13

In article <3726FE46.7BD6044A@stud.uni-muenchen.de>, mok-
kong.shen@stud.uni-muenchen.de says...

[ a table-driven CRC ] 

> Could you say what speed you can achieve with that on a typical
> processor (in assembler and in C or other languages)? Thanks 
> in advance.

I just did a quick check with a version I have lying around and it did 
around 5 megabytes a second a Pentium II/400.  In all honesty, I 
suspect that a faster disk would improve that more than a faster CPU 
though.


Subject: Re: Extreme lossy text compression
Date: Sat, 1 May 1999 04:24:15 -0400
From: "John A. Limpert" <johnl@radix.net>
Message-ID: <7gedl0$sj2$1@news1.Radix.Net>
References: <MPG.1191686f9d841f78989a3b@news.rmi.net>
Newsgroups: sci.crypt
Lines: 25


Jerry Coffin <jcoffin@taeus.com> wrote in message
news:MPG.1191686f9d841f78989a3b@news.rmi.net...
> In article <3726FE46.7BD6044A@stud.uni-muenchen.de>, mok-
> kong.shen@stud.uni-muenchen.de says...
>
> [ a table-driven CRC ]
>
> > Could you say what speed you can achieve with that on a typical
> > processor (in assembler and in C or other languages)? Thanks
> > in advance.
>
> I just did a quick check with a version I have lying around and it did
> around 5 megabytes a second a Pentium II/400.  In all honesty, I
> suspect that a faster disk would improve that more than a faster CPU
> though.

I ran benchmarks of various CRC schemes on a Pentium a few years ago. Most
ran in the range of 5-10 megabytes/second. The limiting factor appeared to
be main memory bandwidth, not CPU speed. They all seemed to blow out the L2
cache.






Subject: Re: Extreme lossy text compression
Date: 1 May 1999 21:22:44 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999May121.22.44.24765@koobera.math.uic.edu>
References: <7gedl0$sj2$1@news1.Radix.Net>
Newsgroups: sci.crypt
Lines: 8

John A. Limpert <johnl@radix.net> wrote:
> The limiting factor appeared to be main memory bandwidth, not CPU speed.

hash127 handles input straight from DRAM at over 200 million bits/second
on my old Pentium-133. Anything slower than that is clearly not limited
by memory bandwidth.

---Dan


Subject: Re: CRC32
Date: Thu, 3 Jun 1999 14:41:48 +0200
From: "Amit Ghosh" <ghosh@gmx.de>
Message-ID: <7j5t5i$l60$1@news.online.de>
References: <19990603075422.4548.qmail@nym.alias.net>
Newsgroups: sci.crypt
Lines: 35

Hi,

I'm not a crypto/compression guy, so there may be better
answers to your questions, but I do my very best ;-)

1. If I remember it right, crc32 is described in ISO 3309.

2. If you are looking for a faster yet powerfull algorithm,
   you may want to try adler32 instead of crc32. You will find
   a small description of adler32 in RFC 1950. You may find ZLib
   interesting if you're looking for an implementation.

3. Since I'm not an expert on this field of science (see above), I
   may be wrong but as I understand, checksum algorithms are
   optimized to detect errors in data blocks and not to generate
   optimized hash values. You may want to look for hash value
   generation algorithms to solve your problem (if I get your
   question right). crc32 will most likely not solve your problem!


Hope this helps!


P.S.: Please update your newsreader configuration. Your reply address
      is malformed.


--
Amit Ghosh
maito:ghosh@gmx.de







Subject: Re: CRC32
Date: Thu, 03 Jun 1999 14:46:54 GMT
From: "Rosco Schock" <snarf@ptdprolog.net>
Message-ID: <yzw53.3088$ea.336874@nnrp1.ptd.net>
References: <19990603075422.4548.qmail@nym.alias.net>
Newsgroups: sci.crypt
Lines: 53

Hello,
Try this url. The paper it links to is very readable.
It gives a good basic understanding of CRCs.

http://www.ross.net/crc/crcpaper.html

As far as odds go
For example, CRC-32 gives a 1 / 2^32 chance of failure.
That equates to odds of 1 / 4,294,967,295 of getting the same checksum.
However, if you change to a filesize / CRC pair of numbers then
the only way you will get the same checksum is if the files are the same.

HTH,
Rosco

iLLusIOn wrote in message <19990603075422.4548.qmail@nym.alias.net>...
>-----BEGIN PGP SIGNED MESSAGE-----
>
>Hello,
>
>  I'm looking for any texts/urls with algorithms and descriptions
> of CRC-32 (like used in zip files for example), I have found a few
> sites providing source code but none which provide really useful
> descriptions of how/why it works. anything appreciated :)
>
>  are there any ways to speed crc32 up a bit? or are there any
> faster (yet as "secure") algorithms as crc32? how high is the
> possibility that i'd get the same checksum twice? my
> implementation would be used to generate checksums of many
> files, getting the same checksum on two different files would be
> bad.
>
>  tia
>
>~~~
>This PGP signature only certifies the sender and date of the message.
>It implies no approval from the administrators of nym.alias.net.
>Date: Thu Jun  3 07:54:19 1999 GMT
>From: illusion@nym.alias.net
>
>-----BEGIN PGP SIGNATURE-----
>Version: 2.6.2
>
>iQEVAwUBN1Y0rU5NDhYLYPHNAQEEAQf+IWtNZS0IjjVX9bfWh7jYp4cBJrWyqw96
>0x8cbIYRMcWmOsr3Xsf0RQ6HkrRPGLm+Kkqjca11E0xwqNmUfWZorvprqGwvDATA
>JU5Gbv0s5Hw8TXKSGShnqnnfk6tZnMyYw4IjWCbcyPETYSdInOruqbl2XzqI9OQX
>Zj8K/CK0wN+O6mVAV9O/SmWSufwklMTv5lPTkUYMHqIEJjHmuMs7OgWTS0R4cn3t
>oJ+tBda/DFwAhhPQNaVHOnlIDdfoj/F9AVoNKQq0q3NmeGbk4W71LUP9TAHGdCcl
>jaHQ55hyvt4SYR67w40OgBhoEo0N+0JqUtQ1QswP24PRJcr+2Tgdfw==
>=T0LD
>-----END PGP SIGNATURE-----




Subject: Re: CRC32
Date: Thu, 3 Jun 1999 10:47:44 -0500
From: "Anwer Bashi" <anwerb@computrols.com>
Message-ID: <7j6866$jil$1@nntp.gulfsouth.verio.net>
References: <yzw53.3088$ea.336874@nnrp1.ptd.net>
Newsgroups: sci.crypt
Lines: 28

Rosco Schock <snarf@ptdprolog.net> wrote in message
news:yzw53.3088$ea.336874@nnrp1.ptd.net...
> However, if you change to a filesize / CRC pair of numbers then
> the only way you will get the same checksum is if the files are the same.

Not true.  You can still get the same CRC with two different files of the
same size.  Any change in the file will (very very likely) cause a
corresponding change in the CRC.  All you have to do is copy the original
file and keep changing the copy by flipping bits here and there at random
until the CRC happens to come out to be the same again.

With only 2^32 possible CRC values, you would likely hit on a match quite
soon given today's computing power.  As a speedup, you probably would only
need to take the last few bytes of the file.  Compute the CRC of the rest of
the file up to that point and then use that CRC value as the starting point
for the last few bytes.  Since you won't be changing the first part of the
file, you don't need to recompute the CRC for it.  Also, remember that there
are 2^80 modifications you can make to 10 bytes (80 bits), not just 10
modifications or 2^10.

There are obviously better ways of doing this by working backwards from the
current CRC and figuring out the appropriate bits to flip to get the desired
result, however I am not sure how to do this.  I would be interested in
hearing about this (anyone?)

Anwer




Subject: Re: CRC32
Date: Thu, 03 Jun 1999 10:34:51 -0700
From: Bryan Olson <bryan.olson@uptronics.com>
Message-ID: <3756BCBB.85D6B97@uptronics.com>
References: <7j6866$jil$1@nntp.gulfsouth.verio.net>
Newsgroups: sci.crypt
Lines: 18



Anwer Bashi wrote:
> You can still get the same CRC with two different files of the
> same size.  Any change in the file will (very very likely) cause a
> corresponding change in the CRC.  All you have to do is copy the original
> file and keep changing the copy by flipping bits here and there at random
> until the CRC happens to come out to be the same again.
[...]
> There are obviously better ways of doing this by working backwards from the
> current CRC and figuring out the appropriate bits to flip to get the desired
> result, however I am not sure how to do this.  I would be interested in
> hearing about this (anyone?)

Just XOR in the CRC polynomial.  You can generate a million 
collisions a second on a computer from a toy store.

--Bryan


Subject: Re: CRC32
Date: Thu, 03 Jun 1999 12:32:20 -0400
From: Paul Koning <pkoning@xedia.com>
Message-ID: <3756AE14.29EEB7EB@xedia.com>
References: <19990603075422.4548.qmail@nym.alias.net>
Newsgroups: sci.crypt
Lines: 51

iLLusIOn wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> 
> Hello,
> 
>   I'm looking for any texts/urls with algorithms and descriptions
>  of CRC-32 (like used in zip files for example), I have found a few
>  sites providing source code but none which provide really useful
>  descriptions of how/why it works. anything appreciated :)

You could look up Peterson's 1962 paper...
 
>   are there any ways to speed crc32 up a bit? 

Compared to what?  The easiest description of CRC is one bit at
a time.  With a lookup table, you can do multiple bits at a time.
Byte at a time is trivial (256 entry table).  More than that is
doable if you have the memory, and your CPU cache is big enough 
that you don't get cache trashing.

> or are there any
>  faster (yet as "secure") algorithms as crc32? 

No CRC is secure (in the sense of a "secure hash").  All they
are is error detecting codes, designed to detect most errors
generated by random noise and other non-malicious processes.

None work well for long errors (much longer than the CRC size).
Note that a bit deletion or addition is a long error (the length
is the rest of the message).  For long errors, n-bit CRC will
fail to detect one in 2^n errors.  For short errors, it does
better (in particular, a good CRC will detect *all* errors of
length k or less where k is n plus/minus a few).

Some people use CRC as an error detecting code.  This is a
Bad Idea, because the probability of miscorrection is rather high.
If you need ECC, use a code designed for ECC.

> how high is the
>  possibility that i'd get the same checksum twice? my
>  implementation would be used to generate checksums of many
>  files, getting the same checksum on two different files would be
>  bad.

About one in 2^(n/2) for n bit CRC by the birthday rule.

You may want to use SHA-1 or MD5, it's slower than CRC but the
greater length will help, plus it's a secure hash.

	paul


Subject: Re: CRC32
Date: Mon, 07 Jun 1999 22:39:00 GMT
From: tomstdenis@my-deja.com
Message-ID: <7jhhm4$3tf$1@nnrp1.deja.com>
References: <3756AE14.29EEB7EB@xedia.com>
Newsgroups: sci.crypt
Lines: 35


> Some people use CRC as an error detecting code.  This is a
> Bad Idea, because the probability of miscorrection is rather high.
> If you need ECC, use a code designed for ECC.

The thing is if their are more 'random' bits in the message then the
correction code you are *always* going to miss an error.  It's not
possible to detect all errors.  Some codes however have nice properties
to detect smaller errors (single bit, byte or multiple bytes).  Reed
solomon (?) can detect I think in (255,233) about 16 singlebit errors
in any message.

> About one in 2^(n/2) for n bit CRC by the birthday rule.

This applies to any hash or checksum though.  And is not applicable to
error dectection (there is no reason).

> You may want to use SHA-1 or MD5, it's slower than CRC but the
> greater length will help, plus it's a secure hash.

MD5 is probably a better choice as it has 16 fewer rounds, uses less
registers and is quite simple.  I heard Tiger is great for 64-bit
platforms (it has 3 64 bit registers...) so you may want to look up Eli
Biham for his algorithm.

Tom
--
PGP public keys.  SPARE key is for daily work, WORK key is for
published work.  The spare is at
'http://members.tripod.com/~tomstdenis/key_s.pgp'.  Work key is at
'http://members.tripod.com/~tomstdenis/key.pgp'.  Try SPARE first!


Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.


Subject: Re: CRC32
Date: Wed, 09 Jun 1999 13:01:02 -0400
From: Jerry Leichter <leichter@smarts.com>
Message-ID: <375E9DCE.29AD@smarts.com>
References: <3756AE14.29EEB7EB@xedia.com>
Newsgroups: sci.crypt
Lines: 27

| ...No CRC is secure (in the sense of a "secure hash")...

This is true, but it's worth pointing out a subtlety:  A *secret* CRC
polynomial provides a primitive related to (but by no means the same as)
a secure hash.  Rabin has an old paper on this called something like
"Fingerprinting with Random Polynomials".  Basically, if you choose a
CRC polynomial suitably (from among primitive polynomials? - the exact
condition escapes me, but there are simple, efficient ways to do it) and
compute an n-bit "fingerprint" of a piece of data using that polynomial,
then an attacker has essentially a 1 in 2^n chance of being able to
construct a different piece of data with the same fingerprint.

If the attacker knows the polynomial constructing the different piece of
data is trivial.  Also, if you reveal more than a few data/fingerprint
pairs based on the same polynomial, an attacker can learn what the
polynomial is.  (Even revealing one data/fingerprint pair gives away
some information, perhaps too much.)

Rabin's proposed use is that you select your polynomial at random, keep
the fingerprints secret, and use them later to detect alteration.

Of course, these days you would probably use a secure hash function,
which doesn't require keeping the fingerprints secret.  (On the other
hand, the security in Rabin's scheme is easily provable; the security of
secure hash functions is still based on faith.)

							-- Jerry


Subject: Re: CRC32
Date: Thu, 03 Jun 1999 10:58:57 -0700
From: Bryan Olson <bryan.olson@uptronics.com>
Message-ID: <3756C261.A99FFCF2@uptronics.com>
References: <19990603075422.4548.qmail@nym.alias.net>
Newsgroups: sci.crypt
Lines: 33



iLLusIOn wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> 
> Hello,
> 
>   I'm looking for any texts/urls with algorithms and descriptions
>  of CRC-32 [...]

You've already gotten some answers on that, so I'll just address
one point.

>   are there any ways to speed crc32 up a bit? or are there any
>  faster (yet as "secure") algorithms as crc32? how high is the
>  possibility that i'd get the same checksum twice? my
>  implementation would be used to generate checksums of many
>  files, getting the same checksum on two different files would be
>  bad.

You've left out one important point: might an adversary try
to generate files with the same checksum?  Using any of the 
popular checksum algorithms, such as CRCs or Fletcher/Adler 
sum of sums, we can easily construct files with the same 
checksum, or with a given checksum.

If you're only worried about accidental collisions, a large 
CRC is fine.  If you have to resist intelligent attackers, 
then you need a cryptographic hash such as SHA-1.


--Bryan

Author-Address: illusion <AT> nym <DOT> alias <DOT> net

Subject: Re: CRC32
Date: 4 Jun 1999 07:26:00 -0000
From: iLLusIOn <Use-Author-Address-Header@[127.1]>
Message-ID: <19990604072600.9756.qmail@nym.alias.net>
References: <3756C261.A99FFCF2@uptronics.com>
Newsgroups: sci.crypt
Lines: 45

-----BEGIN PGP SIGNED MESSAGE-----

Hi,

  first i'd like to thank anyone who answered and provided
 urls, source code and other suggestions.

>You've left out one important point: might an adversary try
>to generate files with the same checksum?  

no, it might only occure accidental

>If you're only worried about accidental collisions, a large 
>CRC is fine.  If you have to resist intelligent attackers, 
>then you need a cryptographic hash such as SHA-1.

i have tried SHA, but it is a bit slow i think... i'd just need
this CRC algorithm for a program which should, lets say,
detect duplicate data blocks of a variable size from a few
bytes upto a few megabytes within a file of about 1GB.
i guess the best (fastest/most secure) method would be to
generate a checksum of all data blocks within this file, and
then to compare each data blocks which have the same
checksum byte-for-byte.
i was asking for a more secure algorithm (with the same or
better speed) then CRC32 as less duplicate checksums
would save me time doing byte for byte comparisons.

~~~
This PGP signature only certifies the sender and date of the message.
It implies no approval from the administrators of nym.alias.net.

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQEVAwUBN1d/h05NDhYLYPHNAQE+cAf9Ey4372K93/+9QCNNj0DKGavYnYsK8fi1
ddjiovfKWx0CorZgDHwJWVKa2TEXRxnOnSuEVJxFh9R7EAX73BpcHkVJol4Sjq1W
tgLrly5DjEFHUxZg3RVB1FrxBghIMMVP52bpWD4DSiZK1+LBnlHo+Lq6e1kHGVCm
wmLBc36tcvQRaXao3dHXFL9x5zg2bEIUiLOnjg7jzzhzMiRtQBD5Ki7F76GXCC3b
TaMOf5u3sbje3U6d8V8ovLzbWEbgf1XZtOXmK/CNJfbBoho1pBU+QjSvHQA22Hih
6ipwytbW6pvpOSbseYe7Ca/SRcRH0Myx8phJhxJcsMtBfxSCWvlrfw==
=ODau
-----END PGP SIGNATURE-----


Subject: Re: CRC32
Date: Fri, 04 Jun 1999 16:25:11 -0400
From: Paul Koning <pkoning@xedia.com>
Message-ID: <37583627.A491457B@xedia.com>
References: <19990604072600.9756.qmail@nym.alias.net>
Newsgroups: sci.crypt
Lines: 64

iLLusIOn wrote:

> >You've left out one important point: might an adversary try
> >to generate files with the same checksum?
> 
> no, it might only occure accidental
> 
> >If you're only worried about accidental collisions, a large
> >CRC is fine.  If you have to resist intelligent attackers,
> >then you need a cryptographic hash such as SHA-1.
> 
> i have tried SHA, but it is a bit slow i think... 

If you don't have intelligent attackers, any cryptographic
hash is serious overkill and bound to take much more CPU
time than error checking codes.

> i'd just need
> this CRC algorithm for a program which should, lets say,
> detect duplicate data blocks of a variable size from a few
> bytes upto a few megabytes within a file of about 1GB.

Something to keep in mind: each CRC has properties like this:

a. detects all single error bursts of burst length less than k
b. detects all errors that change fewer than p bits in messages
shorter than m bytes.

In the above, I believe k is independent of message length but
p is not.  

For example, CRC-32 will detect all 3-bit errors in packets of
less than about 11k bytes.  Beyond that it will detect all 2 bit
errors but there are 3 bit patterns it misses.  (Possibly "3"
should be "4" and "2" should be "3" above, but the point remains
the same.

This is why Ethernet uses CRC-32 rather than CRC-16 (the lengths
at which the CRC gets weak are smaller for smaller CRCs).
This is why you shouldn't use 64k byte packets in ATM or in 
token ring even though the standard allows it.

If you want to protect megabytes with a single checksum, you
might want to use CRC-64.  I don't know any polynomials;
one was proposed for HiPPI and could perhaps be found in
those documents.  Or you could do a separate CRC-32 for
each 8k bytes or so.  Or you could ignore the problem, if
your error checking requirements aren't that tough.

I think there are a few more questions:
1. What sort of errors are you worried about?
  a. bits changed,
  b. bits deleted or added
  c. bits or groups of bits interchanged
2. Single error burst or multiple bursts?  How long?

If your requirements are weak enough, the IP checksum will 
do; that one is about as easy as you can get.  But it is also
quite feeble.  Fletcher is slightly better, also somewhat
harder (in particular, definitely harder if you want to do
more than one byte at a time).  CRC is better than either one
and also slower.

	paul


Subject: Re: CRC32
Date: 5 Jun 1999 06:19:05 GMT
From: djb@koobera.math.uic.edu (D. J. Bernstein)
Message-ID: <1999Jun506.19.05.17619@koobera.math.uic.edu>
References: <19990603075422.4548.qmail@nym.alias.net>
Newsgroups: sci.crypt
Lines: 11

> are there any ways to speed crc32 up a bit? or are there any
> faster (yet as "secure") algorithms as crc32?

http://pobox.com/~djb/hash127.html is almost as fast as an optimized
CRC-32, but produces a 127-bit hash, using a 128-bit key.

As explained in http://pobox.com/~djb/papers/hash127.dvi, if you feed
any set of N inputs to hash127, each at most 4L bytes long, then there
are at most N(N-1)(L+2) possible keys that will produce a collision.

---Dan


Subject: CRC-16 Reverse Algorithm ?
Date: Fri, 25 Feb 2000 14:18:56 -0500
From: "Liu, Qingzhu [CAR:CF36:EXCH]" <qliu@americasm01.nt.com>
Message-ID: <38B6D5A0.BCC88956@americasm01.nt.com>
References: <19990603075422.4548.qmail@nym.alias.net>
Newsgroups: sci.crypt
Lines: 19

Hello, 

I try to implement an algorithm for crc-16 reverse 
(Poly: 1 + x + x^14 + x^16) in PIC17c756 assembly code.

My questions are:
1. What does the "Reverse' mean ? 
2. How is the dividing circuit looking like?
3. Is there a Byte-wise CRC calculation algorithm for it?  
4. What is good text book about this topic?

Thansk for helps in advance.

-- 
Qinghu Liu
BWA, Nortel Networks
Phone: (613)763-3705
Fax:   (613)763-6681
email: qliu@nortelnetworks.com


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Fri, 25 Feb 2000 15:56:10 GMT
From: jsavard@domain.ctry (John Savard)
Message-ID: <38b6a5e2.25477024@news.prosurfr.com>
References: <38B6D5A0.BCC88956@americasm01.nt.com>
Newsgroups: sci.crypt
Lines: 14

"Liu, Qingzhu [CAR:CF36:EXCH]" <qliu@americasm01.nt.com> wrote, in
part:

>I try to implement an algorithm for crc-16 reverse 
>(Poly: 1 + x + x^14 + x^16) in PIC17c756 assembly code.

>My questions are:
>1. What does the "Reverse' mean ?

It isn't 1 + x^2 + x^15 + x^16, the polynomial turned around
'backwards'. 

John Savard (jsavard<at>ecn<dot>ab<dot>ca)
http://www.ecn.ab.ca/~jsavard/crypto.htm


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Fri, 25 Feb 2000 21:06:28 GMT
From: p27604@email.mot.com (Doug Stell)
Message-ID: <38b6ebcb.587885785@schbbs.mot.com>
References: <38B6D5A0.BCC88956@americasm01.nt.com>
Newsgroups: sci.crypt
Lines: 137

On Fri, 25 Feb 2000 14:18:56 -0500, "Liu, Qingzhu [CAR:CF36:EXCH]"
<qliu@americasm01.nt.com> wrote:

>Hello, 
>
>I try to implement an algorithm for crc-16 reverse 
>(Poly: 1 + x + x^14 + x^16) in PIC17c756 assembly code.
>
>My questions are:
>1. What does the "Reverse' mean ?

1. You can reverse/reflect the polynomial. The reverse of a good
polynomial is also a good polynomial.

2. You can reverse the direction of the register.

3. You can reverse the final CRC.
 
>2. How is the dividing circuit looking like?

A few XORs

>3. Is there a Byte-wise CRC calculation algorithm for it?

Sure. You can tradeoff memory for speed by processing n bits at a time
using a 2^n table. A byte-wide implementation is attached below, FYI
  
>4. What is good text book about this topic?

There really are no good texts. However, there is a great paper
available from
ftp://www.rocksoft.com/clients/rocksoft/papers/crc_v3.txt
http://ww.repairfaq.org/filipg/LINK/F_crc_v3.html

Attached is the table and code fragment for a byte-wide CRC-16. This
is working code which has been successfully used in products by
multiple companies.

/*
---------------------------------------------------------------------
   The 16-bit CRC_16 is built upon the following shift-register
   reference model. Polynomial: g(x) = 1 + x^2 + x^15 + x^16

   -------------------------------------------------------------
   |(1)          |(x^2)                         (x^15)|        |(x^16)
   -->[15] [14]->X->[13] [12] [11] - - - [3] [2] [1]->X->[0]-->X
                                                               |
                                  Input data bit 0 first -------


   Leading-zero checking is NOT provided by the CRC_16 and it is used
   as follows:
     1. The crc register is initialized to ZERO.
     2. When a crc is appended, it is not inverted.
     3. When checking a good message with an appended cec, the
        register will return to its initial value of ZERO.


---------------------------------------------------------------------
*/

/* polynomial for bit-wize CRC-16 calculation, using reference model
*/

/* uint16 CRC_16_POLY = 0xa001 ; */

/* table for nibble-wize CRC-16 calculation; smaller table, slower */

/* uint16 CRC_16_TABLE [16] = {
       0x0000, 0xcc01, 0xd801, 0x1400, 0xf001, 0x3c00, 0x2800, 0xe401,
       0xa001, 0x6c00, 0x7800, 0xb401, 0x5000, 0x9c01, 0x8801, 0x4400,
       } ;
 */

/* table for byte-wize CRC-16 calculation; faster, larger table */

uint16 CRC_16_TABLE [256] = {
       0x0000, 0xc0c1, 0xc181, 0x0140, 0xc301, 0x03c0, 0x0280, 0xc241,
       0xc601, 0x06c0, 0x0780, 0xc741, 0x0500, 0xc5c1, 0xc481, 0x0440,
       0xcc01, 0x0cc0, 0x0d80, 0xcd41, 0x0f00, 0xcfc1, 0xce81, 0x0e40,
       0x0a00, 0xcac1, 0xcb81, 0x0b40, 0xc901, 0x09c0, 0x0880, 0xc841,
       0xd801, 0x18c0, 0x1980, 0xd941, 0x1b00, 0xdbc1, 0xda81, 0x1a40,
       0x1e00, 0xdec1, 0xdf81, 0x1f40, 0xdd01, 0x1dc0, 0x1c80, 0xdc41,
       0x1400, 0xd4c1, 0xd581, 0x1540, 0xd701, 0x17c0, 0x1680, 0xd641,
       0xd201, 0x12c0, 0x1380, 0xd341, 0x1100, 0xd1c1, 0xd081, 0x1040,
       0xf001, 0x30c0, 0x3180, 0xf141, 0x3300, 0xf3c1, 0xf281, 0x3240,
       0x3600, 0xf6c1, 0xf781, 0x3740, 0xf501, 0x35c0, 0x3480, 0xf441,
       0x3c00, 0xfcc1, 0xfd81, 0x3d40, 0xff01, 0x3fc0, 0x3e80, 0xfe41,
       0xfa01, 0x3ac0, 0x3b80, 0xfb41, 0x3900, 0xf9c1, 0xf881, 0x3840,
       0x2800, 0xe8c1, 0xe981, 0x2940, 0xeb01, 0x2bc0, 0x2a80, 0xea41,
       0xee01, 0x2ec0, 0x2f80, 0xef41, 0x2d00, 0xedc1, 0xec81, 0x2c40,
       0xe401, 0x24c0, 0x2580, 0xe541, 0x2700, 0xe7c1, 0xe681, 0x2640,
       0x2200, 0xe2c1, 0xe381, 0x2340, 0xe101, 0x21c0, 0x2080, 0xe041,
       0xa001, 0x60c0, 0x6180, 0xa141, 0x6300, 0xa3c1, 0xa281, 0x6240,
       0x6600, 0xa6c1, 0xa781, 0x6740, 0xa501, 0x65c0, 0x6480, 0xa441,
       0x6c00, 0xacc1, 0xad81, 0x6d40, 0xaf01, 0x6fc0, 0x6e80, 0xae41,
       0xaa01, 0x6ac0, 0x6b80, 0xab41, 0x6900, 0xa9c1, 0xa881, 0x6840,
       0x7800, 0xb8c1, 0xb981, 0x7940, 0xbb01, 0x7bc0, 0x7a80, 0xba41,
       0xbe01, 0x7ec0, 0x7f80, 0xbf41, 0x7d00, 0xbdc1, 0xbc81, 0x7c40,
       0xb401, 0x74c0, 0x7580, 0xb541, 0x7700, 0xb7c1, 0xb681, 0x7640,
       0x7200, 0xb2c1, 0xb381, 0x7340, 0xb101, 0x71c0, 0x7080, 0xb041,
       0x5000, 0x90c1, 0x9181, 0x5140, 0x9301, 0x53c0, 0x5280, 0x9241,
       0x9601, 0x56c0, 0x5780, 0x9741, 0x5500, 0x95c1, 0x9481, 0x5440,
       0x9c01, 0x5cc0, 0x5d80, 0x9d41, 0x5f00, 0x9fc1, 0x9e81, 0x5e40,
       0x5a00, 0x9ac1, 0x9b81, 0x5b40, 0x9901, 0x59c0, 0x5880, 0x9841,
       0x8801, 0x48c0, 0x4980, 0x8941, 0x4b00, 0x8bc1, 0x8a81, 0x4a40,
       0x4e00, 0x8ec1, 0x8f81, 0x4f40, 0x8d01, 0x4dc0, 0x4c80, 0x8c41,
       0x4400, 0x84c1, 0x8581, 0x4540, 0x8701, 0x47c0, 0x4680, 0x8641,
       0x8201, 0x42c0, 0x4380, 0x8341, 0x4100, 0x81c1, 0x8081, 0x4040,
       } ;

    if ( CRC_16_CRC == algorithm )
    {
        /* do NOT invert crcReg, but initialize to ZERO */
        temp = *crcReg ;

        /* generate the CRC over the entire input */
        i = partInLen ;
        while ( i-- )
        {
            index = ( temp ^ (uint32)*partIn++ ) & 0xff ;
            temp = ( ( temp >> 8 ) ^ CRC_16_TABLE [ index ] ) & 0xffff
;
        }

        /* do NOT invert output to crcReg for appending */
        *crcReg = temp ;

        /* check result for return code */
        if ( temp )
            return ( 1 ) ;      /* non-ZERO residue is BAD  */
        return ( 0 ) ;          /*     ZERO residue is GOOD */
    }


doug



Subject: Re: CRC-16 Reverse Algorithm ?
Date: Fri, 25 Feb 2000 21:33:19 GMT
From: Anne & Lynn Wheeler <lynn@adcomsys.net>
Message-ID: <uema1ge7s.fsf@mail.adcomsys.net>
References: <38B6D5A0.BCC88956@americasm01.nt.com>
Newsgroups: sci.crypt
Lines: 11


one of the interesting things in CRC16 chips in the '70s where
computer boxes using it for bit error detection and RF modem boxes
using the same chip for eliminating runs of zeros. There was strange
situation in some <college campus> RF/broadband systems which were
getting undected transmission errors because of unfortunate
interaction of using the same CRC16 recursively.

-- 
Anne & Lynn Wheeler   | lynn@adcomsys.net, lynn@garlic.com
 http://www.garlic.com/~lynn/ http://www.adcomsys.net/lynn/


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Fri, 25 Feb 2000 23:34:15 GMT
From: p27604@email.mot.com (Doug Stell)
Message-ID: <38b710e3.597381318@schbbs.mot.com>
References: <uema1ge7s.fsf@mail.adcomsys.net>
Newsgroups: sci.crypt
Lines: 18

On Fri, 25 Feb 2000 21:33:19 GMT, Anne & Lynn Wheeler
<lynn@adcomsys.net> wrote:
>
>one of the interesting things in CRC16 chips in the '70s where
>computer boxes using it for bit error detection and RF modem boxes
>using the same chip for eliminating runs of zeros. There was strange
>situation in some <college campus> RF/broadband systems which were
>getting undected transmission errors because of unfortunate
>interaction of using the same CRC16 recursively.

CRCs that are initialized to all zeros (CRC-16 is one of these) can
not detect the addition or deletion of leading zeros. Likewise, CRCs
that are initialized to all ones can not not detect the addition or
deletion of leading ones. That's why some CRCs initialize to some
known, fixed value that is neither all ones nor all zeros.

doug



Subject: Re: CRC-16 Reverse Algorithm ?
Date: Sat, 26 Feb 2000 20:41:34 -0800
From: "Marty" <anon@nowhere.edu>
Message-ID: <#txNt0Ng$GA.262@cpmsnbbsa03>
References: <38b710e3.597381318@schbbs.mot.com>
Newsgroups: sci.crypt
Lines: 32


Doug Stell <p27604@email.mot.com> wrote in message
news:38b710e3.597381318@schbbs.mot.com...
> On Fri, 25 Feb 2000 21:33:19 GMT, Anne & Lynn Wheeler
> <lynn@adcomsys.net> wrote:
> >
> >one of the interesting things in CRC16 chips in the '70s where
> >computer boxes using it for bit error detection and RF modem boxes
> >using the same chip for eliminating runs of zeros. There was strange
> >situation in some <college campus> RF/broadband systems which were
> >getting undected transmission errors because of unfortunate
> >interaction of using the same CRC16 recursively.
>
> CRCs that are initialized to all zeros (CRC-16 is one of these) can
> not detect the addition or deletion of leading zeros. Likewise, CRCs
> that are initialized to all ones can not not detect the addition or
> deletion of leading ones. That's why some CRCs initialize to some
> known, fixed value that is neither all ones nor all zeros.
>
> doug
>


Inserted and/or deleted ones at the start of a ones initialized CRC are
indeed detected.
It's also a good idea to negate the CRC at the end for detecting appended
zeros.

-Marty





Subject: Re: CRC-16 Reverse Algorithm ?
Date: 26 Feb 2000 20:28:25 -0800
From: daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner)
Message-ID: <89a959$to6$1@blowfish.isaac.cs.berkeley.edu>
References: <#txNt0Ng$GA.262@cpmsnbbsa03>
Newsgroups: sci.crypt
Lines: 33

In article <#txNt0Ng$GA.262@cpmsnbbsa03>, Marty <anonx@nowhere.edu> wrote:
> Inserted and/or deleted ones at the start of a ones initialized CRC are
> indeed detected.

Did you mean "not detected"?  I thought Doug Stell got it right.

Surely I'm missing something.  We know that, when the CRC uses an
primitive polynomial, the all-ones state 111..1 goes to 11..10
after clocking it once [*].  In other words, if initialize the CRC
to all-ones and feed in a zero bit, we get the new state 11..10.
But now we can imagine initializing it to the all-ones state and
feeding in a one bit; this complements the feedback tap, and so
where a zero was previously fed in, now a one will be fed in, and
thus the new state will be 11..11, always.

Consequently, if the CRC is initialized to the all-ones state, we
can insert as many one bits at the start of the message as we like,
and it won't affect the final result.

(It may be easier to see this by symmetry: a corresponding property
holds for the all-zeros state and prepending zero bits; and everything
is linear, so you can just complement everything in sight and by
symmetry the property will still hold.)

Where did I go wrong?  What am I missing?  I'm confused.


[*] Proof: Obvious.  Treat it as a free-running LFSR.  There are only
    two states it can go to, i.e., 11..11 and 11..10.  If it goes to
    the former, then we have a cycle of length, so the LFSR isn't
    full-period, and thus the polynomial can't be primitive, contradiction.
    When you eliminate the impossible, whatever remains must be true,
    and so surely 111..1 -> 11..10.


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Sat, 26 Feb 2000 22:23:34 -0800
From: "Marty" <anon@nowhere.edu>
Message-ID: <eu7wttOg$GA.277@cpmsnbbsa03>
References: <89a959$to6$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 71


David A. Wagner <daw@blowfish.isaac.cs.berkeley.edu> wrote in message
news:89a959$to6$1@blowfish.isaac.cs.berkeley.edu...
> In article <#txNt0Ng$GA.262@cpmsnbbsa03>, Marty <anonx@nowhere.edu> wrote:
> > Inserted and/or deleted ones at the start of a ones initialized CRC are
> > indeed detected.
>
> Did you mean "not detected"?  I thought Doug Stell got it right.

No, I believe Doug's incorrect.

>
> Surely I'm missing something.  We know that, when the CRC uses an
> primitive polynomial, the all-ones state 111..1 goes to 11..10
> after clocking it once [*].  In other words, if initialize the CRC

First, CRC's do not ussually use a primitive polynomial. They serve
a somewhat different purpose, error detection rather than max
cyclical length.

Second, when the state of a CRC is all ones it transforms to a non
zero, non all ones state on the next clock. This state is independent (
except for the lsb) of the incomming data bit.  The data bit
is not xored in before the decision to toggle the other crc state bits.

> to all-ones and feed in a zero bit, we get the new state 11..10.
> But now we can imagine initializing it to the all-ones state and
> feeding in a one bit; this complements the feedback tap, and so
> where a zero was previously fed in, now a one will be fed in, and
> thus the new state will be 11..11, always.
>
> Consequently, if the CRC is initialized to the all-ones state, we
> can insert as many one bits at the start of the message as we like,
> and it won't affect the final result.
>

Another way to see this is to note that the high speed lookup table
entry for 0xff is not 0xffffff.....00 which would be the case if the state
register remained all ones  after recieving a 0xff byte.

> (It may be easier to see this by symmetry: a corresponding property
> holds for the all-zeros state and prepending zero bits; and everything
> is linear, so you can just complement everything in sight and by
> symmetry the property will still hold.)
>

Symmetry is not applicable here. 0's act quite differntly from 1's.


> Where did I go wrong?  What am I missing?  I'm confused.
>

I hope this explanation makes sense.  Try out a CRC-32 which starts
with all ones. You will see it changes state when fed ones.



>
> [*] Proof: Obvious.  Treat it as a free-running LFSR.  There are only
>     two states it can go to, i.e., 11..11 and 11..10.  If it goes to
>     the former, then we have a cycle of length, so the LFSR isn't
>     full-period, and thus the polynomial can't be primitive,
contradiction.
>     When you eliminate the impossible, whatever remains must be true,
>     and so surely 111..1 -> 11..10.



-Marty




Subject: Re: CRC-16 Reverse Algorithm ?
Date: 27 Feb 2000 12:49:58 -0800
From: daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner)
Message-ID: <89c2lm$un2$1@blowfish.isaac.cs.berkeley.edu>
References: <eu7wttOg$GA.277@cpmsnbbsa03>
Newsgroups: sci.crypt
Lines: 26

In article <eu7wttOg$GA.277@cpmsnbbsa03>, Marty <anonx@nowhere.edu> wrote:
> Second, when the state of a CRC is all ones it transforms to a non
> zero, non all ones state on the next clock.

Yes, 111..11 -> 111..10 (if the incoming data bit is zero;
assuming Fibonacci configuration, not Galois).  Right?

> This state is independent (
> except for the lsb) of the incomming data bit.

Right.  Now, if 111..11 -> 111..10 under input bit zero, then
111..11 -> 111..10 xor 1 = 111..11 under input bit one.  No?

> Symmetry is not applicable here. 0's act quite differntly from 1's.

But if S is an initial state and M is a message, then
  CRC(complement(S),complement(M)) = complement(CRC(S,M)),
unless you do something specifically to defeat this property
(e.g., appending/prepending fixed bits to M).

Therefore, if you have a problem with leading zeros in M when S = 000...0,
you will also have a problem with leading ones in M when S = 111...1.
Or so it would appear to me.

Where have I gone wrong?  Perhaps there is some difference between
CRCs and LFSRs that I'm missing?


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Sun, 27 Feb 2000 23:05:34 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <38b9ad89.4751767@news.io.com>
References: <89c2lm$un2$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 53


On 27 Feb 2000 12:49:58 -0800, in
<89c2lm$un2$1@blowfish.isaac.cs.berkeley.edu>, in sci.crypt
daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:

>In article <eu7wttOg$GA.277@cpmsnbbsa03>, Marty <anonx@nowhere.edu> wrote:
>> Second, when the state of a CRC is all ones it transforms to a non
>> zero, non all ones state on the next clock.
>
>Yes, 111..11 -> 111..10 (if the incoming data bit is zero;
>assuming Fibonacci configuration, not Galois).  Right?

Right.


>> This state is independent (
>> except for the lsb) of the incomming data bit.
>
>Right.  Now, if 111..11 -> 111..10 under input bit zero, then
>111..11 -> 111..10 xor 1 = 111..11 under input bit one.  No?

No.  

In a shift-left CRC, 0's always shift in, and this is independent of
the data value.  the only way the rightmost bit gets set to a 1 is
from the poly add.  


>> Symmetry is not applicable here. 0's act quite differntly from 1's.
>
>But if S is an initial state and M is a message, then
>  CRC(complement(S),complement(M)) = complement(CRC(S,M)),
>unless you do something specifically to defeat this property
>(e.g., appending/prepending fixed bits to M).

No.


>Therefore, if you have a problem with leading zeros in M when S = 000...0,
>you will also have a problem with leading ones in M when S = 111...1.
>Or so it would appear to me.
>
>Where have I gone wrong?  Perhaps there is some difference between
>CRCs and LFSRs that I'm missing?

There is no real difference.  Certainly a CRC has a data input that
LFSR's do not have.  I suppose what you are missing is a understanding
of the CRC algorithm.  

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: CRC-16 Reverse Algorithm ?
Date: 27 Feb 2000 15:25:11 -0800
From: daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner)
Message-ID: <89cbon$v9s$1@blowfish.isaac.cs.berkeley.edu>
References: <38b9ad89.4751767@news.io.com>
Newsgroups: sci.crypt
Lines: 32

In article <38b9ad89.4751767@news.io.com>, Terry Ritter <ritter@io.com> wrote:
> daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:
> >Yes, 111..11 -> 111..10 (if the incoming data bit is zero;
> >assuming Fibonacci configuration, not Galois).  Right?
> 
> Right.
> 
> >Right.  Now, if 111..11 -> 111..10 under input bit zero, then
> >111..11 -> 111..10 xor 1 = 111..11 under input bit one.  No?
> 
> No.  
> 
> In a shift-left CRC, 0's always shift in, and this is independent of
> the data value.  the only way the rightmost bit gets set to a 1 is
> from the poly add.  

Ok.  That looks like the source of the confusion, then.  Thanks.

See my earlier quoted comments -- I was assuming we were talking
about a Fibonacci configuration (what CRC folks seem to call "forward"
CRC), rather than a Galois configuration (what CRC folks seem to be
calling a "reverse" CRC, if I understand correctly).  This seems to
be where we diverged.

As far as I can see, it remains true that the all-ones state is bad
for Fibonacci ("forward"?) if you xor the incoming data bit along with
the feedback taps to get your new state-bit.  But the poster was
apparently talking about Galois / "reverse" configuration.

I think.

Right?


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Mon, 28 Feb 2000 04:33:22 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <38b9fa75.5391942@news.io.com>
References: <89cbon$v9s$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 62


On 27 Feb 2000 15:25:11 -0800, in
<89cbon$v9s$1@blowfish.isaac.cs.berkeley.edu>, in sci.crypt
daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:

>In article <38b9ad89.4751767@news.io.com>, Terry Ritter <ritter@io.com> wrote:
>> daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:
>> >Yes, 111..11 -> 111..10 (if the incoming data bit is zero;
>> >assuming Fibonacci configuration, not Galois).  Right?
>> 
>> Right.
>> 
>> >Right.  Now, if 111..11 -> 111..10 under input bit zero, then
>> >111..11 -> 111..10 xor 1 = 111..11 under input bit one.  No?
>> 
>> No.  
>> 
>> In a shift-left CRC, 0's always shift in, and this is independent of
>> the data value.  the only way the rightmost bit gets set to a 1 is
>> from the poly add.  
>
>Ok.  That looks like the source of the confusion, then.  Thanks.
>
>See my earlier quoted comments -- I was assuming we were talking
>about a Fibonacci configuration (what CRC folks seem to call "forward"
>CRC), rather than a Galois configuration (what CRC folks seem to be
>calling a "reverse" CRC, if I understand correctly).  This seems to
>be where we diverged.

Yeah.  We "diverged" when some of us were talking about CRC, and
others of us were not.


>As far as I can see, it remains true that the all-ones state is bad
>for Fibonacci ("forward"?) if you xor the incoming data bit along with
>the feedback taps to get your new state-bit.  

Obviously, that is not CRC.


>But the poster was
>apparently talking about Galois / "reverse" configuration.

I know quite a few different various configurations for CRC.  But the
CRC result is unambiguous.  

CRC does use LFSR operations.  But not all forms of LFSR are CRC.  I
don't know whether we can do a CRC in the other LFSR form.  Maybe we
can, but we had better get the same result from it.  Which means that
it had better detect both leading 1's and 0's.  


>I think.
>
>Right?

No.

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Sun, 27 Feb 2000 20:57:23 -0800
From: "Marty" <anon@nowhere.edu>
Message-ID: <#IF5Piag$GA.57@cpmsnbbsa04>
References: <38b9fa75.5391942@news.io.com>
Newsgroups: sci.crypt
Lines: 74

Terry, You gave a better explanation than I had. David is persistent, maybe
at the
end of it all he will be more enlightened.

-Marty


Terry Ritter <ritter@io.com> wrote in message
news:38b9fa75.5391942@news.io.com...
>
> On 27 Feb 2000 15:25:11 -0800, in
> <89cbon$v9s$1@blowfish.isaac.cs.berkeley.edu>, in sci.crypt
> daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:
>
> >In article <38b9ad89.4751767@news.io.com>, Terry Ritter <ritter@io.com>
wrote:
> >> daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:
> >> >Yes, 111..11 -> 111..10 (if the incoming data bit is zero;
> >> >assuming Fibonacci configuration, not Galois).  Right?
> >>
> >> Right.
> >>
> >> >Right.  Now, if 111..11 -> 111..10 under input bit zero, then
> >> >111..11 -> 111..10 xor 1 = 111..11 under input bit one.  No?
> >>
> >> No.
> >>
> >> In a shift-left CRC, 0's always shift in, and this is independent of
> >> the data value.  the only way the rightmost bit gets set to a 1 is
> >> from the poly add.
> >
> >Ok.  That looks like the source of the confusion, then.  Thanks.
> >
> >See my earlier quoted comments -- I was assuming we were talking
> >about a Fibonacci configuration (what CRC folks seem to call "forward"
> >CRC), rather than a Galois configuration (what CRC folks seem to be
> >calling a "reverse" CRC, if I understand correctly).  This seems to
> >be where we diverged.
>
> Yeah.  We "diverged" when some of us were talking about CRC, and
> others of us were not.
>
>
> >As far as I can see, it remains true that the all-ones state is bad
> >for Fibonacci ("forward"?) if you xor the incoming data bit along with
> >the feedback taps to get your new state-bit.
>
> Obviously, that is not CRC.
>
>
> >But the poster was
> >apparently talking about Galois / "reverse" configuration.
>
> I know quite a few different various configurations for CRC.  But the
> CRC result is unambiguous.
>
> CRC does use LFSR operations.  But not all forms of LFSR are CRC.  I
> don't know whether we can do a CRC in the other LFSR form.  Maybe we
> can, but we had better get the same result from it.  Which means that
> it had better detect both leading 1's and 0's.
>
>
> >I think.
> >
> >Right?
>
> No.
>
> ---
> Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
> Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM
>




Subject: Re: CRC-16 Reverse Algorithm ?
Date: Mon, 28 Feb 2000 07:42:39 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <38ba2674.423049@news.io.com>
References: <#IF5Piag$GA.57@cpmsnbbsa04>
Newsgroups: sci.crypt
Lines: 14


On Sun, 27 Feb 2000 20:57:23 -0800, in <#IF5Piag$GA.57@cpmsnbbsa04>,
in sci.crypt "Marty" <anon@nowhere.edu> wrote:

>Terry, You gave a better explanation than I had. David is persistent, maybe
>at the
>end of it all he will be more enlightened.

I am not overly hopeful.

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Sun, 27 Feb 2000 22:56:30 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <38b9ab7e.4228600@news.io.com>
References: <eu7wttOg$GA.277@cpmsnbbsa03>
Newsgroups: sci.crypt
Lines: 126


What a mess!

On Sat, 26 Feb 2000 22:23:34 -0800, in <eu7wttOg$GA.277@cpmsnbbsa03>,
in sci.crypt "Marty" <anon@nowhere.edu> wrote:

>David A. Wagner <daw@blowfish.isaac.cs.berkeley.edu> wrote in message
>news:89a959$to6$1@blowfish.isaac.cs.berkeley.edu...
>> In article <#txNt0Ng$GA.262@cpmsnbbsa03>, Marty <anonx@nowhere.edu> wrote:
>> > Inserted and/or deleted ones at the start of a ones initialized CRC are
>> > indeed detected.
>>
>> Did you mean "not detected"?  I thought Doug Stell got it right.
>
>No, I believe Doug's incorrect.

Doug is wrong.  CRC *does* indeed detect every 1-bit at the start of a
message.  Missing 1's *ARE* detected.  Extra 1's *ARE* detected.
Missing and extra 0's are also detected.  


>> Surely I'm missing something.  We know that, when the CRC uses an
>> primitive polynomial, the all-ones state 111..1 goes to 11..10
>> after clocking it once [*].  In other words, if initialize the CRC
>
>First, CRC's do not ussually use a primitive polynomial. 

Generally false.  Modern CRC's normally are primitive.  CRC32 is
primitive.  CRC-16 and CRC-CCITT are composite with a factor of 11,
which is the same as parity error detection.  Systems which previously
used parity were thus in no way disadvantaged by using CRC.  


>They serve
>a somewhat different purpose, error detection rather than max
>cyclical length.
>
>Second, when the state of a CRC is all ones it transforms to a non
>zero, non all ones state on the next clock. This state is independent (
>except for the lsb) of the incomming data bit.  The data bit
>is not xored in before the decision to toggle the other crc state bits.

True, but confusing.  Assuming a shift-left CRC, what happens is that
new zero bits are shifted in from the right, and if the data bits are
the same as the shifted out bit (they will be with all-1's CRC and
data), then the poly is not added in.  So if we have all-1's, we just
accumulate 0's until the CRC becomes zero, after which data 1's imply
that the poly will be added in.  

for shift left:

if (msb(crc) = databit)
   crc = crc << 1;
else
   crc = (crc << 1) ^ poly;



>> to all-ones and feed in a zero bit, we get the new state 11..10.

Correct, assuming a left-shift computation.

(I think this is from Doug.)
>> But now we can imagine initializing it to the all-ones state and
>> feeding in a one bit; this complements the feedback tap, and so
>> where a zero was previously fed in, now a one will be fed in, and
>> thus the new state will be 11..11, always.

False.  Assuming a shift-left CRC starting at "all 1's," we get a new
zero with every shift and every data bit.  Since msb(crc) = databit,
what we don't get is the poly addition, and that is the way the
rightmost bit gets set, so it stays zero.  


>> Consequently, if the CRC is initialized to the all-ones state, we
>> can insert as many one bits at the start of the message as we like,
>> and it won't affect the final result.

FALSE.  Try it.  


>Another way to see this is to note that the high speed lookup table
>entry for 0xff is not 0xffffff.....00 which would be the case if the state
>register remained all ones  after recieving a 0xff byte.

Correct.  CRC32 with crc = 0xffffffff and data = 0xff produces
crc = 0xffffff00


>> (It may be easier to see this by symmetry: a corresponding property
>> holds for the all-zeros state and prepending zero bits; and everything
>> is linear, so you can just complement everything in sight and by
>> symmetry the property will still hold.)
>>
>
>Symmetry is not applicable here. 0's act quite differntly from 1's.

Well, if we have "all-0's" and data of 0's we will accumulate more
zeros, which will not detect leading zeros.  The whole point of
setting the CRC register to all-1's prior to CRC operations is to
detect missing bits.  I suppose the symmetry breaks because the
computation is crc = crc*2 instead of crc = crc*2 + 1.


>> Where did I go wrong?  What am I missing?  I'm confused.

>I hope this explanation makes sense.  Try out a CRC-32 which starts
>with all ones. You will see it changes state when fed ones.

Exactly the right answer:  Try it.  


>> [*] Proof: Obvious.  Treat it as a free-running LFSR.  There are only
>>     two states it can go to, i.e., 11..11 and 11..10.  If it goes to
>>     the former, then we have a cycle of length, so the LFSR isn't
>>     full-period, and thus the polynomial can't be primitive,
>contradiction.
>>     When you eliminate the impossible, whatever remains must be true,
>>     and so surely 111..1 -> 11..10.

Alas, CRC32 is primitive.  Now, about that proof....

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: CRC-16 Reverse Algorithm ?
Date: 27 Feb 2000 15:08:08 -0800
From: daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner)
Message-ID: <89caoo$v8j$1@blowfish.isaac.cs.berkeley.edu>
References: <38b9ab7e.4228600@news.io.com>
Newsgroups: sci.crypt
Lines: 29

In article <38b9ab7e.4228600@news.io.com>, Terry Ritter <ritter@io.com> wrote:
> Missing and extra 0's are also detected.  

How can this be?  Take a vanilla CRC initialized to the all-zeros state.
Clock in a single zero bit.  The result will still be the all-zeros
state.  Clock in as many zero bits as you like, the state won't change.

Consequently, the CRC can't tell the difference between the message
0^j || M (i.e., j zeros followed by M) and the message 0^k || M.

Sure, in a real implementation you should add extra precautions to eliminate
this property.  For instance, simply avoid starting in the all-zeros state.
Or start in the all-zeros state but always prepend a one-bit to the message
(a roughly equivalent countermeasure).  But if you do nothing, it seems you
get into trouble.  And I'm talking about the vanilla "do-nothing" case.

Consider the pseudocode you suggested:
  if (msb(crc) = databit)
     crc = crc << 1;
  else
     crc = (crc << 1) ^ poly;
If you run this code fragment with crc=0 and databit=0, you will execute the
statement `crc = crc << 1;', and since 0 << 1 = 0, we will still have crc=0
after the code fragment.  This supports what I said above.

There, I tried it.  Now, where did I go wrong?

Ignore the all-ones stuff for the moment;
Are you *sure* I'm wrong about this all-zeros property?

Bytes: 827

Subject: Re: CRC-16 Reverse Algorithm ?
Date: Sun, 27 Feb 2000 15:58:59 -0800
From: lordcow77 <london222NOloSPAM@netzero.com.invalid>
Message-ID: <0d262a48.7c958eec@usw-ex0101-007.remarq.com>
References: <89caoo$v8j$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 20

In article <89caoo$v8j$1@blowfish.isaac.cs.berkeley.edu>,
daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:
>Ignore the all-ones stuff for the moment;
>Are you *sure* I'm wrong about this all-zeros property?
>
>

You're right if the CRC register starts with all zeroes, but IF
the register is initially initialized with all ones, the CRC
algorithm will detect a prepended string ones OR zeroes. Just
run the algorithm through a couple of times (I find the byte-
wise table method more clear for intiutive expression of these
concepts).
REG=0xffffffff
After we push a 0x00 byte into the register, we get
REG=0xffffff00 and so on until REG=0x00000000.


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!



Subject: Re: CRC-16 Reverse Algorithm ?
Date: Mon, 28 Feb 2000 04:33:42 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <38b9fa95.5423831@news.io.com>
References: <89caoo$v8j$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 83


On 27 Feb 2000 15:08:08 -0800, in
<89caoo$v8j$1@blowfish.isaac.cs.berkeley.edu>, in sci.crypt
daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:

>In article <38b9ab7e.4228600@news.io.com>, Terry Ritter <ritter@io.com> wrote:
>> Missing and extra 0's are also detected.  
>
>How can this be?  Take a vanilla CRC initialized to the all-zeros state.
>Clock in a single zero bit.  The result will still be the all-zeros
>state.  Clock in as many zero bits as you like, the state won't change.

If the CRC register is initialized with zeros, it cannot detect
leading zeros in the data.  That problem was detected early in CRC
use, and from that time it became common to init the CRC register as
ones.  When this is done, CRC can detect both leading zeros and
leading ones.


Here is what you said, which defines the current issue:

>David A. Wagner <daw@blowfish.isaac.cs.berkeley.edu> wrote in message
>news:89a959$to6$1@blowfish.isaac.cs.berkeley.edu...
>> In article <#txNt0Ng$GA.262@cpmsnbbsa03>, Marty <anonx@nowhere.edu> wrote:
>> > Inserted and/or deleted ones at the start of a ones initialized CRC are
>> > indeed detected.
>>
>> Did you mean "not detected"?  I thought Doug Stell got it right.


And here is the statement which you thought was right:

On Fri, 25 Feb 2000 23:34:15 GMT, in
<38b710e3.597381318@schbbs.mot.com>, in sci.crypt p27604@email.mot.com
(Doug Stell) wrote:

>[...]
>CRCs that are initialized to all zeros (CRC-16 is one of these) can
>not detect the addition or deletion of leading zeros.  Likewise, CRCs
>that are initialized to all ones can not not detect the addition or
>deletion of leading ones. 

The last statement was false.  


>Consequently, the CRC can't tell the difference between the message
>0^j || M (i.e., j zeros followed by M) and the message 0^k || M.
>
>Sure, in a real implementation you should add extra precautions to eliminate
>this property.  For instance, simply avoid starting in the all-zeros state.
>Or start in the all-zeros state but always prepend a one-bit to the message
>(a roughly equivalent countermeasure).  But if you do nothing, it seems you
>get into trouble.  And I'm talking about the vanilla "do-nothing" case.

The CRC register has to be initialized to *some* known value.  The
vanilla case is to initialize the CRC to all-1's.


>Consider the pseudocode you suggested:
>  if (msb(crc) = databit)
>     crc = crc << 1;
>  else
>     crc = (crc << 1) ^ poly;
>If you run this code fragment with crc=0 and databit=0, you will execute the
>statement `crc = crc << 1;', and since 0 << 1 = 0, we will still have crc=0
>after the code fragment.  This supports what I said above.
>
>There, I tried it.  Now, where did I go wrong?

By not actually trying some working CRC code after thinking about it.



>Ignore the all-ones stuff for the moment;
>Are you *sure* I'm wrong about this all-zeros property?

What you said originally was wrong.  I'm not sure what you have
changed to now.  

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: CRC-16 Reverse Algorithm ?
Date: 27 Feb 2000 15:20:51 -0800
From: daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner)
Message-ID: <89cbgj$v98$1@blowfish.isaac.cs.berkeley.edu>
References: <38b9ab7e.4228600@news.io.com>
Newsgroups: sci.crypt
Lines: 35

In article <38b9ab7e.4228600@news.io.com>, Terry Ritter <ritter@io.com> wrote:
> if (msb(crc) = databit)
>    crc = crc << 1;
> else
>    crc = (crc << 1) ^ poly;

Ok, let's try this.
I will show a initial state which allows you to prepend as many
one bits as you like without changing the final CRC hash.

(But I could well have made mistakes in my calculations, so
check my work...  Does it look right to you?)

Suppose, before we execute this code fragment, we have
  crc == poly ^ (poly<<1) ^ (poly<<2) ^ (poly<<3) ^ ...
and msb(crc) != databit.  The 'else' branch will be executed,
and we will compute 
  crc' := ((crc                                    )<<1) ^ poly
        = ((poly      ^ (poly<<1) ^ (poly<<2) ^ ...)<<1) ^ poly
        = (((poly<<1) ^ (poly<<2) ^ (poly<<3) ^ ...)     ^ poly
	= poly ^ (poly<<1) ^ (poly<<2) ^ (poly<<3)^ ...
so the state of the CRC before and after the code fragment will
remain unchanged.

The only thing to check now is that the requirement
msb(crc) != databit can be satisfied when databit == 1.
The result: It can.

Why?  It's not too hard to see that msb(crc) == parity(poly).
Furthermore, whenever the CRC uses an primitive polynomial (or
even just an irreducible one), we have parity(poly) == 0.
(Why?  If parity(poly) == 1, then x+1 divides the polynomial,
and hence the polynomial can't be irreducible or, for that matter,
primitive.)  Hence msb(crc) == 0 always, and so if databit == 1,
we will automatically have msb(crc) != databit.


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Mon, 28 Feb 2000 04:34:00 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <38b9faa9.5443638@news.io.com>
References: <89cbgj$v98$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 45


On 27 Feb 2000 15:20:51 -0800, in
<89cbgj$v98$1@blowfish.isaac.cs.berkeley.edu>, in sci.crypt
daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:

>In article <38b9ab7e.4228600@news.io.com>, Terry Ritter <ritter@io.com> wrote:
>> if (msb(crc) = databit)
>>    crc = crc << 1;
>> else
>>    crc = (crc << 1) ^ poly;
>
>Ok, let's try this.
>I will show a initial state which allows you to prepend as many
>one bits as you like without changing the final CRC hash.

But that is not the issue.  

You have agreed with the statement that CRC could not detect faults in
*leading* 1's.  

Here is the original statement:

On Fri, 25 Feb 2000 23:34:15 GMT, in
<38b710e3.597381318@schbbs.mot.com>, in sci.crypt p27604@email.mot.com
(Doug Stell) wrote:

>[...]
>CRCs that are initialized to all zeros (CRC-16 is one of these) can
>not detect the addition or deletion of leading zeros.  Likewise, CRCs
>that are initialized to all ones can not not detect the addition or
>deletion of leading ones. 

The last statement is false.  And since modern CRC's do set the CRC
register to 1's they do indeed detect both leading 0's and 1's.  End
of story.


Are you now saying that if, mid operations, the CRC register has some
particular state, it will not detect extra 1's?  While that may be an
interesting idea, it is a completely different issue.  

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: CRC-16 Reverse Algorithm ?
Date: 29 Feb 2000 12:50:00 -0800
From: daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner)
Message-ID: <89hbdo$8d9$1@blowfish.isaac.cs.berkeley.edu>
References: <38b9faa9.5443638@news.io.com>
Newsgroups: sci.crypt
Lines: 18

In article <38b9faa9.5443638@news.io.com>, Terry Ritter <ritter@io.com> wrote:
> Are you now saying that if, mid operations, the CRC register has some
> particular state, it will not detect extra 1's?  While that may be an
> interesting idea, it is a completely different issue.  

Yup.

But this state I'm describing, in some vague sense that I don't know how
to put my finger on, feels like "just" a renaming of the all-ones state.

In other words, I *suspect* that, no matter which configuration of CRC you
use, there will usually be some initial state which doesn't detect leading
ones.  The exact value of that initial state depends on the configuration.
If you pick one particular configuration, that bad initial state is all-ones;
if you pick another one, that bad initial state is something else.

But I don't have any evidence for this.  These are just squishy, unscientific,
unproven, groundless guesses.


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Tue, 29 Feb 2000 22:14:02 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <38bc4456.1150245@news.io.com>
References: <89hbdo$8d9$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 41


On 29 Feb 2000 12:50:00 -0800, in
<89hbdo$8d9$1@blowfish.isaac.cs.berkeley.edu>, in sci.crypt
daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner) wrote:

>In article <38b9faa9.5443638@news.io.com>, Terry Ritter <ritter@io.com> wrote:
>> Are you now saying that if, mid operations, the CRC register has some
>> particular state, it will not detect extra 1's?  While that may be an
>> interesting idea, it is a completely different issue.  
>
>Yup.
>
>But this state I'm describing, in some vague sense that I don't know how
>to put my finger on, feels like "just" a renaming of the all-ones state.
>
>In other words, I *suspect* that, no matter which configuration of CRC you
>use, there will usually be some initial state which doesn't detect leading
>ones.  The exact value of that initial state depends on the configuration.
>If you pick one particular configuration, that bad initial state is all-ones;
>if you pick another one, that bad initial state is something else.
>
>But I don't have any evidence for this.  These are just squishy, unscientific,
>unproven, groundless guesses.

This just doesn't make any sense to me.  We don't pick just any random
initial state for CRC's.  In data communications at least, both ends
have to pick the same starting state.  In general, that starting state
is all-ones, which does detect both leading zeros and ones.  

The first uses of the first-generation 16-bit CRC's were initialized
to all-zeros, so a zeros init might have some support.  But I am aware
of no support for a random initialization value.  

So even if the guess is true, it simply has nothing to do with real
CRC initialization.  It is a theoretical problem which never arises
because we have a standard init value which does not do that.  

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: CRC-16 Reverse Algorithm ?
Date: 29 Feb 2000 14:07:57 -0800
From: daw@blowfish.isaac.cs.berkeley.edu (David A. Wagner)
Message-ID: <89hfvt$8kq$1@blowfish.isaac.cs.berkeley.edu>
References: <38bc4456.1150245@news.io.com>
Newsgroups: sci.crypt
Lines: 5

Ok.  It sounds like you are saying that noone ever uses any of the CRC
configurations that could have a problem with the all-ones initialization
state.  Good.

Sorry for the confusion.  It was my fault.  Thanks for explaining.


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Wed, 01 Mar 2000 03:49:07 GMT
From: "Douglas A. Gwyn" <DAGwyn@null.net>
Message-ID: <38BC9327.761B4F22@null.net>
References: <89hfvt$8kq$1@blowfish.isaac.cs.berkeley.edu>
Newsgroups: sci.crypt
Lines: 3

The simple fact is, there are some garbles that *any* CRC scheme
will fail to detect.  One wants to design the scheme so that they
are not among the *probable* modes of garbling.


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Thu, 02 Mar 2000 11:21:43 -0500
From: Paul Koning <pkoning@lucent.com>
Message-ID: <38BE9517.D2E08850@lucent.com>
References: <2000Mar206.18.01.302@cr.yp.to>
   <38BC9327.761B4F22@null.net>
Newsgroups: sci.crypt
Lines: 60

"D. J. Bernstein" wrote:
> 
> Douglas A. Gwyn <DAGwyn@null.net> wrote:
> > The simple fact is, there are some garbles that *any* CRC scheme
> > will fail to detect.
> 
> False. If you have two different messages up to a billion bits long, for
> example, then that difference will be detected by more than 99.99999% of
> the possible CRC-64 polynomials.

Perhaps you're talking about different subjects.

As far as I know Doug is right if he meant "for any given CRC
polynomial, there are error patterns it will fail to detect".
That's obvious (it has to be true for any finite length error
detecting code, CRC or not).  

More specifically, there's the "Hamming distance" of a CRC, which
is a function of the message length, and which says what the minimum
number of bits is that you can change and have it not be detected.
For example, for CRC-32 and Ethernet sized messages, that value is 4.
(For long messages, over 11k bytes or so if I remember right, it drops
to 3, which is why it is unwise to use CRC-32 with MTUs that large.)
Those minimal undetected errors have weird patterns, though -- a few
bits scattered in very precisely picked places far from each other.

In addition, there's a burst length such that not all burst errors of
that length are detected.  For CRC-32 (again working from possibly
faulty memory) I believe that length is 33 bits.  Is it N+1 bits for
any CRC-N?

I'm interpreting Dan's comment as "for any message even a very long
one, any of the appropriate polynomials for CRC-64 will detect nearly
all possible errors".  That too fits; the rule of thumb is that long
random errors (longer than the burst length I mentioned above) are
detected in all but 2^-N cases, for CRC-N.

One interesting point is that deleted or added bits or bytes count
as an error burst from then to end of message, which is why CRC
alone is not a good mechanism if your datalink framing is capable of
truncating or concatenating messages, or adding or deleting bits or
bytes.  HDLC has this property (due to bit stuffing) as do Bisync and
Packet over Sonet (due to character escaping).  FDDI has it because
the frame terminator is fairly vulnerable, and ATM AAL-5 has it because
the end of packet indicator is just one bit and cells can easily be
lost.
In many of these, the solution used is to do a length check as well;
ATM AAL-5 indeed requires this.

	paul
!-----------------------------------------------------------------------
! Paul Koning, NI1D, D-20853
! Lucent Corporation, 50 Nagog Park, Acton, MA 01720, USA
! phone: +1 978 263 0060 ext 115, fax: +1 978 263 8386
! email: pkoning@lucent.com
! Pgp:   27 81 A9 73 A6 0B B3 BE 18 A3 BF DD 1A 59 51 75
!-----------------------------------------------------------------------
! "A system of licensing and registration is the perfect device to deny
! gun ownership to the bourgeoisie."
!       -- Vladimir Ilyich Lenin


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Fri, 03 Mar 2000 17:42:23 GMT
From: scott@helsbreth.org (Scott Nelson)
Message-ID: <38c1f557.81106473@news.inreach.com>
References: <38BE9517.D2E08850@lucent.com>
Newsgroups: sci.crypt
Lines: 20

On Thu, 02 Mar 2000, Paul Koning <pkoning@lucent.com> wrote:

[snip]
>In addition, there's a burst length such that not all burst errors of
>that length are detected.  For CRC-32 (again working from possibly
>faulty memory) I believe that length is 33 bits.  Is it N+1 bits for
>any CRC-N?
>

If you feed all possible N bit values into an N bit crc, 
(regardless of how it is initialized) you will get all possible
N bit values.  It only takes a 32 bit error burst, not 33.

Also, any crc has a pattern which will not change
when fed a 1 bit.  Therefore it is always possible 
to construct a message 2N bits long which can be 
extended indefinitely by inserting 1's after N bits,
without changing the CRC.

Scott Nelson <scott@helsbreth.org>


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Mon, 28 Feb 2000 15:33:24 GMT
From: p27604@email.mot.com (Doug Stell)
Message-ID: <38ba94ee.827800844@schbbs.mot.com>
References: <#txNt0Ng$GA.262@cpmsnbbsa03>
Newsgroups: sci.crypt
Lines: 19

On Sat, 26 Feb 2000 20:41:34 -0800, "Marty" <anon@nowhere.edu> wrote:

>
>Doug Stell <p27604@email.mot.com> wrote in message
>>
>> CRCs that are initialized to all zeros (CRC-16 is one of these) can
>> not detect the addition or deletion of leading zeros. Likewise, CRCs
>> that are initialized to all ones can not not detect the addition or
>> deletion of leading ones. That's why some CRCs initialize to some
>> known, fixed value that is neither all ones nor all zeros.
>
>Inserted and/or deleted ones at the start of a ones initialized CRC are
>indeed detected.

I am indeed wrong. A CRC initialized to all ones can detect inserted
and/or deleted leading ones.

doug



Subject: Re: CRC-16 Reverse Algorithm ?
Date: Sun, 27 Feb 2000 19:28:06 GMT
From: scott@helsbreth.org (Scott Nelson)
Message-ID: <38b97556.176088249@news.inreach.com>
References: <38B6D5A0.BCC88956@americasm01.nt.com>
Newsgroups: sci.crypt
Lines: 55

On Fri, 25 Feb 2000 14:18:56 -0500, "Liu, Qingzhu [CAR:CF36:EXCH]"
<qliu@americasm01.nt.com> wrote:

>Hello, 
>
>I try to implement an algorithm for crc-16 reverse 
>(Poly: 1 + x + x^14 + x^16) in PIC17c756 assembly code.
>
>My questions are:
>1. What does the "Reverse' mean ? 
There is no well defined meaning, however in this case I 
think it's probably referring to the "direction" of the flow;

By one definition a "normal" CRC takes the XOR of
bits 1, 14, 16 and the input bit, and rotates the result
into the register, changing at most 1 bit.
A "reverse" CRC rotates the register, takes the bottom bit, 
XOR's with the input and if the result is a 1, it XOR's 
several of the bits.
The "reverse" CRC is more common, since it's so much 
easier to implement in software.

The C code for a single bit looks like this:

 if ((crc ^ input) & 1) == 1) {
   crc = crc>>1;
 } else {
   crc = (crc>>1) ^ 0xa001;
 }

and this is usually smaller in assembly.


>2. How is the dividing circuit looking like?
not sure what you mean.

>3. Is there a Byte-wise CRC calculation algorithm for it?  
Yes.  But for the PIC, you are probably better off
implementing it straight line.  Wasting 512+ instructions
just for a few cycle improvement is unlikely to be worthwhile.

It only takes about 5 instructions to do 1 bit,
40 to do all 8, looking it up involves two computed
jumps, and a two subroutine calls, in addition to 
the XOR and the instructions wasted in the lookup tables.

>4. What is good text book about this topic?
>
It's a CD, not a book, but I think the latest microchip example 
library includes a CRC routine.  Check their web site:
http://www.microchip.com

Hope that helps.

Scott Nelson <scott@helsbreth.org>


Subject: Re: CRC-16 Reverse Algorithm ?
Date: 27 Feb 2000 20:59:54 GMT
From: klockstone@cix.compulink.co.uk
Message-ID: <89c38a$pci$1@plutonium.compulink.co.uk>
References: <38B6D5A0.BCC88956@americasm01.nt.com>
Newsgroups: sci.crypt
Lines: 27

To quote Rich Webb in a recent reply on comp.arch.embedded:

*THE* reference site for programmers looking to understand and write CRC
algorithms is:

http://www.ross.net/crc/crcpaper.html

Look for the paper "A Painless Guide to CRC Error Detection Algorithms"
by Ross Williams. It explains how they work, gives operating parameters
for several standard CRCs, and includes working portable C source for a
model CRC implementation.

The model code is slow since it's not hard-wired for a particular
endian-ness or CRC flavor and is not table-driven, but this enables it
to be used with many different CRC divisors and reflection options. VERY
handy to be able to check the results of that tight assembly code
against a known good result.

The paper does include a table generation function and guidance on how
to implement a tight, fast table-driven CRC engine.


Keith
 http://www.cix.co.uk/~klockstone
 ------------------------
 'Unwise a grave for Arthur'
 -- The Black Book of Carmarthen


Subject: Re: CRC-16 Reverse Algorithm ?
Date: Mon, 13 Mar 2000 16:32:04 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <38cd17b5.103797@news.io.com>
References: <38B6D5A0.BCC88956@americasm01.nt.com>
Newsgroups: sci.crypt
Lines: 40


On Fri, 25 Feb 2000 14:18:56 -0500, in
<38B6D5A0.BCC88956@americasm01.nt.com>, in sci.crypt "Liu, Qingzhu
[CAR:CF36:EXCH]" <qliu@americasm01.nt.com> wrote:

>Hello, 
>
>I try to implement an algorithm for crc-16 reverse 
>(Poly: 1 + x + x^14 + x^16) in PIC17c756 assembly code.
>
>My questions are:
>1. What does the "Reverse' mean ? 

"Reverse" means that the poly is intended to operate on bit-reversed
data.  This was needed for large 60's-style computer reel-to-reel
tapes where the data could be read from either direction.  

"Reverse" also means that the CRC poly is reversed from the original:
If we write CRC-16 as 0x18005, CRC-16 reverse will be 0x14003, which
is the same bit pattern read from opposite directions.  

The way the reverse operation works depends upon the way CRC is
developed and deposited; this is the forward direction.  The most
trivial way is to init the CRC register as zeros, CRC the data, then
deposit the CRC result, MS byte first.  Then, when we traverse the
data plus the stored CRC result, the CRC result will be zeros.

To traverse the data in reverse direction, we again init the CRC
register as zeros, use the inverse CRC polynomial, and process the
data bits in reverse (starting with the lsb of the saved CRC value).
Again, the CRC result will be zeros.  

It is also possible to init as ones, and to save the complement of the
CRC value (so the result will be non-zero), and check in reverse, but
doing that is more complicated.  

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM

Cache-Post-Path: dublin!unknown@gate.ncipher.com

Subject: Re: Small hash checksum
Date: 2 Aug 2000 14:39:00 GMT
From: mdw@nsict.org (Mark Wooding)
Message-ID: <slrn8ogck4.26s.mdw@mull.ncipher.com>
References: <002c01bffc79$906d5570$31a026c3@SIBER>
   <20000726125715.03232.00000373@ng-bg1.aol.com>
Newsgroups: sci.crypt
Lines: 12

Ilya O. Levin <siber@email.kg> wrote:
> > Just use a 32 bit CRC they have the requirements you need.
> > Table driven implementations are very fast.
> 
> Definitely, it'll faster then take a piece of a big hash.

Not necessarily.  I remember testing CRC32 and MD4 on large files, and
MD4 won rather convincingly.  Both implementations were written in ARM
assembler, though I'd not be surprised if similar results held for other
architectures.

-- [mdw]


Subject: Re: Small hash checksum
Date: Wed, 02 Aug 2000 15:32:29 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <39883ee5.933915@news.io.com>
References: <slrn8ogck4.26s.mdw@mull.ncipher.com>
Newsgroups: sci.crypt
Lines: 25


On 2 Aug 2000 14:39:00 GMT, in <slrn8ogck4.26s.mdw@mull.ncipher.com>,
in sci.crypt mdw@nsict.org (Mark Wooding) wrote:

>Ilya O. Levin <siber@email.kg> wrote:
>> > Just use a 32 bit CRC they have the requirements you need.
>> > Table driven implementations are very fast.
>> 
>> Definitely, it'll faster then take a piece of a big hash.
>
>Not necessarily.  I remember testing CRC32 and MD4 on large files, and
>MD4 won rather convincingly.  Both implementations were written in ARM
>assembler, though I'd not be surprised if similar results held for other
>architectures.

Well, there is CRC and then there is CRC.  The fast implementation
uses a pre-computed table of 256 32-bit elements.  Then for each data
byte we have a mask, an 8-bit XOR and look-up, an 8-bit shift and a
32-bit XOR.  I'd be surprised if MD4 had an implementation which was
faster than that.

---
Terry Ritter   ritter@io.com   http://www.io.com/~ritter/
Crypto Glossary   http://www.io.com/~ritter/GLOSSARY.HTM

Cache-Post-Path: dublin!unknown@gate.ncipher.com

Subject: Re: Small hash checksum
Date: 2 Aug 2000 16:18:10 GMT
From: mdw@nsict.org (Mark Wooding)
Message-ID: <slrn8ogie1.26s.mdw@mull.ncipher.com>
References: <39883ee5.933915@news.io.com>
Newsgroups: sci.crypt
Lines: 91

Terry Ritter <ritter@io.com> wrote:
> 
> On 2 Aug 2000 14:39:00 GMT, in <slrn8ogck4.26s.mdw@mull.ncipher.com>,
> in sci.crypt mdw@nsict.org (Mark Wooding) wrote:
> 
> >Not necessarily.  I remember testing CRC32 and MD4 on large files, and
> >MD4 won rather convincingly.  Both implementations were written in ARM
> >assembler, though I'd not be surprised if similar results held for other
> >architectures.
> 
> Well, there is CRC and then there is CRC.  The fast implementation
> uses a pre-computed table of 256 32-bit elements.  Then for each data
> byte we have a mask, an 8-bit XOR and look-up, an 8-bit shift and a
> 32-bit XOR.  I'd be surprised if MD4 had an implementation which was
> faster than that.

I'm not a *complete* idiot, Terry.  I *did* implement the CRC like
that.

Let's think about this.  For a CRC, we have:

  * For each byte, a memory read (to fetch the byte), an XOR, an AND, a
    load (with shifted index; let's say that's free, because it was),
    and another XOR (and we'll say that the shift is free again).

  * Then we see whether we've finished, and branch back if we haven't.  I
    count 7 cycles per byte.

Here's the code, or at least a facsimile:

  crc32
    stmfd r13!, {r3, r14}
    mvn r0, r0
    adr r3, crc_table

  crc_loop
    cmp r1, r2
    ldrccb r14, [r1], #1
    eorcc r14, r14, r0
    andcc r14, r14, #&ff
    ldrcc r14, [r3, r14, lsl #2]
    eorcc r0, r14, r0, lsr #8
    bcc crc_loop

    mvn r0, r0
    ldmfd r13!, {r3, pc}

For MD4, we have:

  * Three rounds of processing.  Each one consists of...

  * Loading a magic constant.  Actually, the first round's constant is
    zero, so that's free. [0, 1, 1]

  * Sixteen steps, each of which does the same thing to one of 16 input
    words.  Each step consists of...

  * A word load.  I don't have enough registers to avoid this. [1, 1, 1]

  * A boolean function on three inputs.  There are three different ones
    of these, used in the different rounds.  Three-way XOR costs two
    cycles, selection costs three, and majority is a pain and costs
    five. [3, 5, 2]

  * Adding four things together and doing a rotate.  [4, 4, 4]

Adding that load together, we have 16 [8, 10, 7] + [0, 1, 1] + 3 = 405
in total.  (The three on the end is for the looping.  I neglect the
padding at the end, because it's negligible on large files.)  But that
does 64 bytes all at once.  I therefore find that it does each byte in
6.33 cycles.

A win for MD4!


Do I hear an objection at the back?  Something about loads not happening
in one cycle?  Yes, I agree that that was cheeky for me.  But I was
actually trying to favour CRC as much as I could.  Note that we actually
do 3/4 as much loading of data in MD4 -- although we do three times as
many loads (one in each round), they're of chunks four times the size.
And I'm assuming that the CRC table lives in cache all the time anyway.

Next objection, please!  Yes, I could have loaded words in the CRC.  I
think I decided that the extra logic to do this would have made life
over-complex, but it is a possiblity.  However, I still have to deal the
next byte out of an accumulator somehow, so it's not that much of an
advantage.

Next please?

-- [mdw]


Subject: Re: Small hash checksum
Date: 02 Aug 2000 12:36:16 -0400
From: stanislav shalunov <shalunov@internet2.edu>
Message-ID: <87k8dz1uyn.fsf@cain.internet2.edu>
References: <slrn8ogie1.26s.mdw@mull.ncipher.com>
Newsgroups: sci.crypt
Lines: 18

mdw@nsict.org (Mark Wooding) writes:

[Comparison of MD4 vs. CRC32.  Conclusion that CRC32 takes
7 cycles/byte, while MD4 takes 6.33 cycles/byte.  Discussion of
possible objections to the analysis of the code.]

> Next please?

But if the input is short (say 32 bytes on average), CRC would still
be faster, correct?

I've seen a technique where people would use MD5 on Message-IDs,
and take first 64 bits of it, rather than CRC64.

-- 
"If my Eternal Life Device does not give immortality, then the entire
Bible is a joke."                                        -- Alex Chiu
     http://www.alexchiu.com/affiliates/clickthru.cgi?id=shalunov


Subject: Re: Small hash checksum
Date: 2 Aug 2000 19:55:12 GMT
From: mdw@nsict.org (Mark Wooding)
Message-ID: <slrn8ogv50.2mf.mdw@tux.nsict.org>
References: <87k8dz1uyn.fsf@cain.internet2.edu>
Newsgroups: sci.crypt
Lines: 9

stanislav shalunov <shalunov@internet2.edu> wrote:

> But if the input is short (say 32 bytes on average), CRC would still
> be faster, correct?

Yes.  The blocking and padding of MD4 will dominate, and CRC will be a
much better choice.

-- [mdw]


Subject: CRC vs. HASH functions
Date: 05 Oct 2000 04:53:20 GMT
From: macckone@aol.comnjunk123 (Mack)
Message-ID: <20001005005320.01217.00000283@ng-fs1.aol.com>
References: <87k8dz1uyn.fsf@cain.internet2.edu>
Newsgroups: sci.crypt
Lines: 33

Having been working hard and not here for a while
the topic of CRC vs. HASH functions
came up in a thread.

1) CRC are faster than HASH functions of
comparable size.  That is a fact.  Many
hash functions use a CRC like layer at the
top to mix in data linearly. SHA-1 is no exception.
A table driven 256 bit hash function requires 4 32-bit word
lookups/byte, four 32-bit word XORs, a shift and an XOR
to add data.

A 16-bit lookup uses fewer lookups but much bigger
tables.

2) checksum is a special case of a CRC
consider the CRC polynomial 2^8+1.
Two common CRC's are the product of 2^1+1
and some other primitive.  This has certain
'nice' properties.

3) If you are hashing data use a CRC.
If you need security use a HASH function.

4) A HASH does not guarantee anything
A CRC guarantees certain changes will always
change the output.




Mack
Remove njunk123 from name to reply by e-mail


Subject: Re: CRC vs. HASH functions
Date: Thu, 5 Oct 2000 20:19:15 -0700
From: "Scott Fluhrer" <sfluhrer@ix.netcom.com>
Message-ID: <8rjhk7$qcg$1@nntp9.atl.mindspring.net>
References: <20001005005320.01217.00000283@ng-fs1.aol.com>
Newsgroups: sci.crypt
Lines: 29


Mack <macckone@aol.comnjunk123> wrote in message
news:20001005005320.01217.00000283@ng-fs1.aol.com...
> Having been working hard and not here for a while
> the topic of CRC vs. HASH functions
> came up in a thread.
>
> 1) CRC are faster than HASH functions of
> comparable size.  That is a fact.  Many
> hash functions use a CRC like layer at the
> top to mix in data linearly. SHA-1 is no exception.
> A table driven 256 bit hash function requires 4 32-bit word
> lookups/byte, four 32-bit word XORs, a shift and an XOR
> to add data.
>
> A 16-bit lookup uses fewer lookups but much bigger
> tables.

However, if you are willing to use a MAC rather than a HASH (which may be
appropriate depending on why you are summarizing the file in the first
place), there are MACs which can be even faster than CRC.  Examples of this
would include UMAC (http://www.cs.ucdavis.edu/~rogaway/umac/) and hash127
(http://cr.yp.to/hash127.html)

--
poncho





Subject: Re: CRC vs. HASH functions
Date: Thu, 12 Oct 2000 08:56:36 -0700
From: "Scott Fluhrer" <sfluhrer@ix.netcom.com>
Message-ID: <8s4o6n$oum$1@slb6.atl.mindspring.net>
References: <G2Bn4z.9Ku@bath.ac.uk>
   <8rjhk7$qcg$1@nntp9.atl.mindspring.net>
Newsgroups: sci.crypt
Lines: 68


Tim Tyler <tt@cryogen.com> wrote in message news:G2Bn4z.9Ku@bath.ac.uk...
> Scott Fluhrer <sfluhrer@ix.netcom.com> wrote:
> : Mack <macckone@aol.comnjunk123> wrote:
>
> :> 1) CRC are faster than HASH functions of
> :> comparable size.  That is a fact.  Many
> :> hash functions use a CRC like layer at the
> :> top to mix in data linearly. SHA-1 is no exception.
> :> A table driven 256 bit hash function requires 4 32-bit word
> :> lookups/byte, four 32-bit word XORs, a shift and an XOR
> :> to add data. [...]
>
> : However, if you are willing to use a MAC rather than a HASH (which may
be
> : appropriate depending on why you are summarizing the file in the first
> : place), there are MACs which can be even faster than CRC.  Examples of
this
> : would include UMAC (http://www.cs.ucdavis.edu/~rogaway/umac/) and
hash127
> : (http://cr.yp.to/hash127.html)
>
> Excuse my ignorance - but isn't a MAC a keyed hash?
>
> How can a MAC be faster than a hash - if you can build a hash out of a MAC
> by using it with a fixed key?
>
> I presume you are considering the security requirements for a MAC to be
> different from (and less than) those of a hash.
>
> So... where lies the difference?
>
> Why are security properties desirable in a hash not relevant in a MAC?

A MAC need not be a keyed hash.  A MAC with a fixed key need not be a secure
hash.  As you realized, it comes down to the security properties.

The fundamental security property of a HASH is that no one is able to find
two messages that have the same hash (as well as not being able to find
preimages).

The fundamental security property of a MAC is that someone without knowledge
of the secret key is unable to forge a new message/MAC pair, even if he have
seen many messages signed with the same key.

Note that one property that a MAC does not require is any collision
resistance (or preimage resistance) whatsoever against someone with the
keys -- anyone with the keys can trivially sign anything they want, and so
they must be trusted.  And so, since we can consider constructions that have
weaker security properties than keyed hash, it is plausible that these
alternative constructions are faster, and in fact, the two I referenced
above are two such constructions.

>
> Isn't it more the other way around?
>
> MACs should conceal the MAC key in use from attackers who can look at any
> number of hashed messages.  Such security requirements appear to be "in
> addition" to those required of a more conventional hash function.
Not in addition to, but instead of...


--
poncho






Subject: Re: CRC vs. HASH functions
Date: Fri, 06 Oct 2000 18:02:25 GMT
From: Bryan Olson <bryanolson@my-deja.com>
Message-ID: <8rl43g$kb9$1@nnrp1.deja.com>
References: <20001006001159.15659.00000336@ng-fd1.aol.com>
   <8rij4f$j0o$1@nnrp1.deja.com>
Newsgroups: sci.crypt
Lines: 37

Mack wrote:
> >Mack wrote:
> >
> >> 1) CRC are faster than HASH functions of
> >> comparable size.  That is a fact.  Many
> >> hash functions use a CRC like layer at the
> >> top to mix in data linearly. SHA-1 is no exception.
> >> A table driven 256 bit hash function requires 4 32-bit word
> >> lookups/byte, four 32-bit word XORs, a shift and an XOR
> >> to add data.
> >
> >A table driven hash function?  Did you mean a CRC?  In any
> >case, I'd like to see the algorithm to compute a 256 bit
> >result with the stated operations.

> Ack yes ... of course a table driven hash function is a possibility.
> Hmmm ... yes ... full of possibilities.
>
> Of  course I did mean table driven CRC function. Sorry for the error
> it has been a long long month.

I'd still like to see the algorithm.  It's not obvious to me
how to get a 256 bit CRC from four of each of the 32-bit
operations/byte.  You wrote that a 16-bit table (16-bit input
I assume) would require even fewer lookups, so I assume you
index in 8-bit units.  I think I see how to update a 128-bit
CRC with roughly those ops, but even then my shifting comes
out significantly worse when working with 32-bit words.


--Bryan
--
email: bolson at certicom dot com


Sent via Deja.com http://www.deja.com/
Before you buy.


Subject: Re: CRC vs. HASH functions
Date: 07 Oct 2000 00:57:08 GMT
From: macckone@aol.comnjunk123 (Mack)
Message-ID: <20001006205708.23758.00000531@ng-cp1.aol.com>
References: <8rl43g$kb9$1@nnrp1.deja.com>
Newsgroups: sci.crypt
Lines: 71

>Mack wrote:
>> >Mack wrote:
>> >
>> >> 1) CRC are faster than HASH functions of
>> >> comparable size.  That is a fact.  Many
>> >> hash functions use a CRC like layer at the
>> >> top to mix in data linearly. SHA-1 is no exception.
>> >> A table driven 256 bit hash function requires 4 32-bit word
>> >> lookups/byte, four 32-bit word XORs, a shift and an XOR
>> >> to add data.
>> >
>> >A table driven hash function?  Did you mean a CRC?  In any
>> >case, I'd like to see the algorithm to compute a 256 bit
>> >result with the stated operations.
>
>> Ack yes ... of course a table driven hash function is a possibility.
>> Hmmm ... yes ... full of possibilities.
>>
>> Of  course I did mean table driven CRC function. Sorry for the error
>> it has been a long long month.
>
>I'd still like to see the algorithm.  It's not obvious to me
>how to get a 256 bit CRC from four of each of the 32-bit
>operations/byte.  You wrote that a 16-bit table (16-bit input
>I assume) would require even fewer lookups, so I assume you
>index in 8-bit units.  I think I see how to update a 128-bit
>CRC with roughly those ops, but even then my shifting comes
>out significantly worse when working with 32-bit words.
>

Actually all those figures were supposed to be for a 128 bit CRC in the general
case.  That was another error on my part.

Sorry the shift operation is on a 128 bit word. Should have
specified that.  It breaks down to 8 32-bit shifts and 4 other 32-bit
operations.
The four others can be XOR or OR.  Since all of the affected bits are
zero. It doesn't matter which.

The XOR to add info actually only needs to be done once ever four operations.

I have a specific 256 bit CRC that I like that reduces the total number of
operations.  2^256+2^193+2^113+2^6+1.  In that case the bit-wise operation
only uses 4 bit to updata.  In fact in bitwise operation it is possible to do
away
with the shifts entirely.

There is a method for a 256 bit poly that uses 4 lookup tables and no shifts at
all.  You have 8 word lookups/byte  and 1 AND and 8 XORs.  It rotates the data
in place to emulate shifts. Data is added every four bytes. That is the most
efficient version I have.  The above poly reduces that figure of lookups and
XORs
to 3 each.

Anyone have a hash function that can beat it for speed?

>
>--Bryan
>--
>email: bolson at certicom dot com
>
>
>Sent via Deja.com http://www.deja.com/
>Before you buy.
>
>
>


Mack
Remove njunk123 from name to reply by e-mail


Subject: Re: CRC vs. HASH functions
Date: 07 Oct 2000 01:25:38 GMT
From: macckone@aol.comnjunk123 (Mack)
Message-ID: <20001006212538.23758.00000535@ng-cp1.aol.com>
References: <39DDE7D6.F931C0DA@mit.edu>
Newsgroups: sci.crypt
Lines: 82

>There is a body of literature that discusses relationships between coding
>theory
>and message authentication.
>
>For example, many MACs are constructed using 2-universal hash functions
>composed
>with some type of cryptographic hash.  The idea is to use the hash function
>to
>operate on the full message, and produce a small tag, and then apply the
>cryptographic construct to this smaller tag.  This approach (hopefully) saves
>time
>since the cryptographic work is done on a small tag.
>
>UMAC is one such approach.  A number of other universal hash function
>constructions have been proposed, and many of them are based on constructions
>from
>coding theory.
>
>Whether a CRC is "better" than a hash function is not well defined until you
>specify what you mean by "better."  If you're interested in message
>authentication
>or integrity, for example, a CRC by itself is not enough.  
>
>Zulfikar
>

No one is arguing that if security is desired then a CRC is useless.
That was point 3 of my original post.  However if there is no security need
then a CRC is 'better'.  Generally if an MAC of some type is being constructed
there is a security need and a CRC is not appropriate.

CRC's are generally best for:

1) compressing entropy free from outside influence.
2) Hashing data for table lookups or other non-security oriented
identification.
3) Random single bit error detection and burst error detection.

>Scott Fluhrer wrote:
>> 
>> Mack <macckone@aol.comnjunk123> wrote in message
>> news:20001005005320.01217.00000283@ng-fs1.aol.com...
>> > Having been working hard and not here for a while
>> > the topic of CRC vs. HASH functions
>> > came up in a thread.
>> >
>> > 1) CRC are faster than HASH functions of
>> > comparable size.  That is a fact.  Many
>> > hash functions use a CRC like layer at the
>> > top to mix in data linearly. SHA-1 is no exception.
>> > A table driven 256 bit hash function requires 4 32-bit word
>> > lookups/byte, four 32-bit word XORs, a shift and an XOR
>> > to add data.
>> >
>> > A 16-bit lookup uses fewer lookups but much bigger
>> > tables.
>> 
>> However, if you are willing to use a MAC rather than a HASH (which may be
>> appropriate depending on why you are summarizing the file in the first
>> place), there are MACs which can be even faster than CRC.  Examples of this
>> would include UMAC (http://www.cs.ucdavis.edu/~rogaway/umac/) and hash127
>> (http://cr.yp.to/hash127.html)
>> 
>> --
>> poncho
>
>-- 
>
>--Zully
>
>-------
>Zulfikar Ramzan  (AKA Zully)		
>Laboratory for Computer Science, MIT
>NE43-311, (617) 253-2345   
>http://theory.lcs.mit.edu/~zulfikar/homepage.html
>
>
>


Mack
Remove njunk123 from name to reply by e-mail


Subject: Re: CRC vs. HASH functions
Date: Mon, 09 Oct 2000 07:54:07 GMT
From: Bryan Olson <bryanolson@my-deja.com>
Message-ID: <8rrtiu$fac$1@nnrp1.deja.com>
References: <20001006212538.23758.00000535@ng-cp1.aol.com>
Newsgroups: sci.crypt
Lines: 28

Mack wrote:

> CRC's are generally best for:
>
> 1) compressing entropy free from outside influence.

That one is debatable.  Noise (the part we want in this
application) that appeared largely in multiples of the
CRC polynomial would be very surprising, but not
inconceivable.

> 2) Hashing data for table lookups or other non-security oriented
> identification.
> 3) Random single bit error detection and burst error detection.

I think any time I need a digest larger than 64-bits, I
would always use a cryptographic hash.  I cannot envision a
case where I would believe I hit a 2^-64 shot (or even a
2^-32) before I would suspect foul play.


--Bryan
--
email: bolson at certicom dot com


Sent via Deja.com http://www.deja.com/
Before you buy.


Subject: Re: CRC vs. HASH functions
Date: 13 Oct 2000 01:34:13 GMT
From: macckone@aol.comnjunk123 (Mack)
Message-ID: <20001012213413.04202.00001582@ng-bg1.aol.com>
References: <8rrtiu$fac$1@nnrp1.deja.com>
Newsgroups: sci.crypt
Lines: 45

>Mack wrote:
>
>> CRC's are generally best for:
>>
>> 1) compressing entropy free from outside influence.
>
>That one is debatable.  Noise (the part we want in this
>application) that appeared largely in multiples of the
>CRC polynomial would be very surprising, but not
>inconceivable.
>

No more inconceivable that noise would produce a collision
in a secure hash function.  Various CRC polynomials have
varying degrees of probability of "Noise" interacting badly.
For primitive polynomials that are dense the probability is
low.  Most MD style Hash functions use some sort of CRC type
polynomial on the front end so this is an issue there as well.

>> 2) Hashing data for table lookups or other non-security oriented
>> identification.
>> 3) Random single bit error detection and burst error detection.
>
>I think any time I need a digest larger than 64-bits, I
>would always use a cryptographic hash.  I cannot envision a
>case where I would believe I hit a 2^-64 shot (or even a
>2^-32) before I would suspect foul play.
>

Sending 2^32*128bits of data is no longer just hypothetical. 15 hours of
data on a 10Mbps ethernet will deliver that amount of data. With 32 bit
CRCs some errors may go unchecked. 128 bit CRCs are easier to implement
both in software and hardware than secure hash functions.  Another thread
suggested that a hash may be made as fast as a CRC but only at the expense
of silicon space.

>
>--Bryan
>--
>email: bolson at certicom dot com
>


Mack
Remove njunk123 from name to reply by e-mail


Subject: Re: CRC vs. HASH functions
Date: Fri, 13 Oct 2000 09:00:00 GMT
From: Bryan Olson <bryanolson@my-deja.com>
Message-ID: <8s6iug$620$1@nnrp1.deja.com>
References: <20001012213413.04202.00001582@ng-bg1.aol.com>
Newsgroups: sci.crypt
Lines: 73

In article <20001012213413.04202.00001582@ng-bg1.aol.com>,
  macckone@aol.comnjunk123 (Mack) wrote:
> >Mack wrote:
> >
> >> CRC's are generally best for:
> >>
> >> 1) compressing entropy free from outside influence.
> >
> >That one is debatable.  Noise (the part we want in this
> >application) that appeared largely in multiples of the
> >CRC polynomial would be very surprising, but not
> >inconceivable.
> >
>
> No more inconceivable that noise would produce a collision
> in a secure hash function.  Various CRC polynomials have
> varying degrees of probability of "Noise" interacting badly.
> For primitive polynomials that are dense the probability is
> low.

That's for noise fitting some specific model.  The
actual probability depends on the source, and we
don't usually know the true distribution of noise
on a given channel.

> Most MD style Hash functions use some sort of CRC type
> polynomial on the front end so this is an issue there as well.

Not true.  SHA-1 has the W buffer up front, but the effect
is nothing like a CRC.  In particular, the W-buffer
operations are reversible, so it alone never induces
collisions.  MD-2 actually tacks on a checksum (good thing
too), but it's not on the front end; it's appended in the
back.


> >> 2) Hashing data for table lookups or other non-security oriented
> >> identification.
> >> 3) Random single bit error detection and burst error detection.
> >
> >I think any time I need a digest larger than 64-bits, I
> >would always use a cryptographic hash.  I cannot envision a
> >case where I would believe I hit a 2^-64 shot (or even a
> >2^-32) before I would suspect foul play.
> >
>
> Sending 2^32*128bits of data is no longer just hypothetical.
> 15 hours of data on a 10Mbps ethernet will deliver that
> amount of data. With 32 bit CRCs some errors may go
> unchecked.

O.K, but I didn't disagree with going larger than 32 bits
(though 32-bits is in fact what Ethernet uses).


> 128 bit CRCs are easier to implement both in software
> and hardware than secure hash functions.  Another
> thread suggested that a hash may be made as fast as
> a CRC but only at the expense of silicon space.

CRCs are certainly the winner in hardware speed.  I don't
think the other poster was right that cryptographic hashes
can be as fast in hardware, at least not for the popular
ones such as SHA-1.


--Bryan
--
email: bolson at certicom dot com


Sent via Deja.com http://www.deja.com/
Before you buy.


Subject: Re: CRC vs. HASH functions
Date: 15 Oct 2000 19:58:18 GMT
From: macckone@aol.comnjunk123 (Mack)
Message-ID: <20001015155818.15827.00001978@ng-fd1.aol.com>
References: <8s6iug$620$1@nnrp1.deja.com>
Newsgroups: sci.crypt
Lines: 54

>In article <20001012213413.04202.00001582@ng-bg1.aol.com>,
>  macckone@aol.comnjunk123 (Mack) wrote:
>> >Mack wrote:
>> >
>> >> CRC's are generally best for:
>> >>
>> >> 1) compressing entropy free from outside influence.
>> >
>> >That one is debatable.  Noise (the part we want in this
>> >application) that appeared largely in multiples of the
>> >CRC polynomial would be very surprising, but not
>> >inconceivable.
>> >
>>
>> No more inconceivable that noise would produce a collision
>> in a secure hash function.  Various CRC polynomials have
>> varying degrees of probability of "Noise" interacting badly.
>> For primitive polynomials that are dense the probability is
>> low.
>
>That's for noise fitting some specific model.  The
>actual probability depends on the source, and we
>don't usually know the true distribution of noise
>on a given channel.
>
>> Most MD style Hash functions use some sort of CRC type
>> polynomial on the front end so this is an issue there as well.
>
>Not true.  SHA-1 has the W buffer up front, but the effect
>is nothing like a CRC.  In particular, the W-buffer
>operations are reversible, so it alone never induces
>collisions.  MD-2 actually tacks on a checksum (good thing
>too), but it's not on the front end; it's appended in the
>back.
>

The W buffer in SHA-1 acts very much like a CRC.
The use of the W buffer however hides that fact.  The
addition of the S (rotate) only changes the actual polynomial
not the fact that it behaves like a CRC.

In SHA-256 an ADD 32-bits is used which makes the operation
non-linear in GF(2)

>
[snip of the parts I agree with you on]
>
>--Bryan
>--
>email: bolson at certicom dot com
>

Mack
Remove njunk123 from name to reply by e-mail


Subject: Re: CRC vs. HASH functions
Date: Mon, 16 Oct 2000 20:04:23 GMT
From: Bryan Olson <bryanolson@my-deja.com>
Message-ID: <8sfn02$1c1$1@nnrp1.deja.com>
References: <20001015155818.15827.00001978@ng-fd1.aol.com>
Newsgroups: sci.crypt
Lines: 34

Mack wrote:
[Mack]
> >> Most MD style Hash functions use some sort of CRC type
> >> polynomial on the front end so this is an issue there as well.
> >
[Bryan]
> >Not true.  SHA-1 has the W buffer up front, but the effect
> >is nothing like a CRC.  In particular, the W-buffer
> >operations are reversible, so it alone never induces
> >collisions.  MD-2 actually tacks on a checksum (good thing
> >too), but it's not on the front end; it's appended in the
> >back.
> >
>
> The W buffer in SHA-1 acts very much like a CRC.

Whether you choose to look at it as being like a CRC is not
the issue.  It does not introduce collisions and does not
produce the same problem.

> The use of the W buffer however hides that fact.  The
> addition of the S (rotate) only changes the actual polynomial
> not the fact that it behaves like a CRC.

The W buffer is inseparable from the use of the W buffer.
Try it; add multiples of whatever you think is the polynomial,
and see what happens to the digest.


--Bryan


Sent via Deja.com http://www.deja.com/
Before you buy.


Subject: Re: CRC vs. HASH functions
Date: 17 Oct 2000 01:58:17 GMT
From: macckone@aol.comnjunk123 (Mack)
Message-ID: <20001016215817.19748.00000664@ng-co1.aol.com>
References: <8sfn02$1c1$1@nnrp1.deja.com>
Newsgroups: sci.crypt
Lines: 70

>Mack wrote:
>[Mack]
>> >> Most MD style Hash functions use some sort of CRC type
>> >> polynomial on the front end so this is an issue there as well.
>> >
>[Bryan]
>> >Not true.  SHA-1 has the W buffer up front, but the effect
>> >is nothing like a CRC.  In particular, the W-buffer
>> >operations are reversible, so it alone never induces
>> >collisions.  MD-2 actually tacks on a checksum (good thing
>> >too), but it's not on the front end; it's appended in the
>> >back.
>> >
>>
>> The W buffer in SHA-1 acts very much like a CRC.
>
>Whether you choose to look at it as being like a CRC is not
>the issue.  It does not introduce collisions and does not
>produce the same problem.
>

Back in my original post I stated that if security was an issue
then a Secure Hash Function is necessary.  CRCs provide
no security against an opponent.

I also stated that if security was not an issue then a CRC is
superior.  The very fact that SHA-1 uses a CRC like function as
its first step backs this argument.

Choosing a polynomial for a particular function is an entirely
different issue.  Generally high density is best. But low density
may be ok for some uses.

>> The use of the W buffer however hides that fact.  The
>> addition of the S (rotate) only changes the actual polynomial
>> not the fact that it behaves like a CRC.
>
>The W buffer is inseparable from the use of the W buffer.
>Try it; add multiples of whatever you think is the polynomial,
>and see what happens to the digest.
>

Unraveling a secure hash function involves more than one phase.
I think my doubts are justified though.  In SHA-256 the totally
linear operation is replaced with a mildly non-linear one using a
combination of XOR and ADD mod 2^32.  It is possible to have
a collision free highly non-linear function in that location but it is
slower.

But that is another thread entirely.  In a situation where security
is needed a CRC will not work.  Otherwise a CRC is superior.

It is faster (and smaller).  The faster issue may be negated somewhat
in hardware.  But hardware only makes the secure hash function
larger.  That is more silicon area.

A properly chosen polynomial will meet the same requirements
as a secure hash for non-security uses and do so faster.

>
>--Bryan
>
>
>Sent via Deja.com http://www.deja.com/
>Before you buy.
>
>

Mack
Remove njunk123 from name to reply by e-mail


Subject: Re: CRC vs. HASH functions
Date: Sat, 21 Oct 2000 13:01:09 GMT
From: Tim Tyler <tt@cryogen.com>
Message-ID: <G2s6tx.Fyu@bath.ac.uk>
References: <8s6iug$620$1@nnrp1.deja.com>
Newsgroups: sci.crypt
Lines: 40

Bryan Olson <bryanolson@my-deja.com> wrote:
:   macckone@aol.comnjunk123 (Mack) wrote:
:> >Mack wrote:

:> >> CRC's are generally best for: [entropy distillation]

:> 128 bit CRCs are easier to implement both in software
:> and hardware than secure hash functions.  Another
:> thread suggested that a hash may be made as fast as
:> a CRC but only at the expense of silicon space.

: CRCs are certainly the winner in hardware speed.  I don't
: think the other poster was right that cryptographic hashes
: can be as fast in hardware, at least not for the popular
: ones such as SHA-1.

I believe the "other poster" was me.

To reiterate, my claim is not that secure cryptographic hash functions can
be made as fast in hardware as CRCs.

My claim was that *throughput* through most secure hash functions can be
made extremely rapid in hardware, by laying out the rounds in physical 
rows and allowing more data to pass into the inputs of the hash before
the current data has been completely processed.

This will make a hardware implementation of a hash very fast - but it
*still* may not match that of a CRC because:

a) the round function may be complex and slow to compute;
b) there are more components, and thus greater problems getting
   synchronous operation.  This is likely to negatively impact performance.

My argument is that speed issues with hashes could be reduced to be a
relatively minor concern in the context of entropy distillation
applications - *if* silicon area is available.  Choosing a hash with a
large number of simple and rapid rounds would help still further.
-- 
__________                  http://alife.co.uk/  http://mandala.co.uk/
 |im |yler  tt@cryogen.com  http://hex.org.uk/   http://atoms.org.uk/


Subject: Re: CRC vs. HASH functions
Date: Sun, 22 Oct 2000 22:12:57 GMT
From: Bryan Olson <bryanolson@my-deja.com>
Message-ID: <8svop6$m11$1@nnrp1.deja.com>
References: <G2s6tx.Fyu@bath.ac.uk>
Newsgroups: sci.crypt
Lines: 25

Tim Tyler wrote:

> To reiterate, my claim is not that secure cryptographic
> hash functions can be made as fast in hardware as CRCs.


The specific claim relant here was

| Consequently - while the area used is much increased when
| compared to a CRC - there's not anything like so much
| difference in throughput speed as a software engineer might
| expect.

This is nonsense.  No SHA-1 hardware that anyone makes (or
is likely to make) can match a hardware CRC.

Counter mode can always be computed in parallel.  That has
nothing to do with the properties of the hash.


--Bryan


Sent via Deja.com http://www.deja.com/
Before you buy.


Subject: Re: CRC vs. HASH functions
Date: Mon, 23 Oct 2000 01:19:23 GMT
From: Tim Tyler <tt@cryogen.com>
Message-ID: <G2uzoB.HzH@bath.ac.uk>
References: <8svop6$m11$1@nnrp1.deja.com>
Newsgroups: sci.crypt
Lines: 34

Bryan Olson <bryanolson@my-deja.com> wrote:
: Tim Tyler wrote:

:> To reiterate, my claim is not that secure cryptographic
:> hash functions can be made as fast in hardware as CRCs.

: The specific claim relant here was

: | Consequently - while the area used is much increased when
: | compared to a CRC - there's not anything like so much
: | difference in throughput speed as a software engineer might
: | expect.

: This is nonsense.  No SHA-1 hardware that anyone makes (or
: is likely to make) can match a hardware CRC.

My claim appears to be pretty vaguely worded - but certainly there is
no incompatibility with your following statement.

I granted that there were other issues that mean that CRCs are likely to
remain faster than many hashes in the post you are responding to.

: Counter mode can always be computed in parallel.  That has
: nothing to do with the properties of the hash.

It's "helpful" that the computation can be done in parallel, without
laying out N separate implementations of the hash.

But you're right to point out that basically my statements boil down to
the fact that some hash applications can be parallelised, so any
deficiencies in raw speed can be traded for area.
-- 
__________  Lotus Artificial Life  http://alife.co.uk/  tt@cryogen.com
 |im |yler  The Mandala Centre   http://mandala.co.uk/  http://av.com/


Subject: Re: CRC vs. HASH functions
Date: Wed, 11 Oct 2000 09:11:48 GMT
From: Tim Tyler <tt@cryogen.com>
Message-ID: <G29DJo.5E6@bath.ac.uk>
References: <20001005005320.01217.00000283@ng-fs1.aol.com>
Newsgroups: sci.crypt
Lines: 24

Mack <macckone@aol.comnjunk123> wrote:

: Having been working hard and not here for a while
: the topic of CRC vs. HASH functions came up in a thread.

: 1) CRC are faster than HASH functions of
: comparable size.  That is a fact.  Many
: hash functions use a CRC like layer at the
: top to mix in data linearly. SHA-1 is no exception.

While hash functions take longer to evaluate in software, they often have
a layered structure that allows hardware implementations to accept new
inputs before the last inputs have been completely processed, allowing
concurrent processing to take place.

Consequently - while the area used is much increased when compared to a
CRC - there's not anything like so much difference in throughput speed as
a software engineer might expect.

One field where this seems relevant is when condensing entropy from random
sources.
-- 
__________                  http://alife.co.uk/  http://mandala.co.uk/
 |im |yler  tt@cryogen.com  http://hex.org.uk/   http://atoms.org.uk/


Subject: Re: CRC vs. HASH functions
Date: Fri, 13 Oct 2000 09:23:34 GMT
From: Bryan Olson <bryanolson@my-deja.com>
Message-ID: <8s6kai$750$1@nnrp1.deja.com>
References: <G29DJo.5E6@bath.ac.uk>
Newsgroups: sci.crypt
Lines: 24

Tim Tyler wrote:

> While hash functions take longer to evaluate in software,
> they often have a layered structure that allows hardware
> implementations to accept new inputs before the last
> inputs have been completely processed, allowing
> concurrent processing to take place.
>
> Consequently - while the area used is much increased when
> compared to a CRC - there's not anything like so much
> difference in throughput speed as a software engineer
> might expect.

What hashes are you talking about? In cryptographic hashes
such as SHA-1, there exist opportunities for parallelism but
the number of stages that must be sequential is a hardware
nightmare compared to a CRC.


--Bryan


Sent via Deja.com http://www.deja.com/
Before you buy.


Subject: Re: CRC vs. HASH functions
Date: Thu, 19 Oct 2000 09:03:38 GMT
From: Tim Tyler <tt@cryogen.com>
Message-ID: <G2o6I2.Gq3@bath.ac.uk>
References: <8s6kai$750$1@nnrp1.deja.com>
Newsgroups: sci.crypt
Lines: 33

Bryan Olson <bryanolson@my-deja.com> wrote:
: Tim Tyler wrote:

:> While hash functions take longer to evaluate in software,
:> they often have a layered structure that allows hardware
:> implementations to accept new inputs before the last
:> inputs have been completely processed, allowing
:> concurrent processing to take place. [snip CRC comparison]

: What hashes are you talking about? In cryptographic hashes
: such as SHA-1 [...]

This type.

: [...] there exist opportunities for parallelism but
: the number of stages that must be sequential is a hardware
: nightmare compared to a CRC.

While subsequent rounds require processing sequentially, this affects
the time taken to produce an output from an input.  AFAIK, there is
not necessarily any effect on the "throughput" time - since the rounds
can be laid out in rows in hardware, and new inputs can be fed into the
construction before previous outputs have been processed.

This might not greatly affect (say) the time taken to produce the hash
of a file - or where blocks need to be sequentially chained together.

However, it could make a big difference for instances where the processing
of the hash is used to generate confusion sequences - e.g. in a RNG based
on a counter-driven hash.
-- 
__________                  http://alife.co.uk/  http://mandala.co.uk/
 |im |yler  tt@cryogen.com  http://hex.org.uk/   http://atoms.org.uk/
Terry Ritter, his current address, and his top page.
Last updated: 2001-06-05
Hashing and CRC

Terry Ritter

A Ciphers By Ritter Page

Contents