[Haskell] Fingerprints and hashing

Carl Witty cwitty at newtonlabs.com
Thu Oct 11 12:58:27 EDT 2007


On Thu, 2007-10-11 at 12:28 +0100, Simon Peyton-Jones wrote:
> Interesting! The associativity property is the kind of thing I was after.  Except that I don't really care if FP(as ++ bs) = FP(as) `plusFP` FP(bs).  I only care that the latter is robust in the sense of having low probabilty of collision.  So the Rabin scheme does more than I really need (which is probably fine).
> 
> On page 1 he draws a distinction between fingerprinting and hashing, which relates to my last wish.  Does one need a different technique for the two, or is it enough to reduce the number of bits in the Rabin scheme, and thereby get a hashing scheme?
> 
> Strangely the paper gives no comparison with other fingerprinting schemes.
> 
> Simon
> | From: haskell-bounces at haskell.org [mailto:haskell-bounces at haskell.org] On Behalf Of Lauri Alanko
> |
> | If compositionality is important, at least Rabin's fingerprints are
> | worth considering: http://citeseer.ist.psu.edu/broder93some.html

Note that Rabin's fingerprint algorithm is the same as CRC
(http://en.wikipedia.org/wiki/Cyclic_redundancy_check).

I'm not aware of people using CRCs for hashing inside a program, but
they should work fine and be fairly fast.

I can think of two caveats.  

1) With CRC and a fixed polynomial, it is very easy for an "adversary"
to pick multiple inputs that give the same fingerprint/hash code.
(Rabin avoids this by picking a random polynomial after the adversary
has selected the inputs.)  Depending on the application, this may be
very important, or it may not matter at all.

2) The straightforward approach for "instance Fingerprintable
Fingerprint" works very poorly for CRCs.  Considering balanced binary
trees with four leaves, and the obvious compositional hash function,
then for all a,b,c,d we would have
fingerprint ((a,b),(c,d)) == fingerprint ((a,c),(b,d)),
and
fingerprint ((a,b),(b,d)) == fingerprint ((a,c),(c,d)).

To avoid this, the fingerprint would have to include not only the
remainder, but the number of bits involved in creating the fingerprint.
Then fingerprinting a fingerprint would require time logarithmic in this
number of bits.

Carl Witty




More information about the Haskell mailing list