[Haskell-cafe] Very imperfect hash function

Tue Feb 2 03:50:07 EST 2010

On 2 Feb 2010, at 03:05, Richard O'Keefe wrote:

>> A simple hash-function for strings is to simply exclusive-or the  
>> bytes and then reduce modulo a prime number,
>
> Simply exclusive-oring the bytes will give you at most 256 distinct
> results.  (For an ASCII source, 128 distinct results.)  After that,
> there hardly seems to be any point in reduction modulo a prime.

Right - I just gave an example of how simple hash functions may be  
created. The original question deals with Int32s, not strings.

As already mentioned before, there are more advance libraries here
   http://en.wikipedia.org/wiki/Perfect_hash_function
but they are not written in Haskell. So you have to rewrite them or  
use the FFI.

> This approach can't tell a CAT from an ACT or a DOG from a GOD, which
> is another strike against it.  (It also can't tell a TITTLE from a  
> TILE,
> or a BOTTLE from a BOLE, for obvious reasons.)

The hash function just tries to produce random lookup values, which  
are used to flatten the average depth of the lookup table at the  
entries of the array. So there is a tradeoff between a fast hash  
function, and one that does a good job.

   Hans