Data.HashTable.hashInt seems somewhat sub-optimal

Ian Lynagh igloo at earth.li
Mon Aug 20 07:15:15 EDT 2007


On Thu, Aug 16, 2007 at 05:06:53PM +0100, Simon.Frankau at barclayscapital.com wrote:
> I tried submitting this bug through the tracker, but it seemed to give
> me permissions errors - probably a firewall issue here.

Due to spammers you need to log in to the bug tracker (user guest,
password guest), so that's another possibility.

> Prelude> Data.HashTable.hashInt 0
> 0
> Prelude> Data.HashTable.hashInt 1
> -1
> Prelude> Data.HashTable.hashInt 2
> -1
> Prelude> Data.HashTable.hashInt 3
> -2
> Prelude> Data.HashTable.hashInt 4
> -2
> Prelude> Data.HashTable.hashInt 5
> -2
> Prelude> Data.HashTable.hashInt 6
> -3
> Prelude> Data.HashTable.hashInt 7
> -3
> Prelude> Data.HashTable.hashInt 8
> -4
> Prelude> Data.HashTable.hashInt 9
> -4
> Prelude> Data.HashTable.hashInt 10
> -4
> Prelude> Data.HashTable.hashInt 200
> -77
> Prelude> Data.HashTable.hashInt 201
> -77
> Prelude> Data.HashTable.hashInt 202
> -78
> 
> I prefer to use hashing to decrease the likelihood of collisions, not
> increase them. ;)
> 
> I think the original algorithm was quite possibly supposed to use the
> bottom 32 bits of the result,

Based on
    http://www.brpreiss.com/books/opus4/html/page213.html
    http://www.brpreiss.com/books/opus4/html/page214.html
it looks like you're right.

I'll change it from

    hashInt :: Int -> Int32
    hashInt x = mulHi (fromIntegral x) golden

to

    hashInt :: Int -> Int32
    hashInt x = fromIntegral x * golden

which gives better results:

Prelude> Data.HashTable.hashInt 0
0
Prelude> Data.HashTable.hashInt 1
-1640531527
Prelude> Data.HashTable.hashInt 2
1013904242
Prelude> Data.HashTable.hashInt 3
-626627285
Prelude> Data.HashTable.hashInt 4
2027808484
Prelude> Data.HashTable.hashInt 5
387276957
Prelude> Data.HashTable.hashInt 6
-1253254570


I'm also suspicious of this, though:

    -- | A sample hash function for Strings.  We keep multiplying by the
    -- golden ratio and adding.
    --
    -- Note that this has not been extensively tested for reasonability,
    -- but Knuth argues that repeated multiplication by the golden ratio
    -- will minimize gaps in the hash space.
    hashString :: String -> Int32
    hashString = foldl' f 0
      where f m c = fromIntegral (ord c + 1) * golden + mulHi m golden

should this be

    where f m c = (fromIntegral (ord c + 1) + m) * golden

? Does Knuth (TAOCP?) say?


Thanks
Ian



More information about the Libraries mailing list