Data.HashTable.hashInt seems somewhat sub-optimal

Ian Lynagh igloo at earth.li
Wed Aug 29 18:38:55 EDT 2007


Hi,

On Tue, Aug 28, 2007 at 11:41:22AM -0400, Jan-Willem Maessen wrote:
> 
> golden :: Int32
> golden = 1013904242 -- = round ((sqrt 5 - 1) * 2^32) :: Int32
> -- was -1640531527 = round ((sqrt 5 - 1) * 2^31) :: Int32
> -- but that has bad mulHi properties (even adding 2^32 to get its  
> inverse)
> -- Whereas the above works well and contains no hash duplications for
> -- [-32767..65536]
> 
> hashInt32 :: Int32 -> Int32
> hashInt32 x = mulHi x golden + x

This gives

> map hashInt [0..16]
[0,1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19]

> --  (length $ group $ sort $ map hashInt [-32767..65536]) == 65536 +  
> 32768

This test also passes for the

    golden :: Int32
    golden = -1640531527

    hashInt :: Int -> Int32
    hashInt x = fromIntegral x * golden

implementation, which has a very pretty distribution; graph at the
bottom of
    http://www.brpreiss.com/books/opus4/html/page214.html

> hashString :: String -> Int32
> hashString = foldl' f golden
>   where f m c = fromIntegral (fromEnum c) * magic + hashInt32 m
>         magic = 0xdeadbeef

Why use magic rather than golden? This makes sense to me:

    hashString :: String -> Int32
    hashString = foldl' f golden
      where f m c = (fromIntegral (ord c) `xor` m) * golden 

Is anything obviously wrong with it?


Thanks
Ian



More information about the Libraries mailing list