Data.HashTable.hashInt seems somewhat sub-optimal
Ian Lynagh
igloo at earth.li
Wed Aug 29 18:38:55 EDT 2007
Hi,
On Tue, Aug 28, 2007 at 11:41:22AM -0400, Jan-Willem Maessen wrote:
>
> golden :: Int32
> golden = 1013904242 -- = round ((sqrt 5 - 1) * 2^32) :: Int32
> -- was -1640531527 = round ((sqrt 5 - 1) * 2^31) :: Int32
> -- but that has bad mulHi properties (even adding 2^32 to get its
> inverse)
> -- Whereas the above works well and contains no hash duplications for
> -- [-32767..65536]
>
> hashInt32 :: Int32 -> Int32
> hashInt32 x = mulHi x golden + x
This gives
> map hashInt [0..16]
[0,1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19]
> -- (length $ group $ sort $ map hashInt [-32767..65536]) == 65536 +
> 32768
This test also passes for the
golden :: Int32
golden = -1640531527
hashInt :: Int -> Int32
hashInt x = fromIntegral x * golden
implementation, which has a very pretty distribution; graph at the
bottom of
http://www.brpreiss.com/books/opus4/html/page214.html
> hashString :: String -> Int32
> hashString = foldl' f golden
> where f m c = fromIntegral (fromEnum c) * magic + hashInt32 m
> magic = 0xdeadbeef
Why use magic rather than golden? This makes sense to me:
hashString :: String -> Int32
hashString = foldl' f golden
where f m c = (fromIntegral (ord c) `xor` m) * golden
Is anything obviously wrong with it?
Thanks
Ian
More information about the Libraries
mailing list