Data.HashTable.hashInt seems somewhat sub-optimal
Ian Lynagh
igloo at earth.li
Mon Aug 20 07:15:15 EDT 2007
On Thu, Aug 16, 2007 at 05:06:53PM +0100, Simon.Frankau at barclayscapital.com wrote:
> I tried submitting this bug through the tracker, but it seemed to give
> me permissions errors - probably a firewall issue here.
Due to spammers you need to log in to the bug tracker (user guest,
password guest), so that's another possibility.
> Prelude> Data.HashTable.hashInt 0
> 0
> Prelude> Data.HashTable.hashInt 1
> -1
> Prelude> Data.HashTable.hashInt 2
> -1
> Prelude> Data.HashTable.hashInt 3
> -2
> Prelude> Data.HashTable.hashInt 4
> -2
> Prelude> Data.HashTable.hashInt 5
> -2
> Prelude> Data.HashTable.hashInt 6
> -3
> Prelude> Data.HashTable.hashInt 7
> -3
> Prelude> Data.HashTable.hashInt 8
> -4
> Prelude> Data.HashTable.hashInt 9
> -4
> Prelude> Data.HashTable.hashInt 10
> -4
> Prelude> Data.HashTable.hashInt 200
> -77
> Prelude> Data.HashTable.hashInt 201
> -77
> Prelude> Data.HashTable.hashInt 202
> -78
>
> I prefer to use hashing to decrease the likelihood of collisions, not
> increase them. ;)
>
> I think the original algorithm was quite possibly supposed to use the
> bottom 32 bits of the result,
Based on
http://www.brpreiss.com/books/opus4/html/page213.html
http://www.brpreiss.com/books/opus4/html/page214.html
it looks like you're right.
I'll change it from
hashInt :: Int -> Int32
hashInt x = mulHi (fromIntegral x) golden
to
hashInt :: Int -> Int32
hashInt x = fromIntegral x * golden
which gives better results:
Prelude> Data.HashTable.hashInt 0
0
Prelude> Data.HashTable.hashInt 1
-1640531527
Prelude> Data.HashTable.hashInt 2
1013904242
Prelude> Data.HashTable.hashInt 3
-626627285
Prelude> Data.HashTable.hashInt 4
2027808484
Prelude> Data.HashTable.hashInt 5
387276957
Prelude> Data.HashTable.hashInt 6
-1253254570
I'm also suspicious of this, though:
-- | A sample hash function for Strings. We keep multiplying by the
-- golden ratio and adding.
--
-- Note that this has not been extensively tested for reasonability,
-- but Knuth argues that repeated multiplication by the golden ratio
-- will minimize gaps in the hash space.
hashString :: String -> Int32
hashString = foldl' f 0
where f m c = fromIntegral (ord c + 1) * golden + mulHi m golden
should this be
where f m c = (fromIntegral (ord c + 1) + m) * golden
? Does Knuth (TAOCP?) say?
Thanks
Ian
More information about the Libraries
mailing list