Data.HashTable.hashInt seems somewhat sub-optimal

Sun Aug 26 13:42:48 EDT 2007

Hello,

On Monday 20 August 2007 13:15, Ian Lynagh wrote:
> ...
> I'm also suspicious of this, though:
> 
>     -- | A sample hash function for Strings.  We keep multiplying by the
>     -- golden ratio and adding.
>     --
>     -- Note that this has not been extensively tested for reasonability,
>     -- but Knuth argues that repeated multiplication by the golden ratio
>     -- will minimize gaps in the hash space.
>     hashString :: String -> Int32
>     hashString = foldl' f 0
>       where f m c = fromIntegral (ord c + 1) * golden + mulHi m golden
> 
> should this be
> 
>     where f m c = (fromIntegral (ord c + 1) + m) * golden
> 
> ? Does Knuth (TAOCP?) say?

In the 2nd edition of Knuth's The Art of Computer Programming, Vol 3, Sorting 
and Searching there is a discussion of hash functions on pp. 514-520. One of 
the techniques suggested for hashing a one-word (i.e. essentially fixed-size) 
key is the following multiplicative scheme:

  h(K) = floor ( M*(((A/w)*K)) mod 1) )

where w is the word-size (say, 2^32), M is the desired limit of the hash 
function (for efficiency, probably a suitable power of 2) and, finally, A is 
some integer constant. What happens here is that we consider the (word) K as 
a fraction with the binary point at the left end of the word rather than at 
the right, thus getting a fraction with a value between 0 and 1. This value 
we then multiply by A and cut off the integer part, once again getting a 
fractional value between 0 and 1. And finally, we multiply by M and cut away 
the fractional part to get an integer value between 0 and M-1. And, sure, 
Knuth suggests various variants of selecting the multiplier A related to the 
golden ratio (sqrt(5)-1)/2 = 0.6180... to gain suitable spreading of hashes 
for keys in arithmetic progressions. (K, K+d, K+2d, ...).

But what we are dealing with in the hashString function is what Knuth would 
call a multiword or variable-length key. Such cases, Knuth suggests, "can be 
handled by multiple-precision extensions of [e.g. the multiplicative scheme] 
above, but it is generally adequate to speed things up by combining the 
individual words together into a single word, then doing a single 
multiplication ... as above."

Neither of the above definitions of f implement a multiple-precision extension 
of the multiplicative hashing scheme that involves the golden ration. And 
none of the methods suggested by Knuth for combining multiple words into 
single words or otherwise compute hashes for multiword keys involve the 
golden ration.

So I cannot find obvious traces of Knuth having anything at all to say about 
either of the f's.

> ...

Best regards
Thorkil