Data.HashTable.hashInt seems somewhat sub-optimal
Thorkil Naur
naur at post11.tele.dk
Sun Aug 26 13:42:48 EDT 2007
Hello,
On Monday 20 August 2007 13:15, Ian Lynagh wrote:
> ...
> I'm also suspicious of this, though:
>
> -- | A sample hash function for Strings. We keep multiplying by the
> -- golden ratio and adding.
> --
> -- Note that this has not been extensively tested for reasonability,
> -- but Knuth argues that repeated multiplication by the golden ratio
> -- will minimize gaps in the hash space.
> hashString :: String -> Int32
> hashString = foldl' f 0
> where f m c = fromIntegral (ord c + 1) * golden + mulHi m golden
>
> should this be
>
> where f m c = (fromIntegral (ord c + 1) + m) * golden
>
> ? Does Knuth (TAOCP?) say?
In the 2nd edition of Knuth's The Art of Computer Programming, Vol 3, Sorting
and Searching there is a discussion of hash functions on pp. 514-520. One of
the techniques suggested for hashing a one-word (i.e. essentially fixed-size)
key is the following multiplicative scheme:
h(K) = floor ( M*(((A/w)*K)) mod 1) )
where w is the word-size (say, 2^32), M is the desired limit of the hash
function (for efficiency, probably a suitable power of 2) and, finally, A is
some integer constant. What happens here is that we consider the (word) K as
a fraction with the binary point at the left end of the word rather than at
the right, thus getting a fraction with a value between 0 and 1. This value
we then multiply by A and cut off the integer part, once again getting a
fractional value between 0 and 1. And finally, we multiply by M and cut away
the fractional part to get an integer value between 0 and M-1. And, sure,
Knuth suggests various variants of selecting the multiplier A related to the
golden ratio (sqrt(5)-1)/2 = 0.6180... to gain suitable spreading of hashes
for keys in arithmetic progressions. (K, K+d, K+2d, ...).
But what we are dealing with in the hashString function is what Knuth would
call a multiword or variable-length key. Such cases, Knuth suggests, "can be
handled by multiple-precision extensions of [e.g. the multiplicative scheme]
above, but it is generally adequate to speed things up by combining the
individual words together into a single word, then doing a single
multiplication ... as above."
Neither of the above definitions of f implement a multiple-precision extension
of the multiplicative hashing scheme that involves the golden ration. And
none of the methods suggested by Knuth for combining multiple words into
single words or otherwise compute hashes for multiword keys involve the
golden ration.
So I cannot find obvious traces of Knuth having anything at all to say about
either of the f's.
> ...
Best regards
Thorkil
More information about the Libraries
mailing list