[Haskell-cafe] Re: Very imperfect hash function

Richard O'Keefe ok at cs.otago.ac.nz
Mon Feb 1 22:52:50 EST 2010


>> On Thu, 2010-01-28 at 14:07 -0500, Steve Schafer wrote:
>>> I'm looking for some algorithmic suggestions:
>>>
>>> I have a set of several hundred key/value pairs. The keys are 32-bit
>>> integers, and are all distinct. The values are also integers, but  
>>> the
>>> number of values is small (only six in my current problem). So,
>>> obviously, several keys map to the same value.

Instead of mapping keys to values, map keys to sets of values,
where each set of values is represented by a small bit string.
In your present case, one byte would be enough.
>>>
>>>
>>> For some subsets of keys, examining only a small portion of the  
>>> key's
>>> bits is enough to determine the associated value. For example,  
>>> there may
>>> be 250 keys that all have the same most-significant byte, and all  
>>> 250
>>> map to the same value. There are also keys at the other extreme,  
>>> where
>>> two keys that differ in only one bit position map to different  
>>> values.

On today's machines, "several hundred" pairs counts as trivial.
Start by using a Data.IntMap of bytes and look for something else
only if that doesn't pay off.  This already takes advantage of the
bit-string nature of your keys, by the way.



More information about the Haskell-Cafe mailing list