[Haskell-cafe] Re: Very imperfect hash function
Richard O'Keefe
ok at cs.otago.ac.nz
Mon Feb 1 22:52:50 EST 2010
>> On Thu, 2010-01-28 at 14:07 -0500, Steve Schafer wrote:
>>> I'm looking for some algorithmic suggestions:
>>>
>>> I have a set of several hundred key/value pairs. The keys are 32-bit
>>> integers, and are all distinct. The values are also integers, but
>>> the
>>> number of values is small (only six in my current problem). So,
>>> obviously, several keys map to the same value.
Instead of mapping keys to values, map keys to sets of values,
where each set of values is represented by a small bit string.
In your present case, one byte would be enough.
>>>
>>>
>>> For some subsets of keys, examining only a small portion of the
>>> key's
>>> bits is enough to determine the associated value. For example,
>>> there may
>>> be 250 keys that all have the same most-significant byte, and all
>>> 250
>>> map to the same value. There are also keys at the other extreme,
>>> where
>>> two keys that differ in only one bit position map to different
>>> values.
On today's machines, "several hundred" pairs counts as trivial.
Start by using a Data.IntMap of bytes and look for something else
only if that doesn't pay off. This already takes advantage of the
bit-string nature of your keys, by the way.
More information about the Haskell-Cafe
mailing list