RFC: Adding a Hashable type class and HashMap/HashSet data types to HP

Yitzchak Gale gale at sefer.org
Thu Nov 25 12:59:36 EST 2010


I wrote:
>>>> More interesting are Maybe, Either, lists, tuples, Data.Map, Data.Set,
>>>> etc. These are far from trivial; I did some work on it for Python. If you
>>>> think you can just do something fairly simple and reasonable and hope
>>>> that it will have the right statistical properties, I promise you that it won't
>>>> work.

Thomas Schilling wrote:
> Right, murmurhash2 is a very good and well-tested hash function.

It's good for strings. So we can use it for String and Data.Text.

But how about [Data.Map Data.Text (String, [(Int, Bool)])]
or whatever. Do you have a hash function for that?
When you have containers that can be used to build structures
of arbitrary shape, writing a hash function that works in
general is a hard problem.

Were you thinking that you could serialize those things and
then just use murmurhash on them? Try it, but my guess is
it won't work well. Murmurhash is designed for strings, not for
those things. It's not a cryptographic hash; it's designed to
have good statistical properties for a certain very specific types
of input. If you abuse it, you'll likely lose those good properties.
The parameters for murmerhash were tweaked using keys that
were English words, and very short strings of random bytes.
There is no evidence it will work well with other kinds of keys.

I'd be very happy to be proven wrong though.

Regards,
Yitz


More information about the Libraries mailing list