Proposal: Add the unordered-containers package and the hashable package to the Haskell Platform
Isaac Dupree
ml at isaac.cedarswampstudios.org
Thu Mar 21 22:26:26 CET 2013
Regarding hashWithSalt determinism:
hashable 1.1:
"The general contract of hash is: * This integer need not remain
consistent from one execution of an application to another execution of
the same application. [...] The contract for hashWithSalt is the same as
for hash, with the additional requirement that any instance that defines
hashWithSalt must make use of the salt in its implementation." [1]
hashable 1.2:
"The general contract of hashWithSalt is: * If a value is hashed using
the same salt during distinct runs of an application, the result must
remain the same. (This is necessary to make it possible to store hashes
on persistent media.) [...]"
Which contract do we want?
The size of Int varies across platforms, so it is difficult to make an
instance of these Hashable classes that returns the same hash on e.g.
x86 and x86_64. [3]
(I imagine it can be done by making sure the high bits of the returned
Int are always 0 or sign-extended... depending on how
unordered-containers interprets hashes [if unordered-containers is the
relevant hash consumer]. This seems fragile.)
Thankfully, if Data.Map is only twice as slow as Data.HashMap, I
wouldn't feel too bad about using Data.Map for its determinism and good
worst-case asymptotics.
The type of hashes could be Word32 or Word64, or if it's better for
performance+security and we don't care about determinism, Word.
Thoughts?
-Isaac
[1]
http://hackage.haskell.org/packages/archive/hashable/1.1.2.5/doc/html/Data-Hashable.html
[2]
http://hackage.haskell.org/packages/archive/hashable/1.2.0.4/doc/html/Data-Hashable.html
[3] C++11's std::hash interface has the same issue, except that at least
it returns an *unsigned* inconsistent-width type, size_t. I know this
because I'm making a cross-platform-deterministic C++11 program... I
think that if I use hash-tables there, I'll probably have to make a
local copy of a hash-table implementation. Or audit my code to ensure
that it never enumerates a hash-table in a way that is affected by
ordering. If it were Haskell I could ensure
enumeration-order-invariance using a class CommutativeMonoid or
CommutativeSemigroup... ;-)
More information about the Libraries
mailing list