[Haskell-cafe] String Hashing
Jan-Willem Maessen
jmaessen at alum.mit.edu
Mon Jun 18 17:17:25 EDT 2007
On Jun 17, 2007, at 9:55 PM, Thomas Conway wrote:
> Hi All,
>
> I'm trying to figure out how to maximum performance out of one of my
> inner loops which involves string hashing.
...
mix :: Triple -> Triple
This looks like a version of the Bob Jenkins hash function from
burtleburtle.net. I implemented the 32-bit version of this as follows:
mix :: Int32 -> Int32 -> Int32 -> (Int32 -> Int32 -> Int32 -> a) -> a
mix a0 b0 c0 k0 =
let mixR k a b c = (a-b-c) `xor` (c `shiftR` k)
mixL k b c a = (b-c-a) `xor` (a `shiftL` k)
mix3 k1 k2 k3 k a b c =
let a' = mixR k1 a b c
b' = mixL k2 b c a'
c' = mixR k3 c a' b'
in k a' b' c'
in (mix3 13 8 13 $ mix3 12 16 5 $ mix3 3 10 15 $ k0) a0 b0 c0
I mention this because your code writes the whole thing out
longhand---which might be faster, or might not, but certainly misses
the highest-level structural patterns in the original. Your use of a
data type to represent triples is probably better nowadays than my
rather quirky use of CPS (in other words, this could have been a
function Triple -> Triple instead of the rather odd type you see above).
That said, I assume you instrumented your code and determined that
hash collisions are actually a bottleneck for you, and that a hash
table is the right structure to begin with? I fell back on much-
simpler multiplicative hashing schemes for Data.HashTable. A
multiply is much faster than vast amounts of bit-fiddling---but of
course its collision behavior isn't nearly as good and this can be a
problem with large data sets. And note that the multiplicative
hashing currently used in Data.HashTable doesn't require prime table
sizes; in fact we use powers of two and table doubling. When last I
checked the result was faster than Data.Map, but not by much, and
using strings probably wipes out that advantage vs. tries.
-Jan-Willem Maessen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2425 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20070618/ecb807d5/smime-0001.bin
More information about the Haskell-Cafe
mailing list