[Haskell-cafe] Hashtable woes

Chris Kuklewicz chris at mightyreason.com
Mon Jan 23 04:27:53 EST 2006


Hello,

  This post is a about a tangential issue to the shootout, not the
shootout itself.  It concerns the lack of a fast mutable hashtable.  If
you have a good replacement for Data.Hashtable, I would be interested to
know about it (and so would the wiki!).  The shootout entry
"k-nucleotide" is a good test of the hashtable provided by each of the
languages:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=all

Several of us are building a proposed entry on the wiki at

http://haskell.org/hawiki/KnucleotideEntry

where I have just posted a plea which I will paste below:

On mutable data structures in Haskell

This benchmark is completely bottlenecked by Data.Hashtable (in GHC
6.4.1) and this discussion is based on the assumption that I am using
Hashtable optimally. I downloaded the GHD 0.17 compiler and the DMD
entry to benchmark on my machine. The DMD entry uses the "associative
arrays" built into the language: "int[char[]] frequencies" and places
[WWW]second (runs in 3.0s). The winning entry is interesting, since the
c-code does not have a hash table, and so it uses #include
"../../Include/simple_hash.h" which pulls in a dead simple, inline,
string to int hashtable and runs in 2s.

The entry below runs 16 slower than the DMD entry on my powerbook G4.
Profiling shows the bottleneck. I downloaded simple_hash.h and
shamelessly optimized it to replace Data.Hashtable for exactly this
benchmark code. This sped up the proposed entry by a factor of 4.1 so
that it is now about a factor of 4 slower than the DMD entry. This shows
that Data.Hashtable is doing *at least* four times more work that is
needed for this usage pattern. And even with my over specialized hash
table, Haskell is almost 50% slower than OCaml's "module H =
Hashtbl.Make(S)" (note that I my code uses the same hash function as
OCaml). Unfortunately I cannot submit this optimized hash table entry to
the shootout.

The only mutable data structure that come with GHC besides arrays is
Data.Hashtable, which is not comptetitive with OCaml Hashtbl or DMD's
associative arrays (unless there is something great hidden under
Data.Graph). Is there any hope for GHC 6.6? Does anyone have pointers to
an existing library at all? Perl and Python and Lua also have excellent
built in hashtable capabilities. Where is a good library for Haskell?

-- 
Chris Kuklewicz



More information about the Haskell-Cafe mailing list