[Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance

Chaddaï Fouché chaddai.fouche at gmail.com
Mon Apr 23 17:38:37 CEST 2012

On Sun, Apr 22, 2012 at 4:58 PM, Radosław Szymczyszyn <lavrin at gmail.com> wrote:
> Thank you all for thoughts and suggestions. I've been tracking them as
> they appeared, but being busy with university assignments, couldn't
> try them out yet.
> In the meantime, however, Karol Samborski investigated the issue
> further and found the cause of poor performance -- somehow using
> HashTable.newHint doesn't play well with HashTable.update. Simply
> changing newHint to hint gives me execution time of about 12s (on my
> rather slowish AMD Fusion 1.6GHz laptop) with a dataset of ~1.35mln
> words.

While it's good that you have found a bug that explain the very poor
performance you got, I must reiterate Brent Yorgey's advice to use
Data.HashMap from unordered-containers or some other performance
oriented implementation : Data.HashTable is a legacy datastructure
that was mainly introduced to have a hashtable in Haskell core
libraries, it ran into some performance problem of GHC (only reduced
in very recent release I believe) and was never written to be very
fast in the first place... Avoiding it is generally a very good idea !
Using another structure (HashMap or some trie implementation) would
probably get you much better performance without hassle. I must admit
that having Data.HashTable in the base library without warning as to
its unsuitability is very misleading to the beginner coming from Perl
or Python.


More information about the Beginners mailing list