[Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance

Fri Apr 20 18:03:19 CEST 2012

Thanks for your suggestions. Alas, they don't solve the problem.

As I was at work without the original data file, I repeated the test
suggested by Karol Samborski with a file of 1 400 000 repetitions of
"żyźniejszymi". It took about 3.5s, so I thought my problem had been
solved. However, compiling with -O2 changes the run time by only ~2-3s,
and I don't believe the laptop I use at home is *that much slower* than
my Mac at work, so neither the missing optimization nor the hardware
explains the slowdown I saw at home.

Now, I've just rerun the test with the original data file (still
at work, so the comparison with 3.5s is fair). I started it at 17:26
and it's still running, so the problem lies in the data set being
hashed. I don't know why, but it seems to matter:
- either whether one specific word or many different words are hashed,
- or whether just one slot or many slots of the HashTable are updated
(although, as I'm using newHint, the space should be preallocated).
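
To separate the two hypotheses above, one could time the same counting
loop on two synthetic inputs: one word repeated n times, and n distinct
words. What follows is only a rough sketch, not the code from this
thread. It swaps in Data.Map.Strict (from the containers package) as a
stand-in for Data.HashTable, and the names countWords, sameWord,
distinctWords and timeIt, the "w1", "w2", ... keys and n = 200000 are
all made up for illustration.

```haskell
module Main where

import qualified Data.Map.Strict as Map
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- Count occurrences of each word. Data.Map.Strict is an assumed
-- stand-in here, not the Data.HashTable discussed in this thread.
countWords :: [String] -> Map.Map String Int
countWords = foldl' (\m w -> Map.insertWith (+) w 1 m) Map.empty

-- Hypothetical workload 1: the same word n times (one key updated).
sameWord :: Int -> [String]
sameWord n = replicate n "żyźniejszymi"

-- Hypothetical workload 2: n different words (n keys inserted).
distinctWords :: Int -> [String]
distinctWords n = [ "w" ++ show i | i <- [1 .. n] ]

-- Build the map, force it, and report key count and CPU time.
timeIt :: String -> [String] -> IO ()
timeIt label ws = do
  start <- getCPUTime
  let size = Map.size (countWords ws)
  size `seq` return ()            -- force the whole fold before stopping the clock
  end <- getCPUTime
  let ms = (end - start) `div` 1000000000  -- picoseconds to milliseconds
  putStrLn (label ++ ": " ++ show size ++ " keys, " ++ show ms ++ " ms")

main :: IO ()
main = do
  let n = 200000                  -- arbitrary size, chosen for illustration
  timeIt "one repeated word" (sameWord n)
  timeIt "distinct words" (distinctWords n)
```

If the two timings differ only modestly with this stand-in but
enormously with the hash table, that would point at the table (or its
hash function) rather than at the input handling.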

Either way, I would be grateful if you, Karol, or somebody else could
rerun the test with the original data. It's available at:
http://ernie.icslab.agh.edu.pl/~lavrin/formy.utf8.gz

Thanks for your time!

Regards,
Radek Szymczyszyn