[Haskell-cafe] Re: Optimizing spelling correction program

Kamil Dworakowski kamil at dworakowski.name
Mon Jun 22 16:27:09 EDT 2009


On Jun 22, 9:10 am, Ketil Malde <ke... at malde.org> wrote:
> Kamil Dworakowski <ka... at dworakowski.name> writes:
> > Right... Python uses hashtables while here I have a tree with log n
> > access time. I did not want to use the Data.HashTable, it would
> > pervade my program with IO. The alternative is an ideal hashmap that never
> > gets changed. This program creates a dictionary at start which then is only
> > being used to read from: an ideal application for the Data.PerfectHash
> > by Mark Wotton available on Hackage [3].
>
> If you are considering alternative data structures, you might want to
> look at tries or Bloom filters, both have O(n) lookup, both have
> Haskell implementations.  The latter is probably faster but
> probabilistic (i.e. it will occasionally fail to detect a
> misspelling - which you can of course check against a "real"
> dictionary).

Using Bryan O'Sullivan's fantastic BloomFilter I got it down below
Python's run time! Now it is 35.56s, 28% of the time is spent on GC,
which I think means there is still some room for improvement.

Do you say that PerfectHash runs with a penalty of calling into c
library?


More information about the Haskell-Cafe mailing list