[Haskell-cafe] Re: Optimizing spelling correction program
Don Stewart
dons at galois.com
Mon Jun 22 17:22:46 EDT 2009
kamil:
> On Jun 22, 9:10 am, Ketil Malde <ke... at malde.org> wrote:
> > Kamil Dworakowski <ka... at dworakowski.name> writes:
> > > Right... Python uses hash tables, while here I have a tree with
> > > O(log n) access time. I did not want to use Data.HashTable, as it
> > > would pervade my program with IO. The alternative is an immutable
> > > hash map that is never modified after construction. This program
> > > builds its dictionary at start-up and afterwards only reads from it:
> > > an ideal application for Data.PerfectHash by Mark Wotton, available
> > > on Hackage [3].
> >
> > If you are considering alternative data structures, you might want to
> > look at tries or Bloom filters; both have lookup time proportional to
> > the key length (independent of the dictionary size), and both have
> > Haskell implementations. The latter is probably faster but
> > probabilistic (i.e. it will occasionally fail to detect a
> > misspelling - which you can of course check against a "real"
> > dictionary).
>
> Using Bryan O'Sullivan's fantastic bloomfilter package I got it down
> below Python's run time! Now it is 35.56s; 28% of the time is spent on
> GC, which I think means there is still some room for improvement.
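
A minimal sketch of how the dictionary test might look (untested; it
assumes the easyList and elem functions exported by Data.BloomFilter.Easy
in the bloomfilter package, and a Hashable instance for String -- the
exact API may differ between package versions):

    import qualified Data.BloomFilter.Easy as B

    -- Build the filter from the dictionary words with a 1% false-positive
    -- rate.  A false positive here means a misspelling that slips through,
    -- which could be re-checked against the real dictionary if needed.
    mkDict :: [String] -> B.Bloom String
    mkDict = B.easyList 0.01

    -- Membership test: a few hash computations per word, independent of
    -- the dictionary size.
    known :: B.Bloom String -> String -> Bool
    known dict w = w `B.elem` dict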
One easy way to reduce the GC time is to increase the allocation area
(the young-generation heap) from its small default, for example:

    ./a.out +RTS -A200M
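
To check that the GC fraction actually drops, repeat the run with RTS
statistics enabled (the -sstderr flag prints a mutator/GC time summary
when the program exits):

    ./a.out +RTS -A200M -sstderr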