[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

Albert Y. C. Lai trebla at vex.net
Sun Apr 22 01:35:50 EDT 2007


I tried using WordSet = [String] (plus the corresponding changes in the
code) and got a great speedup, actually way more than 3x. There was also
memory growth when using Set String, and replacing it with [String]
stops that too: the program now runs in constant space (a constant 20M).
Part of the speedup may be attributable to GHC's excellent rewrite rules
for lists; however, I cannot explain the memory growth when using Set.
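
To make the change concrete, here is a minimal sketch of what a
list-based WordSet could look like. It is not the attached spell2.hs:
the bodies of edits1 and known below are assumptions written for
illustration, following the general shape of Norvig's corrector.

```haskell
import qualified Data.Map as M

type WordFreq = M.Map String Int
type WordSet  = [String]          -- previously: Set String

-- Illustrative edits1 (an assumption, not the code from the thread):
-- every string one delete, transpose, replace, or insert away.
-- As a list, the result may contain duplicates.
edits1 :: String -> WordSet
edits1 word = deletes ++ transposes ++ replaces ++ inserts
  where
    splits     = [ splitAt i word | i <- [0 .. length word] ]
    deletes    = [ l ++ rs        | (l, _:rs)   <- splits ]
    transposes = [ l ++ b:a:rs    | (l, a:b:rs) <- splits ]
    replaces   = [ l ++ c:rs      | (l, _:rs)   <- splits, c <- alphabet ]
    inserts    = [ l ++ c:r       | (l, r)      <- splits, c <- alphabet ]
    alphabet   = ['a' .. 'z']

-- Keep only the candidates that occur in the frequency table.
known :: WordFreq -> WordSet -> WordSet
known freq = filter (`M.member` freq)
```

With lists, the candidates are produced lazily and consumed by the
filter as they are generated, at the price of keeping duplicates that a
Set would have merged.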

Regarding the local WordFreq map under "train", I am shocked that ghc -O
is smart enough to notice it and perform proper sharing, so that only
one copy is ever created. Nonetheless, I still decided to factor "train"
into two functions: one builds the WordFreq and the other queries it,
which eases blame analysis when necessary.
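
A sketch of that factoring, under assumptions: the names buildFreq and
frequency are hypothetical, chosen here for illustration, and are not
necessarily the ones used in the attachment.

```haskell
import qualified Data.Map as M

type WordFreq = M.Map String Int

-- Builder: count how often each word occurs in the training words.
buildFreq :: [String] -> WordFreq
buildFreq ws = M.fromListWith (+) [ (w, 1) | w <- ws ]

-- Query: look up a word's count in an already built table (0 if absent).
frequency :: WordFreq -> String -> Int
frequency freq w = M.findWithDefault 0 w freq
```

Because the caller binds the result of buildFreq once and passes it to
frequency, the sharing is explicit in the source and does not depend on
what the optimizer happens to do.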

On the interact line, I use "tokens" to break up the input; since it's
already written (for the trainer), I may as well reuse it.
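
Roughly, the idea looks like the sketch below. The definition of tokens
is an assumption (lowercase everything and split on non-letters), and
correctAll is a hypothetical helper standing in for the body of the
interact line; the corrector itself is passed in as a parameter.

```haskell
import Data.Char (isAlpha, toLower)

-- Assumed tokenizer: lowercase the text and split on non-letters.
tokens :: String -> [String]
tokens = words . map (\c -> if isAlpha c then toLower c else ' ')

-- Hypothetical pure core of the interact line: tokenize the input,
-- correct each token, and print one result per line.
correctAll :: (String -> String) -> String -> String
correctAll correct = unlines . map correct . tokens

-- Intended use, given some  correct :: String -> String :
--   main = interact (correctAll correct)
```

Reusing tokens means the input is normalized exactly the way the
training text was, so lookups and training agree on what a word is.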

When reading holmes.txt, be aware that it is in UTF-8, while GHC still 
assumes ISO-8859-1. This will affect results.
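
As an illustration of the effect (a sketch only; utf8Bytes and
misreadAsLatin1 are hypothetical helpers, not part of any library), the
following simulates what happens when UTF-8 bytes are read one byte per
character, as under ISO-8859-1:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- UTF-8 encoding of a single character, as a list of byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n      = ord c
    cont x = 0x80 .|. (x .&. 0x3F)   -- continuation byte 10xxxxxx

-- What a reader assuming ISO-8859-1 sees: one Char per UTF-8 byte.
misreadAsLatin1 :: String -> String
misreadAsLatin1 = map chr . concatMap utf8Bytes
```

Plain ASCII text comes through unchanged, but a single accented letter
such as '\233' (e with acute) turns into the two characters '\195' and
'\169'. The second of these is not a letter, so a tokenizer that splits
on non-letters may cut the word at that point.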

I have not checked the correctness of edits1.

I am monochrom.

My modification is attached.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: spell2.hs
Type: text/x-haskell
Size: 1988 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20070422/02d4f780/spell2-0001.bin

