[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling
Corrector
Pete Kazmier
pete-expires-20070615 at kazmier.com
Sun Apr 22 11:51:02 EDT 2007
Ketil Malde <ketil at ii.uib.no> writes:
> On Sun, 2007-04-22 at 00:25 -0400, Pete Kazmier wrote:
>> Pete Kazmier <pete-expires-20070615 at kazmier.com> writes:
>>
>> > I'd love to see other Haskell implementations as well if anyone has a
>> > few moments to spare. Admittedly, it took me several hours to get my
>> > version working, but I'm a Haskell newbie. Unfortunately, I think it
>> > runs as slow as it took me to write it!
>
> Hm - nobody suggested using ByteStrings yet? String is notoriously
> wasteful, and replacing it with ByteString normally gives a quite
> significant speedup.
I actually have a ByteString version but it runs much slower. This
part of the code is where all of the time is spent in the ByteString
version:
type WordFreq = M.Map B.ByteString Int
train:: [B.ByteString] -> WordFreq
train words = frequencyMap
where
frequencyMap = foldr incWordCount M.empty words
incWordCount w m = M.insertWith (+) w 1 m
> Worse - and this is true for ByteStrings, too - String comparisons are
> O(n), which means lookups in Sets and Maps are expensive. A trie (i.e,
> a search tree where each internal node corresponds to a word prefix, and
> has one branch per letter in the alphabet) will give you lookup that
> scales with word size (and possibly alphabet size).
Right. My first version was just a direct translation of Norvig's
code with an emphasis on trying to keep the complexity and size of
code to a minimum.
> Instead of generating the (huge) list of misspelled words, you could
> calculate edit distance to each correctly spelled word? With a bound on
> allowable mistakes, this is a linear time operation using the standard
> dynamic programming algorithm.
Could you provide additional information on this "standard dynamic
programming algorithm?" I'm not familiar with dynamic programming.
Thanks!
Pete
More information about the Haskell-Cafe
mailing list