[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector
Albert Y. C. Lai
trebla at vex.net
Mon Apr 23 20:51:21 EDT 2007
Albert Y. C. Lai wrote:
> I try using WordSet = [String] (plus corresponding change in code) and
> get great speedup, actually way more than 3x. There was also a memory
> growth phenomenon using Set String, and replacement by [String] stops
> that too, now it's constant space (constant = 20M). It is possible to
> attribute part of the speedup to excellent rewrite rules in GHC
> regarding lists; however, I cannot explain the memory growth when using
> Set.
Now I see. The higher memory usage of the Set version is not growth or a
leak; it is just brutal. For each input word, a ton of derived words is
generated, and an optimal one is chosen (by a criterion that is
essentially a mapping from words to scores). There are two ways to do this.
You could put the derived words into a lazy list and then traverse it,
so that they are generated, examined, and thrown away on demand. At
most two words are ever live at once (the best one so far and the next
candidate), plus maybe a couple of cons cells if the compiler does not
optimize them away.
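The lazy-list version can be sketched as follows. The `deletes` generator and the `score` function here are hypothetical stand-ins (the real corrector scores candidates by corpus frequency); the point is only that `maximumBy` consumes the list incrementally:

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical generator: all single-character deletions of a word.
deletes :: String -> [String]
deletes w = [take i w ++ drop (i + 1) w | i <- [0 .. length w - 1]]

-- Hypothetical stand-in for the word-to-score criterion.
score :: String -> Int
score = length

-- maximumBy folds over the lazy list, so each candidate is generated,
-- compared against the best so far, and immediately becomes garbage.
best :: String -> String
best w = maximumBy (comparing score) (deletes w)
```

Because nothing else holds on to the list, the candidates never coexist in the heap.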
Or you could put all the derived words into a set first (to avoid
duplicates; note that you still have to generate a ton, you only save
by storing half a ton), then traverse it. Storing a ton, or even just
half a ton, of these words is going to clog up memory.
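For contrast, a Set-based sketch under the same hypothetical `deletes` and `score` as above: `Set.fromList` materializes every distinct candidate before the traversal even starts, which is exactly the half-ton being stored:

```haskell
import qualified Data.Set as Set
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Same hypothetical generator and score as in the list version.
deletes :: String -> [String]
deletes w = [take i w ++ drop (i + 1) w | i <- [0 .. length w - 1]]

score :: String -> Int
score = length

-- Set.fromList is spine-strict: every distinct candidate is resident
-- at once before maximumBy ever sees the first element.
bestViaSet :: String -> String
bestViaSet = maximumBy (comparing score) . Set.toList . Set.fromList . deletes
```

The answer is the same; only the peak residency differs, and that is the difference the heap profile shows.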
The perceived growth is only because some input words cause fewer
derived words and other input words encountered later cause more. So,
plotting memory over time, once in a while there are more derived words
than before, and the heap grows to hold them; they are still thrown
away correctly in due course. If you feed in the same input word over
and over again, space usage stays constant.