[Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

Bernie Pope bjpop at csse.unimelb.edu.au
Tue Feb 13 16:14:48 EST 2007

```Duncan Coutts wrote:
> On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
>
>> Hi, I am running the following code against a 210 MB file in an attempt to
>> determine whether I should use alex or whether, since my needs are very
>> performance oriented, I should write a lexer of my own.  I thought that
>> everything I'd written here was tail-recursive
>>
>
> Isn't that exactly the problem - that it's tail recursive? You do not
> want it to be tail recursive since then it must consume the whole input
> before producing any output. You want it to be as lazy as possible so
> that it can start producing tokens as soon as possible without having to
> consume everything.
>

Duncan is right, and I will just elaborate a little bit.

Consider the pass1 function:

pass1 :: String -> String -> String
pass1 left [] = left
pass1 left ('<':right) = pass1 left (stripTagOrComment right)
pass1 left (' ':right) = pass1 left right
pass1 left (c:right)
| Set.member c punct = pass1 (' ':c:' ':left) right
| otherwise          = pass1 (c:left) right

It accumulates its result in the "left" parameter. So it chomps down the
"right" string building up a bigger and bigger solution until it reaches
the base case, and hands the solution over to the calling function.

The calling function gets nothing back from pass1 until pass1 has
processed the whole input. And that accumulated solution in "left" could
grow quite big.

A much better approach would be:

pass1 :: String -> String
pass1 [] =  []
pass1 ('<':right) = pass1 (stripTagOrComment right)
pass1 (' ':right) = pass1 right
pass1 (c:right)
| Set.member c punct = ' ':c:' ': pass1 right
| otherwise          = c : pass1 right

This way, pass1 will be producing output as early as possible, which can
be consumed earlier by the calling function. Lazy evaluation gives you a
kind of co-routining between producers and consumers, but you have to
write "good" producers and "good" consumers.

You should also write the pass2 in this style as well. Your memory
consumption should drop to something very small.

Cheers,
Bernie.
```