Stack leak... (?)

11 Jun 2001 11:20:54 -0600

Juan Carlos Arevalo Baeza <jcab@roningames.com> writes:
>     Ok... I'm doing a lot of testing with monadic
> parsers. My current test consists on a Haskell program that
> parses a C++ source file and:
> [...]
>     The problem is that my test program runs out of stack
> (not heap) on both GHC and Hugs when parsing a large-enough
> file (about 150K for GHC, a lot less for Hugs). 

The classic cause of this problem in parsers is tracking line numbers.
The problem is that line numbers have to be built but rarely get
observed.  This means that a line number like 33 will be represented
by a thunk like this:

   1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1

This uses a lot of heap space (though it may not be too bad because of
sharing) but, more seriously, it takes stack space proportional to the
line number - hence the stack overflows you've been seeing.

The fix is to add strictness annotations to force evaluation of the
line number every time you increment it.  The easiest way to do this
is to represent source locations by a data structure like this:

  data SrcLoc = MkSrcLoc FileName !Int !Int

and use "seq" (or "$!") in the function that gets the next token.
For example, if the function to get the next token looks like this:

  token :: Parser a
  token = Parser (\ (input, srcloc) ->
    case input of 
      (t : new_input) -> return (a, (new_input, inc srcloc))
      ...
      )

you would change the 4th line to:

      (t : new_input) -> let srcloc' = inc srcloc 
                         in srcloc' `seq` return (a, (new_input, srcloc'))

[A similar problem can happen in pretty-printers that track column
numbers.]

-- 
Alastair Reid        reid@cs.utah.edu        http://www.cs.utah.edu/~reid/

ps You might want to have a look at this Prelude function:

     unlines :: [String] -> String