[Haskell-cafe] puzzling memory leak? (GHC)

Tue Oct 11 02:34:12 EDT 2005

On Oct 9, 2005, at 11:32 PM, Ben Lippmeier wrote:

> This sounds like a text-book space-leak caused by lazy evaluation.
>
> In a lazy language, function evaluation is driven by the need to  
> perform IO. Because the first version of your program never prints  
> the parsed structure, the parser doesn't completely parse it. This  
> implies that the system needs to hang onto all the input data (as  
> well as all the partially evaluated calls to the parser) incase it  
> needs it later on.
>

I naively believed (or hoped) that, so long as I the programmer took  
some care, the evaluator would be smart enough to discard unaccessed  
computations and values and thus avoid a space leak.  Hence the thing  
I worried about was explicit references to data structures (say, the  
head of very long lists) that prevent objects from being reclaimed.   
Now it seems I need to also worry about the hidden aspects of the  
internal implementation of lazy evaluation.

What intuition should be my guide about the "proper way" of employing  
lazy evaluation?  It's not yet clear to me what you mean by "the  
system needs to hang onto all the input data ...".  It seems  
counterintuitive for the evaluator to hang onto partially evaluated  
functions and input data when it can be proven (through static  
analysis) that references to that data can no longer exist in the  
remainder of the program execution; for example, consider

   case head $ drop 500000 $ parseArts ... of
            Right x -> hPutArts stderr x

in which the first 500,000 elements in the result of parseArts can't  
ever be referenced after 'drop' executes, and yet the space leak  
still happens (presumably while trying to skip over those 500,000  
elements).  Have I understood you correctly that this is the  
unfortunate behavior of lazy evaluation?

> The default string representation is Haskell is pretty abysmal, and  
> having it use hundreds of megs to represent, say a 10 meg file is  
> not too surprising.
>

Each file I process is 70MB, so inefficient representation could be a  
problem if input data is buffered.

> My advice is that if you don't want to fool around with it, just  
> swallow hard, then change fixArts to do a hPutArts to /dev/null.
>
> Either that or
>   1) go read about the DeepSeq library.
>   2) add strictness annotations to your structure definition.
>

Thanks for the advice; I'll look into both.

  --Young