[Haskell-beginners] Disappointing thought on Haskell -- a simple space leak on "insert" is hard bug to catch

Heinrich Apfelmus apfelmus at quantentunnel.de
Tue Aug 19 14:42:22 UTC 2014


Dimitri DeFigueiredo wrote:
> Hi Heinrich,
> 
> Thanks so much! :-)
> 
> Wow! I was totally wrong on that one then. I got it that lazy evaluation 
> is the culprit here. Now, I am still a little uneasy about insert 
> producing a new version of the tree when it is not needed. Can't that 
> come back to bite me later depending on how I use it?
> 
> In your explanation, you say that "the old path is usually not 
> referenced anywhere, and can be garbage collected immediately."
> I am a little uneasy about the "usually". Could you expand on that? What 
> if it is? Could that happen? Or does that just imply a bigger constant?

First, note that a new version of the tree has to be created whenever 
the element you insert was not in the tree. Most likely, your 
"higher-up" algorithm that makes use of the tree structure doesn't treat 
these two cases too differently, so you probably have to think about the 
case where the tree is copied anyway.

Second, you may want to look up on how lazy evaluation, respectively 
*graph reduction* work. This will answer your questions. For instance, 
the new code that you wrote implicitly builds a new version of the tree 
as well (building a tower of expressions of the form `>>= \r -> return $ 
MkTree a x r`) but discards it level by level when it turns out that the 
end result is `Nothing`. Without an understanding of the evaluation 
model, you will always feel uneasy.

The basic tenet of garbage collection is that the binding for any name 
which does not occur in the current expression can be discard. For 
instance, in the expression

    let x = "A String which uses a lot of memory"
        y = 12
    in y + 2

we can safely discard the name `x` and its value, as the name `x` does 
not occur in the expression `y + 2`.

But in the expression,

    let x = "A String which uses a lot of memory"
        y = 12
    in snd (x,y+2)

we may not discard `x` just yet, because it still occurs in the 
expression `snd (x,y+2)`. Of course, performing reduction step will turn 
this into `y+2` and now the memory used by binding for `x` can be freed.

Note that the runtime may choose to wait some time before freeing the 
associated memory. In fact, at periodic intervals, the runtime will stop 
the current evaluation and instead perform a process called "garbage 
collection", where it looks for all variable names and frees the memory 
of those that are no longer needed.

A "space leak" generally refers to the situation where a name is no 
longer needed but still occurs in the expression to be evaluated. This 
is very much like the `x` in the second example: it's "not really 
needed", because we humans know that `snd` doesn't use it, but it's 
still written there, so the runtime can't throw it away.


Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com



More information about the Beginners mailing list