[Haskell-beginners] Disappointing thought on Haskell -- a simple space leak on "insert" is hard bug to catch
Heinrich Apfelmus
apfelmus at quantentunnel.de
Tue Aug 19 14:42:22 UTC 2014
Dimitri DeFigueiredo wrote:
> Hi Heinrich,
>
> Thanks so much! :-)
>
> Wow! I was totally wrong on that one then. I got it that lazy evaluation
> is the culprit here. Now, I am still a little uneasy about insert
> producing a new version of the tree when it is not needed. Can't that
> come back to bite me later depending on how I use it?
>
> In your explanation, you say that "the old path is usually not
> referenced anywhere, and can be garbage collected immediately."
> I am a little uneasy about the "usually". Could you expand on that? What
> if it is? Could that happen? Or does that just imply a bigger constant?
First, note that a new version of the tree has to be created whenever
the element you insert was not in the tree. Most likely, your
"higher-up" algorithm that makes use of the tree structure doesn't treat
these two cases too differently, so you probably have to think about the
case where the tree is copied anyway.
Second, you may want to look up on how lazy evaluation, respectively
*graph reduction* work. This will answer your questions. For instance,
the new code that you wrote implicitly builds a new version of the tree
as well (building a tower of expressions of the form `>>= \r -> return $
MkTree a x r`) but discards it level by level when it turns out that the
end result is `Nothing`. Without an understanding of the evaluation
model, you will always feel uneasy.
The basic tenet of garbage collection is that the binding for any name
which does not occur in the current expression can be discard. For
instance, in the expression
let x = "A String which uses a lot of memory"
y = 12
in y + 2
we can safely discard the name `x` and its value, as the name `x` does
not occur in the expression `y + 2`.
But in the expression,
let x = "A String which uses a lot of memory"
y = 12
in snd (x,y+2)
we may not discard `x` just yet, because it still occurs in the
expression `snd (x,y+2)`. Of course, performing reduction step will turn
this into `y+2` and now the memory used by binding for `x` can be freed.
Note that the runtime may choose to wait some time before freeing the
associated memory. In fact, at periodic intervals, the runtime will stop
the current evaluation and instead perform a process called "garbage
collection", where it looks for all variable names and frees the memory
of those that are no longer needed.
A "space leak" generally refers to the situation where a name is no
longer needed but still occurs in the expression to be evaluated. This
is very much like the `x` in the second example: it's "not really
needed", because we humans know that `snd` doesn't use it, but it's
still written there, so the runtime can't throw it away.
Best regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Beginners
mailing list