[Haskell-beginners] Disappointing thought on Haskell -- a simple space leak on "insert" is hard bug to catch
Dimitri DeFigueiredo
defigueiredo at ucdavis.edu
Mon Aug 18 23:58:44 UTC 2014
Hi Heinrich,
Thanks so much! :-)
Wow! I was totally wrong on that one then. I got it that lazy evaluation
is the culprit here. Now, I am still a little uneasy about insert
producing a new version of the tree when it is not needed. Can't that
come back to bite me later depending on how I use it?
In your explanation, you say that "the old path is usually not
referenced anywhere, and can be garbage collected immediately."
I am a little uneasy about the "usually". Could you expand on that? What
if it is? Could that happen? Or does that just imply a bigger constant?
Thanks again,
Dimitri
Em 16/08/14 04:13, Heinrich Apfelmus escreveu:
> Dimitri DeFigueiredo wrote:
>> Here's a problem I just found with Haskell (and maybe functional
>> programming) that I would like to know how to avoid.
>>
>> Please don't get me wrong. I'm in *LUUV* with Haskell! :-)
>>
>>
>> Exercise 2.3 of Chris Okasaki's book "Purely Functional Data
>> Structures" shows that something as simple as an insert function in a
>> binary tree can cause a space leak in Haskell. (and other functional
>> languages as well)
>>
>> Here's a simple tree type:
>>
>> data Tree a = Empty
>> | MkTree (Tree a) a (Tree a)
>>
>> Here's the version of insert with the space leak:
>>
>> -- ** SPACE LEAK! ** version
>> -- this version unnecessarily copies the path to the inserted element
>> even if
>> -- the same element is already present in the tree.
>> -- a million insertions of the same element will cause a million trees
>> -- to exist whereone would suffice.
>>
>> treeInsert :: Ord a => a -> Tree a -> Tree a
>> treeInsert e Empty = MkTree Empty e Empty
>> treeInsert e t@(MkTree a x b) | e > x = MkTree a x (treeInsert e b)
>> | e < x = MkTree (treeInsert e a) x b
>> | otherwise = t
>>
>> Here's a version without the space leak:
>>
>> treeInsert :: Ord a => a -> Tree a -> Tree a
>> treeInsert e t = case (treeInsert' e t) of
>> Nothing -> t
>> Just x -> x
>> where
>> treeInsert' :: Ord a => a -> Tree a -> Maybe (Tree a)
>> treeInsert' e Empty = return $ MkTree Empty e Empty
>> treeInsert' e (MkTree a x b) | e > x = do {r <-
>> treeInsert' e b; return $ MkTree a x r}
>> | e < x = do {r <-
>> treeInsert' e a; return $ MkTree r x b}
>> | otherwise = Nothing
>>
>>
>> This is so tricky that I think it is worth questioning whether
>> Haskell is helping here. First, just spotting that the first version
>> leads to a space leak is not trivial. Second, the fix is so much
>> uglier than the original version. It is disappointing to have to
>> write it!
>>
>> Questions:
>> 1) ** Is there a warning sign that would easily flag this space leak? **
>> 2) Is there a simpler fix?
>
> The really tricky thing here is that the space leak is not what you
> think it is. :)
>
> When you use the tree in an ephemeral fashion, the issue is *not* that
> the path to the element is being copied, because the old path is
> usually not referenced anywhere, and can be garbage collected
> immediately. Asymptotically, this has no impact on space usage.
>
> Instead, the issue is that the tree operations are not "completed".
> Inserting the same element a thousand times without inspecting the
> tree afterwards, you will get a tree that contains a huge unevaluated
> subtree, like
>
> MkTree (treeInstert 'a' (treeInsert 'a' ...)) 'f' Empty
>
> What your supposed fix actually does is that it fully evaluates the
> subtrees as well.
>
> An easier way to achieve this to use a "strictness annotation" in the
> data type, which is done via the `!` symbol:
>
> data Tree a = Empty
> | MkTree !(Tree a) a !(Tree a)
>
> This data type may not have any unevaluated subtrees. Try it and see
> whether it solves your problem.
>
>
> In general, lazy evaluation is a trade-off: It makes some programs
> simpler to write, but the price is that the cost model becomes more
> difficult, both for time and for space usage. Cost models that you may
> know from eager languages do not apply anymore.
>
> For reasoning about time usage, Okasaki introduces the "debit
> method",w hich you've certainly read about. Another exposition can be
> found in [1].
>
> For reasoning about space, I recommend an approach that I like to call
> "space invariants" [2]. The key problem with the space leak above is
> that the semantics of a tree -- which elements it contains -- no
> longer coincides with the size of its representation -- an expression
> which may be only partially evaluated. For things like infinite lists,
> e.g. [1..], this is actually great! But for other things like finite
> maps, this may lead to unexpected space usage. The remedy is to
> enforce a "space invariant", which is a guarantee that links the
> semantics and the size of the representation. An example would be "A
> tree in WHNF uses only as much space as the leaves it contains (+
> skeleton)". The annotion `!` is one way to enforce these invariants.
>
>
> [1]: http://apfelmus.nfshost.com/articles/debit-method.html
> [2]: http://apfelmus.nfshost.com/blog/2013/08/21-space-invariants.html
>
>
> Best regards,
> Heinrich Apfelmus
>
> --
> http://apfelmus.nfshost.com
>
>
>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
More information about the Beginners
mailing list