[Haskell-beginners] Disappointing thought on Haskell -- a simple space leak on "insert" is hard bug to catch

Mon Aug 18 23:58:44 UTC 2014

Hi Heinrich,

Thanks so much! :-)

Wow! I was totally wrong on that one then. I got it that lazy evaluation 
is the culprit here. Now, I am still a little uneasy about insert 
producing a new version of the tree when it is not needed. Can't that 
come back to bite me later depending on how I use it?

In your explanation, you say that "the old path is usually not 
referenced anywhere, and can be garbage collected immediately."
I am a little uneasy about the "usually". Could you expand on that? What 
if it is? Could that happen? Or does that just imply a bigger constant?

Thanks again,

Dimitri

Em 16/08/14 04:13, Heinrich Apfelmus escreveu:
> Dimitri DeFigueiredo wrote:
>> Here's a problem I just found with Haskell (and maybe functional 
>> programming) that I would like to know how to avoid.
>>
>> Please don't get me wrong. I'm in *LUUV* with Haskell! :-)
>>
>>
>> Exercise 2.3 of Chris Okasaki's book "Purely Functional Data
>> Structures" shows that something as simple as an insert function in a
>> binary tree can cause a space leak in Haskell. (and other functional
>> languages as well)
>>
>> Here's a simple tree type:
>>
>> data Tree a = Empty
>>             | MkTree (Tree a) a (Tree a)
>>
>> Here's the version of insert with the space leak:
>>
>> -- ** SPACE LEAK! ** version
>> -- this version unnecessarily copies the path to the inserted element 
>> even if
>> -- the same element is already present in the tree.
>> -- a million insertions of the same element will cause a million trees
>> -- to exist whereone would suffice.
>>
>> treeInsert :: Ord a => a -> Tree a -> Tree a
>> treeInsert e Empty = MkTree Empty e Empty
>> treeInsert e t@(MkTree a x b) | e > x     = MkTree a x (treeInsert e b)
>>                               | e < x     = MkTree (treeInsert e a) x b
>>                               | otherwise = t
>>
>> Here's a version without the space leak:
>>
>> treeInsert :: Ord a => a -> Tree a -> Tree a
>> treeInsert e t = case (treeInsert' e t) of
>>                     Nothing -> t
>>                     Just x  -> x
>>     where
>>         treeInsert' :: Ord a => a -> Tree a -> Maybe (Tree a)
>>         treeInsert' e Empty = return $ MkTree Empty e Empty
>>         treeInsert' e (MkTree a x b) | e > x     = do {r <- 
>> treeInsert' e b; return $ MkTree a x r}
>>                                      | e < x     = do {r <- 
>> treeInsert' e a; return $ MkTree r x b}
>>                                      | otherwise = Nothing
>>
>>
>> This is so tricky that I think it is worth questioning whether
>> Haskell is helping here. First, just spotting that the first version
>> leads to a space leak is not trivial. Second, the fix is so much
>> uglier than the original version. It is disappointing to have to
>> write it!
>>
>> Questions:
>> 1) ** Is there a warning sign that would easily flag this space leak? **
>> 2) Is there a simpler fix?
>
> The really tricky thing here is that the space leak is not what you 
> think it is. :)
>
> When you use the tree in an ephemeral fashion, the issue is *not* that 
> the path to the element is being copied, because the old path is 
> usually not referenced anywhere, and can be garbage collected 
> immediately. Asymptotically, this has no impact on space usage.
>
> Instead, the issue is that the tree operations are not "completed". 
> Inserting the same element a thousand times without inspecting the 
> tree afterwards, you will get a tree that contains a huge unevaluated 
> subtree, like
>
>   MkTree (treeInstert 'a' (treeInsert 'a' ...)) 'f' Empty
>
> What your supposed fix actually does is that it fully evaluates the 
> subtrees as well.
>
> An easier way to achieve this to use a "strictness annotation" in the 
> data type, which is done via the `!` symbol:
>
>     data Tree a = Empty
>                 | MkTree !(Tree a) a !(Tree a)
>
> This data type may not have any unevaluated subtrees. Try it and see 
> whether it solves your problem.
>
>
> In general, lazy evaluation is a trade-off: It makes some programs 
> simpler to write, but the price is that the cost model becomes more 
> difficult, both for time and for space usage. Cost models that you may 
> know from eager languages do not apply anymore.
>
> For reasoning about time usage, Okasaki introduces the "debit 
> method",w hich you've certainly read about. Another exposition can be 
> found in [1].
>
> For reasoning about space, I recommend an approach that I like to call 
> "space invariants" [2]. The key problem with the space leak above is 
> that the semantics of a tree -- which elements it contains -- no 
> longer coincides with the size of its representation -- an expression 
> which may be only partially evaluated. For things like infinite lists, 
> e.g. [1..], this is actually great! But for other things like finite 
> maps, this may lead to unexpected space usage. The remedy is to 
> enforce a "space invariant", which is a guarantee that links the 
> semantics and the size of the representation. An example would be "A 
> tree in WHNF uses only as much space as the leaves it contains (+ 
> skeleton)". The annotion `!` is one way to enforce these invariants.
>
>
>   [1]: http://apfelmus.nfshost.com/articles/debit-method.html
>   [2]: http://apfelmus.nfshost.com/blog/2013/08/21-space-invariants.html
>
>
> Best regards,
> Heinrich Apfelmus
>
> -- 
> http://apfelmus.nfshost.com
>
>
>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners