[Haskell-beginners] Disappointing thought on Haskell -- a simple space leak on "insert" is hard bug to catch

Sat Aug 16 10:13:10 UTC 2014

Dimitri DeFigueiredo wrote:
> Here's a problem I just found with Haskell (and maybe functional 
> programming) that I would like to know how to avoid.
> 
> Please don't get me wrong. I'm in *LUUV* with Haskell! :-)
> 
> 
> Exercise 2.3 of Chris Okasaki's book "Purely Functional Data
> Structures" shows that something as simple as an insert function in a
> binary tree can cause a space leak in Haskell. (and other functional
> languages as well)
> 
> Here's a simple tree type:
> 
> data Tree a = Empty
>             | MkTree (Tree a) a (Tree a)
> 
> Here's the version of insert with the space leak:
> 
> -- ** SPACE LEAK! ** version
> -- this version unnecessarily copies the path to the inserted element 
> even if
> -- the same element is already present in the tree.
> -- a million insertions of the same element will cause a million trees
> -- to exist whereone would suffice.
> 
> treeInsert :: Ord a => a -> Tree a -> Tree a
> treeInsert e Empty = MkTree Empty e Empty
> treeInsert e t@(MkTree a x b) | e > x     = MkTree a x (treeInsert e b)
>                               | e < x     = MkTree (treeInsert e a) x b
>                               | otherwise = t
> 
> Here's a version without the space leak:
> 
> treeInsert :: Ord a => a -> Tree a -> Tree a
> treeInsert e t = case (treeInsert' e t) of
>                     Nothing -> t
>                     Just x  -> x
>     where
>         treeInsert' :: Ord a => a -> Tree a -> Maybe (Tree a)
>         treeInsert' e Empty = return $ MkTree Empty e Empty
>         treeInsert' e (MkTree a x b) | e > x     = do {r <- treeInsert' 
> e b; return $ MkTree a x r}
>                                      | e < x     = do {r <- treeInsert' 
> e a; return $ MkTree r x b}
>                                      | otherwise = Nothing
> 
> 
> This is so tricky that I think it is worth questioning whether
> Haskell is helping here. First, just spotting that the first version
> leads to a space leak is not trivial. Second, the fix is so much
> uglier than the original version. It is disappointing to have to
> write it!
> 
> Questions:
> 1) ** Is there a warning sign that would easily flag this space leak? **
> 2) Is there a simpler fix?

The really tricky thing here is that the space leak is not what you 
think it is. :)

When you use the tree in an ephemeral fashion, the issue is *not* that 
the path to the element is being copied, because the old path is usually 
not referenced anywhere, and can be garbage collected immediately. 
Asymptotically, this has no impact on space usage.

Instead, the issue is that the tree operations are not "completed". 
Inserting the same element a thousand times without inspecting the tree 
afterwards, you will get a tree that contains a huge unevaluated 
subtree, like

   MkTree (treeInstert 'a' (treeInsert 'a' ...)) 'f' Empty

What your supposed fix actually does is that it fully evaluates the 
subtrees as well.

An easier way to achieve this to use a "strictness annotation" in the 
data type, which is done via the `!` symbol:

     data Tree a = Empty
                 | MkTree !(Tree a) a !(Tree a)

This data type may not have any unevaluated subtrees. Try it and see 
whether it solves your problem.

In general, lazy evaluation is a trade-off: It makes some programs 
simpler to write, but the price is that the cost model becomes more 
difficult, both for time and for space usage. Cost models that you may 
know from eager languages do not apply anymore.

For reasoning about time usage, Okasaki introduces the "debit method",w 
hich you've certainly read about. Another exposition can be found in [1].

For reasoning about space, I recommend an approach that I like to call 
"space invariants" [2]. The key problem with the space leak above is 
that the semantics of a tree -- which elements it contains -- no longer 
coincides with the size of its representation -- an expression which may 
be only partially evaluated. For things like infinite lists, e.g. [1..], 
this is actually great! But for other things like finite maps, this may 
lead to unexpected space usage. The remedy is to enforce a "space 
invariant", which is a guarantee that links the semantics and the size 
of the representation. An example would be "A tree in WHNF uses only as 
much space as the leaves it contains (+ skeleton)". The annotion `!` is 
one way to enforce these invariants.

   [1]: http://apfelmus.nfshost.com/articles/debit-method.html
   [2]: http://apfelmus.nfshost.com/blog/2013/08/21-space-invariants.html

Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com