[Haskell-beginners] Disappointing thought on Haskell -- a simple space leak on "insert" is hard bug to catch
Heinrich Apfelmus
apfelmus at quantentunnel.de
Sat Aug 16 10:13:10 UTC 2014
Dimitri DeFigueiredo wrote:
> Here's a problem I just found with Haskell (and maybe functional
> programming) that I would like to know how to avoid.
>
> Please don't get me wrong. I'm in *LUUV* with Haskell! :-)
>
>
> Exercise 2.3 of Chris Okasaki's book "Purely Functional Data
> Structures" shows that something as simple as an insert function in a
> binary tree can cause a space leak in Haskell. (and other functional
> languages as well)
>
> Here's a simple tree type:
>
> data Tree a = Empty
> | MkTree (Tree a) a (Tree a)
>
> Here's the version of insert with the space leak:
>
> -- ** SPACE LEAK! ** version
> -- this version unnecessarily copies the path to the inserted element
> even if
> -- the same element is already present in the tree.
> -- a million insertions of the same element will cause a million trees
> -- to exist whereone would suffice.
>
> treeInsert :: Ord a => a -> Tree a -> Tree a
> treeInsert e Empty = MkTree Empty e Empty
> treeInsert e t@(MkTree a x b) | e > x = MkTree a x (treeInsert e b)
> | e < x = MkTree (treeInsert e a) x b
> | otherwise = t
>
> Here's a version without the space leak:
>
> treeInsert :: Ord a => a -> Tree a -> Tree a
> treeInsert e t = case (treeInsert' e t) of
> Nothing -> t
> Just x -> x
> where
> treeInsert' :: Ord a => a -> Tree a -> Maybe (Tree a)
> treeInsert' e Empty = return $ MkTree Empty e Empty
> treeInsert' e (MkTree a x b) | e > x = do {r <- treeInsert'
> e b; return $ MkTree a x r}
> | e < x = do {r <- treeInsert'
> e a; return $ MkTree r x b}
> | otherwise = Nothing
>
>
> This is so tricky that I think it is worth questioning whether
> Haskell is helping here. First, just spotting that the first version
> leads to a space leak is not trivial. Second, the fix is so much
> uglier than the original version. It is disappointing to have to
> write it!
>
> Questions:
> 1) ** Is there a warning sign that would easily flag this space leak? **
> 2) Is there a simpler fix?
The really tricky thing here is that the space leak is not what you
think it is. :)
When you use the tree in an ephemeral fashion, the issue is *not* that
the path to the element is being copied, because the old path is usually
not referenced anywhere, and can be garbage collected immediately.
Asymptotically, this has no impact on space usage.
Instead, the issue is that the tree operations are not "completed".
Inserting the same element a thousand times without inspecting the tree
afterwards, you will get a tree that contains a huge unevaluated
subtree, like
MkTree (treeInstert 'a' (treeInsert 'a' ...)) 'f' Empty
What your supposed fix actually does is that it fully evaluates the
subtrees as well.
An easier way to achieve this to use a "strictness annotation" in the
data type, which is done via the `!` symbol:
data Tree a = Empty
| MkTree !(Tree a) a !(Tree a)
This data type may not have any unevaluated subtrees. Try it and see
whether it solves your problem.
In general, lazy evaluation is a trade-off: It makes some programs
simpler to write, but the price is that the cost model becomes more
difficult, both for time and for space usage. Cost models that you may
know from eager languages do not apply anymore.
For reasoning about time usage, Okasaki introduces the "debit method",w
hich you've certainly read about. Another exposition can be found in [1].
For reasoning about space, I recommend an approach that I like to call
"space invariants" [2]. The key problem with the space leak above is
that the semantics of a tree -- which elements it contains -- no longer
coincides with the size of its representation -- an expression which may
be only partially evaluated. For things like infinite lists, e.g. [1..],
this is actually great! But for other things like finite maps, this may
lead to unexpected space usage. The remedy is to enforce a "space
invariant", which is a guarantee that links the semantics and the size
of the representation. An example would be "A tree in WHNF uses only as
much space as the leaves it contains (+ skeleton)". The annotion `!` is
one way to enforce these invariants.
[1]: http://apfelmus.nfshost.com/articles/debit-method.html
[2]: http://apfelmus.nfshost.com/blog/2013/08/21-space-invariants.html
Best regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Beginners
mailing list