[Haskell-cafe] The dark side of lazyness - memory leak

Mon Aug 7 17:49:46 EDT 2006

I'm facing the dark side of lazyness recently.

Typical pattern is like this.

My code was working fine and I was happy.
I just wanted to inspect some properties of my code so
I made a slight chane go the code such as adding counter
argument or attaching axulary data filed to original data for
tracing how the data has been constructed.
All of a sudden the program runs out of memory or overflows
the stack.

One problem is that it comes up unexptectedly. Another even
worse problem is that sometimes I get no idea for the exact
loation causing the leak!

It really panics facing such darkness of lazy evaluation.
Just a small innocent looking fix for inspection or tracing
blow things up, sometime with no clue for its reason.

When I put a debugging or tracing operating in my software
that can be toggled, how could I be sure that turning on
those features won't crash my software written in Haskell?

Are there appraoches to detect and fix these kind of problem?

Haskell may be type safe but not safe at all from unexpteded
diversion, not because of the programmers' mistake but just
because of the lazyness.

I have posted an wiki article including one example of this dark
side of lazyness I encountered when I tried to count the number
of basic operations in sorting algorithm.
This was a rather simple situation and we figured out how to
cure this by self equality check ( x==x ) forcing evaluation.

There are wose cases not being able to figure out the cure.
I wrote a fucntion for analyzing some property of a graph,
which worked fine.

fixOnBy t p f x = if t x' `p` t x then x else fixOnBy t p f x' where x' = f x

fixSize f x = fixOnBy Set.size (==) f x

sctAnal gs = null cgs || all (not . null) dcs
  where
    gs' = fixSize compose $ Set.fromList [(x,y,cs) | To _ x y cs<-Set.toList gs]
    cgs = [z | z@(x,y,cs)<-Set.toList gs', x==y]
    dcs = [ [c| c@(a,D,b)<-Set.toList cs , a==b] | (_,_,cs)<-cgs]
    compose gs = trace ("## "++show (Set.size gs)) $ foldr Set.insert gs $ do
      (x1,y1,cs1) <- Set.toList gs
      (_,y2,cs2)  <-  takeWhileFst y1 $ Set.toList $ setGT
(y1,Al""(-1),Set.empty) gs
      return (x1,y2,cs1 `comp` cs2)
    takeWhileFst y = takeWhile (\(y',_,_) -> y==y')

This fucntion makes a transitive closure of the given set of relations
by fixpoint iteration on the size of the set of weighted edegs.

Sample output is like this.

*Main> main
## 170
## 400
## 1167
## 2249
## 2314
False

When I add an extra data field for tracing how the new item was made
(e.g. tag [a,b,c] on a->c if it was generated by a->b and b->c)
It suddenly overflows the stack even before printing out the trace.
The following is the code that leaks memory.

sctAnal gs = null cgs || all (not . null) dcs
  where
    gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y
cs<-Set.toList gs]
    cgs = [z | z@(TT (x,y,cs) _)<-Set.toList gs', x==y]
    dcs = [[c| c@(a,D,b)<-Set.toList cs , a==b] | TT (_,_,cs) _<-cgs]
    compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert gs $ do
      TT (x1,y1,cs1) l1 <- Set.toList gs
      TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $ setGT (TT
(y1,Al""(-1),Set.empty) []) gs
      return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2)
    takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y')
    checkInsert x s
                    | Set.member x s = s
                    | otherwise      = Set.insert x s

data TT a b = TT a b deriving (Show)
instance (Eq a, Eq b) => Eq (TT a b) where
  (TT x lx) == (TT y ly) = lx==lx && ly==ly && x == y
instance (Ord a, Ord b) => Ord (TT a b) where
   (TT x lx) < (TT y ly) = lx==lx && ly==ly && x < y

The really intersting thing happens when I just make the Ord derived
the stack does not overflow and starts to print out the trace.
(It is not the result that I want though. My intention is to ignore the
tags in the set operation)

data TT a b = TT a b deriving (Show,Eq,Ord)

I believe my Eq and Ord instances are even more stricter than the
derived ones. Is there some magic in "deriving" that prevents
memory leak?

I've even followed the instance declaration that would be the
same as deriving but the that leaks memory.

data TT a b = TT a b deriving (Show)
instance (Eq a, Eq b) => Eq (TT a b) where
  (TT x lx) == (TT y ly) = x == y && lx == ly
instance (Ord a, Ord b) => Ord (TT a b) where
   (TT x lx) < (TT y ly) = x < y || x == y && lx < ly

This is really a panic.

-- 
Ahn, Ki Yung