[Haskell-cafe] How can we detect and fix memory leak due to lazyness?

Tue Aug 8 06:13:44 EDT 2006

Ahn, Ki Yung wrote:
> Recently, I'm facing the dark side of laziness
> -- the memory leak because of laziness.
> 
> Are there standardized approaches for detecting and fixing
> these kind of problems?

Not really.  As Don S. already said, try heap profiling.  The function
that is too lazy will show up as producer.  Other than that, you'll just
have to learn to look for the typical patterns.  Understanding Haskell's
evaluation model and being able to simulate it in your head also helps.

> sctAnal gs = null cgs || all (not . null) dcs
> where
>   gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y
> cs<-Set.toList gs]
>   cgs = [z | z@(TT (x,y,cs) _)<-Set.toList gs', x==y]
>   dcs = [[c| c@(a,D,b)<-Set.toList cs , a==b] | TT (_,_,cs) _<-cgs]
>   compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert gs $ do
                                                     ^^^^^ point 1
>     TT (x1,y1,cs1) l1 <- Set.toList gs
>     TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $ setGT (TT
> (y1,Al""(-1),Set.empty) []) gs
>     return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2)
                                         ^^^^^^^^^^^ point 2
>   takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y')
>   checkInsert x s
>                   | Set.member x s = s
>                   | otherwise      = Set.insert x s

I can see two sources of problems.  Point 2 seems to be the cause of
your immediate problem:  this builds nested applications of (++) and
never evaluates them.  If the result is demanded, (++) calls itself
recursively, and if the list is too long, the stack gets exhausted.
'seq' doesn't help, that would only let the (++) accumulate in the
list's tail, but 'foldr seq' should help, and so would deepSeq.  I
wonder why

> instance (Ord a, Ord b) => Ord (TT a b) where
>  (TT x lx) < (TT y ly) = lx==lx && ly==ly && x < y

doesn't.  Does the (lx == lx) get optimized away?  The easiest solution
would be to use a data structure that directly supports concatenation.
Any implementation of a deque is good (FingerTrees?  Having them around
can never hurt...) and so is a function.  Replace the list [a] by a
function ([a] -> [a]), replace [] by id and replace (l1++y1:l2) by
(l1.(y1:).l2).  Also helps with the quadratic runtime, btw.

At point 1i, there lurks another problem.  You may find that some graphs
will blow your stack or even your heap.  That's because the repeated
application of checkInsert is not evaluated and this thunk may get too
deep or need more space than the Set it would buils.  I think, you want
foldl' (note the prime) here.

Udo.
-- 
F:	Was ist ansteckend und kommutiert?
A:	Eine Abelsche Grippe.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://www.haskell.org//pipermail/haskell-cafe/attachments/20060808/428892bd/attachment-0001.bin