[Haskell-cafe] Re: Space usage problems

Simon Marlow simonmar at microsoft.com
Wed Jan 11 05:36:47 EST 2006


Ian Lynagh wrote:
> On Tue, Jan 10, 2006 at 04:44:33PM +0000, Ian Lynagh wrote:
> 
>>readChunks :: FirstMonad String
>>readChunks = do xs <- get
>>                if null xs then return []
>>                           else do let (ys, zs) = foo xs
>>                                   put zs
>>                                   rest <- readChunks
>>                                   return (ys ++ rest)
> 
> 
> It looks like changing this let to a case fixes this example, but at the
> time I'd experimented with that there must have been other issues
> clouding the effect, such as the following.
> 
> Foo1 (attached) uses large amounts of memory whereas Foo2 (also
> attached) runs in a little constant space. The difference is only
> changing this:
>
>     else do chunk <- case foo xs of
>                          (ys, zs) ->
>                              do put zs
>                                 return ys
>             chunks <- readChunks
>             return (chunk ++ chunks)
> 
> to this:
> 
>     else case foo xs of
>              (ys, zs) ->
>                  do put zs
>                     chunks <- readChunks
>                     return (ys ++ chunks)

I had great difficulty understanding this, but I think I do now.  It's a 
bit easier to understand if you inline the monads away.  Foo1 translates 
to this:

   bar []     = ([],[])
   bar (x:xs) = let (zs,ys) = bar xs in (x:zs,ys)

   readChunks [] = ([],[])
   readChunks xs = let (ys,zs) = bar xs
                       (chunks,rest) = readChunks zs in
		  (ys ++ chunks, rest)
and Foo2:

   readChunks [] = ([],[])
   readChunks xs = case bar xs of
                    (zs,ys) -> let (chunks,rest) = readChunks ys in
                               (zs ++ chunks, rest)


This is pretty much what GHC ends up with when you give -O (actually it 
turns some of the tuples into unboxed tuples, but that's not important).

We can see in Foo1 that chunks is a thunk holding on to zs, which is a 
thunk that holds on to xs, so you never get to release xs until the 
whole result list (ys) is traversed.  GHC's lazy tuple optimisation 
doesn't kick in, because neither chunks nor rest are evaluated.

However, it's not so clear why Foo2 is better.  chunks holds on to ys, 
the second of the pair returned by bar.  In fact, ys will point to a 
chain of thunks that looks like this:

   ys = snd (_, snd (_, snd (_, snd (_, snd ...))))

every time GC runs, it can completely eliminate this list via the 
well-known lazy tuple optimisation.  Unfortunately it doesn't 
*completely* eliminate the list, because of a shortcoming in our 
implementation, actually reported earlier by Ian Lynagh with a very 
similar program :-)  Fortunately in this example we do seem to be 
reducing enough of the list to eliminate the space leak, though.

My suggestion: don't use the lazy state monad if you can help it.

Cheers,
	Simon



More information about the Haskell-Cafe mailing list