Why is there a space leak here?

Tue, 05 Jun 2001 17:03:24 -0400

Alastair David Reid wrote:
> 
> Executive summary: David's program has an incredibly subtle space leak
> in it (or I'm being incredibly dumb).  I encourage the honchos (and
> would be honchos) to have a look.  Users of other compilers might give
> it a shot too.
> 
> David Bakin <davidbak@cablespeed.com> writes:
> 
> > Why is there a space leak in foo1 but not in foo2?  (I.e., in Hugs
> > Nov '99) foo1 eats cells (and eventually runs out) where foo2
> > doesn't.  That is, if I do (length (foo1 1000000)) I eventually run
> > out of cells but (length (foo2 1000000)) runs fine (every GC returns
> > basically the same amount of space).  Something must be wrong in
> > flatten but it follows the pattern of many functions in the prelude
> > (which I'm trying to learn from).  I have been puzzling over this
> > for nearly a full day (getting this reduced version from my own code
> > which wasn't working).  In general, how can I either a) analyze code
> > looking for a space leak or b) experiment (e.g., using Hugs) to find
> > a space leak?  Thanks!  -- Dave
> 

I certainly don't have tons of experience tracking down and fixing 
space leaks.  But I can show you how one could attack the problem 
in a brute-force kind of way.  Let's use the following definitions
(from Alastair) to make things simpler:

flatten :: [[a]] -> [a]
flatten []       = []
flatten ([]:xss) = flatten xss
flatten ((x:xs):xss) = x : flatten (xs:xss)

-- This has a space leak, e.g., when reducing (length (foo2 1000000))
foo2 m
  = take m v
    where
      v = 1 : flatten (map double v)
      double x = [x,x]

-- This has no space leak, e.g., when reducing (length (foo3 1000000))
foo3 m
  = take m v
    where
      v = 1 : f1 (map single v)
      single x = [x]

We just start evaluating the program by hand and observe what's happening.
I'm being a little informal in my hand evaluations (and there's probably
a number of mistakes) but I think I've done well enough to visualize the
space behavior of these programs.

Regarding my derivations:
  * I'm using let's to simulate the sharing done by something like the STG
    machine.  If you're curious about why exactly evaluation is proceeding
    as it is, you might want to look at some of the work on "call by need 
    calculi".

  * I'm showing the evaluation steps using "p1 -> p2" to indicate that
    p1 reduces to p2.  Also, I "embed" evaluation steps into programs 
    as follows so as to save repetition (where C[] is any context):
       C[    p1
          -> p2
        ]
    which is the same as
      C[p1] -> C[p2]
    Hopefully the indentation will disambiguate things.

So, let's simulate the evaluation of (length $ foo3 100):

   length $ foo3 100
-> foldl' (\n _ -> n + 1) 0 $
       foo3 100
    -> take 100 v
    -> let v  = 1 : v2
           v2 = flatten (map single v)
       in take 100 (1:v2)
       -> 1 : take 99 v2
-> foldl' (\n _ -> n + 1) 1 $
       let v  = 1 : v2
           v2 = flatten (map single v)
             -> flatten (map single (1:v2))
             -> flatten ([1] : map single v2))
             -> 1 : flatten ([] : map single v2))
       in take 99 v2
    -> 
       let v  = 1 : v2
           v2 = 1 : v3
           v3 = flatten ([] : map single v2))
       in take 99 (1 : v3)
    ->                                                 {GC}
       let v2 = 1 : v3
           v3 = flatten ([] : map single v2))
       in take 99 (1 : v3)
       -> 1 : take 98 v3
-> foldl' (\n _ -> n + 1) 2 $
       let v2 = 1 : v3
           v3 = flatten ([] : map single v2))
             -> flatten (map single v2))
       in take 98 v3

So, there is no space leak here because at this point we have a program
which is the same as a previous program (up to variable naming and integer
values).  So the program isn't growing.

Note that for every foldl' reduction, there will be a GC (garbage collection)
step.

Now, let's simulate the evaluation of (length $ foo2 100):

   length $ foo2 100
-> foldl' (\n _ -> n + 1) 0 $
       foo2 100
    -> take 100 v
    -> let v  = 1 : v2
           v2 = flatten (map double v)
       in take 100 (1:v2)
       -> 1 : take 99 v2
-> foldl' (\n _ -> n + 1) 1 $
      let v  = 1 : v2
          v2 = flatten (map double v)
            -> flatten (map double (1:v2))
            -> flatten ([1,1] : map double v2)
            -> 1 : flatten ([1] : map double v2)
       in take 99 v2
    ->
      let v  = 1 : v2
          v2 = 1 : v3
          v3 = flatten ([1] : map double v2)
      in take 99 (1 : v3)
      -> 1 : take 98 v3
    ->                                             {GC}
      let v2 = 1 : v3
          v3 = flatten ([1] : map double v2)
      in take 99 (1 : v3)
      -> 1 : take 98 v3
-> foldl' (\n _ -> n + 1) 2 $
      let v2 = 1 : v3
          v3 = flatten ([1] : map double v2)
            -> 1 : flatten ([] : map double v2)
      in take 98 v3
   ->
      let v2 = 1 : v3
          v3 = 1 : v4
          v4 = flatten ([] : map double v2)
      in take 98 (1:v4)
      -> 1 : take 97 v4
-> foldl' (\n _ -> n + 1) 3 $
      let v2 = 1 : v3
          v3 = 1 : v4
          v4 = flatten ([] : map double v2)
            -> flatten (map double v2)
            -> flatten (map double (1:v3))
            -> flatten ([1,1] : map double v3)
            -> 1 : flatten ([1] : map double v3)
      in take 97 v4
   -> 
      let v2 = 1 : v3
          v3 = 1 : v4
          v4 = 1 : v5
          v5 = flatten ([1] : map double v3)
      in take 97 (1:v5)
   ->                                              {GC}
      let v3 = 1 : v4
          v4 = 1 : v5
          v5 = flatten ([1] : map double v3)
      in take 97 (1:v5)
      -> 1 : take 96 v5
-> foldl' (\n _ -> n + 1) 4 $
      let v3 = 1 : v4
          v4 = 1 : v5
          v5 = flatten ([1] : map double v3)
            -> 1 : flatten ([] : map double v3)
      in take 96 v5
   ->
      let v3 = 1 : v4
          v4 = 1 : v5
          v5 = 1 : v6
          v6 = flatten ([] : map double v3)
      in take 96 (1:v6)
      -> 1 : take 95 v6
-> foldl' (\n _ -> n + 1) 5 $
      let v3 = 1 : v4
          v4 = 1 : v5
          v5 = 1 : v6
          v6 = flatten ([] : map double v3)
      in take 95 v6

I've done a few more reduction steps here.  Note that for every two foldl'
reductions there is only one {GC} step!  So, the program is growing.
Thus, we have a space leak.

- Mark