[Haskell-cafe] Computing lazy and strict list operations at the same time

Fri Jul 14 16:10:02 EDT 2006

On Mon, Jun 19, 2006 at 05:50:13PM +0100, Duncan Coutts wrote:
> On Mon, 2006-06-19 at 17:03 +0100, Jon Fairbairn wrote:
> > il [] = error "foo"
> > il [x] = ([], x)
> > il (x:xs) = cof x (il xs)
> >             where cof x ~(a,b) = (x:a, b)
> > --                      !
> 
> From a quick test, it looks like none of our suggested solutions
> actually work in constant space.
> 
> main = interact $ \s ->
>   case il s of
>     (xs, x) -> let l = length xs
>                in l `seq` show (l,x)

I was hoping to have enlightenment served to me, but since nobody has
explained this, I took a crack at it.  I still can't explain it, but I
got some data that maybe somebody else will understand.  My code:

    initlast :: [a] -> ([a], a)
    initlast [x]    = ([], x)
    initlast (x:xs) = let (init, last) = initlast xs
                      in  (x:init, {-# SCC "last" #-} last)

    lenshow n (_:xs) last = let n1 = n+1 in n1 `seq` lenshow n1 xs last
    lenshow n [] last = show (n,last)

    main = interact $ \s -> case initlast s of
      (xs, x) -> lenshow 0 xs x 

lenshow is just "show (length xs, x)", written out so I can tweak it
later.  This exhibits the runaway space usage with a large input that
Duncan described.  If you throw away "last" in lenshow and just "show
n", it runs in constant space.

It seems that the reference to "last" that I annotated as a cost center
is holding a chain of trivial thunks--trivial because "last" is just
being copied from the result of the recursive call to initlast.  I
thought maybe I could get rid of them by returning an unboxed tuple from
initlast, but this turned out to make no difference.

Profiling gave a couple hints.  Retainer set profiling (-hr) showed the
retainer set holding all the memory was

    {<Main.last,Main.initlast,Main.main,Main.CAF>}

I think this confirms that last holding a chain of thunks.  I'm still
surprised that ghc doesn't see that they're trivial.  It feels like it
should be an easy optimization.

Constructor and type profiling (-hd and -hy) both show the memory held
by stg_ap_1_upd_info.  I don't know what that means.

Most frustrating, I can't find any work around:  No matter how I tried
to write initlast, it had the same leak when consumed this way.  (NB:
functional implementations only need apply.)  Granted, I can't think of
any good reason to code in this style, but it's hard for me to accept
that it should be impossible.

Finally, here is a (silly) version that doesn't leak:

    initlast :: [a] -> ([a], [a])
    initlast [x]    = ([], [x])
    initlast (x:xs) = let (init, last) = initlast xs
                      in  (x:init, undefined:last)

    lenshow n (_:xs) (_:ls) = let n1 = n+1 in n1 `seq` lenshow n1 xs ls
    lenshow n [] [last] = show (n,last)

This is the first case I recall in which adding more constructors makes
a space leak go away, because it gives me something to force (vis
"_:ls).  With the original implementation, there was no way to "partly
force" last, one thunk at a time.  Have others used this technique?

Andrew