Space leak question

Tue Jul 26 23:44:56 CEST 2011

Bertram Felgenhauer <bertram.felgenhauer <at> googlemail.com> writes:

> 
> Hi Will,
> > in reformulation of a code with no space leak, the leak reappeares.
> > 
> This is entirely expected. What is gobbling up the memory is the
> shared [5,7..] list that some invocation of g consume while others
> hang onto the head of the same list. (Note that g is a constant.)
> If you change the code to make g pointful, and compile with
> -fno-full-laziness, then the memory usage will go down again.
> 
>  {-# OPTIONS_GHC -O2 -fno-full-laziness #-}
>  primes :: [Int]
>  primes = 2 : g (fix g)
>   where
>    g xs = (3:) . ([5,7..] `minus`)
>                . foldi (\(x:xs) -> (x:) . union xs)
>                . map (\x-> [x*x, x*x+2*x..]) $ xs
> 
> With -ffull-laziness, ghc will notice that [5,7..] does not depend on
> xs, and float it out to the surrounding scope, essentially turning the
> code into
> 
>  {-# OPTIONS_GHC -O2 #-}
>  primes :: [Int]
>  primes = 2 : g (fix g)
>   where
>    g xs = (3:) . (odds `minus`)
>                . foldi (\(x:xs) -> (x:) . union xs)
>                . map (\x-> [x*x, x*x+2*x..]) $ xs
>    odds = [5,7..]
> 
> and the [5,7..] list will be shared once more.
> 
> Using -fno-cse has no effect in this example, because there are no
> duplicate subexpressions in the code. It is still a good idea to
> try this option when one encounters odd space leaks.
> 
> Bertram
> 

Hi Bertram,

thanks so much for your help!

I could only get rid of the leak, with your advice, using the switch
-fno-full-laziness, with pointful "g" code, and this (correction at the bottom!):

  ([5,7..] `minus`), 

or this 

  (odds () `minus`), 
    where  odds () = [5,7..] 

(notice the (), without it there's a huge leak), or with this

  (gaps 5) 
  where 
   gaps k s@(x:xs) = 
     if k<x then k:gaps (k+2) s else gaps (k+2) xs 

There was no way that I could find to achieve this without that switch.
(correction at the bottom!!!)

I would expect at least the (gaps 5) version to run with no leak, without that
switch, but no. Strange, isn't it? Adding bang arguments didn't help.

Sometimes adding the -fno-cse increased the leak hugely, at least 4 times more.

The test entries are on ideone, which uses ghc-6.8.2:

http://ideone.com/qpnqe   -- g (fix g)
http://ideone.com/wxmR5   -- twice-code

The code itself is the result of discussions on haskell-cafe some time 
back, btw. 

CORRECTION: just with "gaps" (but not the other ones), changing the "g" 
function from composed pieces into a "normal" code, it did it! (probably 
some ghc version-specific stuff at play):

  g xs = 3 : gaps 5
              ( foldi (\(x:xs) -> (x:) . union xs)
                [[x*x, x*x+2*x..] | x <- xs] )

This even runs ~ 9% faster than the twice-code version, at empirical complexity
of O(n^1.21..1.22), in "n" primes produced, for first few million primes. :) It
is practically the same as for the priority queue-based code, for which it is
about O(n^1.20..1.21) empirically, BTW.

Thanks!