[Haskell-cafe] Thunks and GHC pessimisation

Tom Ellis tom-lists-haskell-cafe-2013 at jaguarpaw.co.uk
Tue Feb 26 22:13:20 CET 2013


On Tue, Feb 26, 2013 at 10:00:32AM -0000, oleg at okmij.org wrote:
> Tom Ellis wrote:
> > To avoid retaining a large lazy data structure in memory it is useful to
> > hide it behind a function call.  Below, "many" is used twice.  It is hidden
> > behind a function call so it can be garbage collected between uses. 
> 
> As you discovered, it is quite challenging to ``go against the grain''
> and force recomputation. GHC is quite good at avoiding
> recomputation. This is a trade-off of time vs space. For large
> search trees, it is space that is at a premium, and laziness and similar
> strategies are exactly the wrong trade-off.

Hi Oleg,

I have good news: Don's suggestion of -fno-full-laziness, which fixed the
space leak in my toy example, also fixes the space leak in your iterative
deepening code.  To be precise, there is no longer any space leak in your
second implementation, where subtrees are hidden behind a function call.

With full laziness, your second implementation consumes 99M.  To avoid
sharing, hacks are required, and your third implementation consumes 2M.
Without full laziness, however, your second implementation consumes only
2M.
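
For anyone who did not follow the earlier thread, here is a minimal
sketch of the pattern in question (the names and sizes are illustrative,
not the exact code from my original message):

    module Main where

    -- A large lazy list hidden behind a function call, in the hope
    -- that each call rebuilds it and the copy from the previous use
    -- can be garbage collected.
    many :: () -> [Integer]
    many () = [1 .. 10 ^ 7]

    main :: IO ()
    main = do
      print (length (many ()))  -- first use
      print (sum    (many ()))  -- second use

With -O2, full laziness may float [1 .. 10 ^ 7] out of the function body
to a top-level binding, so both calls share one copy of the list, and
that copy is retained across the two traversals.  Compiling with
-fno-full-laziness leaves the list inside the function body, so each
call rebuilds it and the heap stays small.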

> The solution (which I've seen in some of the internal library code) is
> to confuse GHC with extra functions:
>         http://okmij.org/ftp/Haskell/misc.html#memo-off

Yes, when I read your exposition several months ago, it started the
thinking process that culminated in this thread.
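
For reference, the flavour of the trick is roughly the following (my
paraphrase of the idea, not the code from that page, and whether it
survives a particular GHC version's optimiser has to be checked, which
is rather the point):

    -- An opaque identity function: NOINLINE stops GHC from seeing
    -- through it, so expressions built from its result do not look
    -- constant and are much harder to float out or common up.
    {-# NOINLINE veil #-}
    veil :: Integer -> Integer
    veil x = x

    many :: Integer -> [Integer]
    many n = [veil n .. 10 ^ 7]  -- depends (opaquely) on the argument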

> So, eventually it is possible to force recomputation. But the solution
> leaves a poor taste -- fighting a compiler is never a good idea.

I agree, but I'm glad to discover that disabling full laziness is enough to
avoid having to fight the compiler.

> So, this is a bug of sorts -- not a bug of GHC, but of lazy evaluation.
> Lazy evaluation is not the best evaluation strategy.  It is a trade-off,
> which suits a large class of problems and punishes another large class of
> problems.

I'm not as pessimistic as you on this issue.  If the results of function
calls may be shared with previous callers, then the situation is
problematic.  However, if it is known that each function call returns a
new thunk, then the programmer has a lot of power to avoid space leaks.

My conclusion is that this is not a bug of lazy evaluation.  I would
prefer GHC to be more discerning about when it applies full laziness.  It
is unfortunate that it applies the transformation even in cases where it
can cost an unbounded amount of additional memory.
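
For concreteness, the transformation at issue floats expressions that do
not depend on a lambda-bound variable out of the lambda.  Schematically
(and ignoring other optimisations, such as fusion, that can change the
picture):

    -- What the programmer wrote: the list exists only for the
    -- duration of one call.
    f :: () -> Integer
    f () = sum [1 .. 10 ^ 7]

    -- What full laziness can turn it into: the list becomes a
    -- top-level constant (a CAF), shared by every call and retained
    -- for the rest of the run.
    lvl :: [Integer]
    lvl = [1 .. 10 ^ 7]

    f2 :: () -> Integer
    f2 () = sum lvl

Sharing lvl saves recomputation when f2 is called repeatedly, which is
why GHC performs the float; the cost is that the entire evaluated list
stays reachable, which is exactly the kind of unbounded space cost
described above.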

Tom


% wget --quiet http://okmij.org/ftp/Haskell/STrees.hs

% ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main2 STrees.hs && ./STrees iter 100 +RTS -tstderr > /dev/null
[1 of 1] Compiling STrees           ( STrees.hs, STrees.o )
Linking STrees ...
<<ghc: 159988256 bytes, 308 GCs, 12946822/42139812 avg/max bytes residency (10 samples), 99M in use, 0.00 INIT (0.00 elapsed), 8.40 MUT (8.40 elapsed), 0.95 GC (0.95 elapsed) :ghc>>

% ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main3 STrees.hs && ./STrees iter 100 +RTS -tstderr > /dev/null
[1 of 1] Compiling STrees           ( STrees.hs, STrees.o )
Linking STrees ...
<<ghc: 6488675656 bytes, 12415 GCs, 214291/384036 avg/max bytes residency (37 samples), 2M in use, 0.00 INIT (0.00 elapsed), 8.64 MUT (8.64 elapsed), 0.32 GC (0.32 elapsed) :ghc>>

% ghc -O2 -fno-full-laziness -fforce-recomp -rtsopts -main-is STrees.main2 STrees.hs && ./STrees iter 100 +RTS -tstderr > /dev/null             
[1 of 1] Compiling STrees           ( STrees.hs, STrees.o )
Linking STrees ...
<<ghc: 11235944260 bytes, 21510 GCs, 208262/456880 avg/max bytes residency (44 samples), 2M in use, 0.00 INIT (0.00 elapsed), 8.03 MUT (8.03 elapsed), 0.52 GC (0.52 elapsed) :ghc>>


