[GHC] #9476: Implement late lambda-lifting
GHC
ghc-devs at haskell.org
Fri Nov 30 12:01:05 UTC 2018
#9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: sgraf
Type: feature request | Status: closed
Priority: normal | Milestone: 8.8.1
Component: Compiler | Version: 7.8.2
Resolution: fixed | Keywords: LateLamLift
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #8763 #13286 | Differential Rev(s): Phab:D5224
Wiki Page: LateLamLift |
-------------------------------------+-------------------------------------
Comment (by sgraf):
Thanks for pointing me to `-G1`, very interesting! The difference in bytes
copied, and consequently in runtime, is even more pronounced:
{{{
$ ./default 19 +RTS -s -G1 > /dev/null
359,455,256 bytes allocated in the heap
334,966,000 bytes copied during GC
188,250,032 bytes maximum residency (9 sample(s))
2,125,824 bytes maximum slop
179 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 9 colls, 0 par 0.431s 0.431s 0.0478s 0.2337s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.123s ( 0.123s elapsed)
GC time 0.431s ( 0.431s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.554s ( 0.554s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 2,928,555,183 bytes per MUT second
Productivity 22.2% of total user, 22.2% of total elapsed
$ ./allow-cg 19 +RTS -s -G1 > /dev/null
401,712,312 bytes allocated in the heap
185,583,392 bytes copied during GC
97,712,192 bytes maximum residency (38 sample(s))
1,275,840 bytes maximum slop
93 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 38 colls, 0 par 0.221s 0.221s 0.0058s 0.1098s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.104s ( 0.104s elapsed)
GC time 0.221s ( 0.221s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.325s ( 0.325s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 3,878,228,290 bytes per MUT second
Productivity 31.9% of total user, 31.9% of total elapsed
}}}
The residency was cut roughly in half! Also note the difference in the
number of collections, and that MUT time is lower than for the baseline
(which doesn't lift the `go` function above). This is probably a caching
side effect of the smaller residency, as the situation is still the same
with `-A400M`, where the baseline is faster.
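To put numbers on the comparison, here is a small sketch that recomputes the ratios from the two `-s` reports above. The `Run` record and its field names are made up for illustration; only the figures come from the output. It suggests the lifted build retains about 52% of the baseline's maximum residency and copies about 55% as many bytes, and that the reported productivity is close to MUT time divided by total time.

```haskell
-- Figures copied from the two `+RTS -s -G1` runs above.
-- The record and its field names are hypothetical helpers.
data Run = Run
  { bytesCopied  :: Double  -- "bytes copied during GC"
  , maxResidency :: Double  -- "bytes maximum residency"
  , mutSeconds   :: Double  -- "MUT time"
  , totalSeconds :: Double  -- "Total time"
  }

baseline, lifted :: Run
baseline = Run 334966000 188250032 0.123 0.554  -- ./default
lifted   = Run 185583392  97712192 0.104 0.325  -- ./allow-cg

-- Lifted value relative to the baseline value for a given field.
ratio :: (Run -> Double) -> Double
ratio field = field lifted / field baseline

-- Fraction of the total runtime spent in the mutator.
productivity :: Run -> Double
productivity r = mutSeconds r / totalSeconds r

main :: IO ()
main = do
  putStrLn ("residency ratio:       " ++ show (ratio maxResidency))
  putStrLn ("bytes copied ratio:    " ++ show (ratio bytesCopied))
  putStrLn ("baseline productivity: " ++ show (productivity baseline))
  putStrLn ("lifted productivity:   " ++ show (productivity lifted))
```

The recomputed productivity can differ from the reported one in the last digit, since the report is presumably derived from unrounded timings.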
I don't know how, but I suspect that lifting `go` causes the GC to be less
conservative about the liveness of some closure objects. If I had to guess,
something keeps the closure of `go` alive longer than the growing
`sat2` thunk in `go`. I played around with heap/retainer profiling, but so
far to no avail.
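For readers unfamiliar with the transformation, the following is a minimal source-level sketch of what lifting a local `go` means. It is not the benchmark from this ticket: `sumFromLocal`, `sumFromLifted`, `goLifted`, `step` and `n` are hypothetical names, and the real pass operates late in the pipeline rather than on source code. The point is only that the local `go` has free variables that must live in a closure, while the lifted version receives them as extra arguments.

```haskell
-- Hypothetical example, not the program discussed in the ticket.

-- Before lifting: `go` is local and refers to the free variables
-- `step` and `n`, so it is represented by a closure capturing them.
sumFromLocal :: Int -> Int -> Int
sumFromLocal step n = go 0 0
  where
    go acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + step)

-- After lifting: the former free variables become parameters, so
-- `goLifted` is a top-level function and needs no closure for them.
goLifted :: Int -> Int -> Int -> Int -> Int
goLifted step n acc i
  | i > n     = acc
  | otherwise = goLifted step n (acc + i) (i + step)

sumFromLifted :: Int -> Int -> Int
sumFromLifted step n = goLifted step n 0 0

-- Both versions compute the same result; only the representation
-- of `go` differs.
main :: IO ()
main = print (sumFromLocal 2 10 == sumFromLifted 2 10)
```

If the suspicion above is right, the closure that exists in the first form is what would be retained in the baseline.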
Here is a gist with the `-S -G1` output:
https://gist.github.com/sgraf812/5fbcf6b81fdd7c8af1a6060832bbfa11
There are two interesting things to point out:
1. The lifted version collects much more often, but only after completing
the compute-intensive work. I'm not sure why there are so many
collections; they seem redundant.
2. Compared to the baseline, the residency (and consequently the total
heap size, it seems) grows slower, but the increase in total bytes
allocated leads to an additional collection before everything drops to
constant space.
Not sure what to make of that data, but it doesn't contradict what I said
about the closure of `go` being kept alive longer than `sat2`.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9476#comment:60>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler