[GHC] #9476: Implement late lambda-lifting
GHC
ghc-devs at haskell.org
Fri Nov 30 14:47:13 UTC 2018
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: sgraf
Type: feature request | Status: closed
Priority: normal | Milestone: 8.8.1
Component: Compiler | Version: 7.8.2
Resolution: fixed | Keywords: LateLamLift
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #8763 #13286 | Differential Rev(s): Phab:D5224
Wiki Page: LateLamLift |
-------------------------------------+-------------------------------------
Comment (by sgraf):
It seems this whole fuss was about nothing. It's obvious in hindsight, but I
wasn't aware that the maximum residency and bytes copied are just based on
samples taken when the GC runs, which in turn depends on heap size. I had
played around with `-A` before, but now I ran a script that finds the
maximum residency over multiple different heap sizes, which yields these
numbers:
{{{
$ ./default 19 +RTS -s -G1 -A56M
359,289,696 bytes allocated in the heap
313,229,544 bytes copied during GC
192,670,160 bytes maximum residency (4 sample(s))
2,757,856 bytes maximum slop
183 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 4 colls, 0 par 0.405s 0.405s 0.1013s 0.2541s
INIT time 0.001s ( 0.001s elapsed)
MUT time 0.119s ( 0.119s elapsed)
GC time 0.405s ( 0.405s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.524s ( 0.525s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 3,031,050,062 bytes per MUT second
Productivity 22.6% of total user, 22.6% of total elapsed
$ ./allow-cg 19 +RTS -s -G1 -A76M
401,485,600 bytes allocated in the heap
331,564,512 bytes copied during GC
196,161,944 bytes maximum residency (4 sample(s))
1,389,968 bytes maximum slop
187 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 4 colls, 0 par 0.436s 0.441s 0.1102s 0.2637s
INIT time 0.001s ( 0.001s elapsed)
MUT time 0.132s ( 0.132s elapsed)
GC time 0.436s ( 0.441s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.569s ( 0.574s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 3,049,743,734 bytes per MUT second
Productivity 23.1% of total user, 23.0% of total elapsed
}}}
Still, the impact of GC parameters is very annoying. Also, when I vary
`-A$iM`, where `$i` is an integer between 1 and 200, the baseline has a
higher maximum residency than the variant that also lifts `go` in 123 of
the 200 cases. It seems that the GC parameterisation for the baseline is
just in a bad (bitter?) spot.
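For illustration only, a sweep of this kind could be sketched roughly as follows. This is not the script that was actually used; it is a minimal Python sketch that assumes the two binaries `./default` and `./allow-cg` and the argument `19` from the runs above, and that the `+RTS -s` summary is written to stderr (the RTS default). All function names are hypothetical.

```python
import re
import subprocess

# Matches the "+RTS -s" summary line, e.g.
#   "192,670,160 bytes maximum residency (4 sample(s))"
RESIDENCY_RE = re.compile(r"([\d,]+) bytes maximum residency")


def max_residency(rts_summary):
    """Extract the maximum residency (in bytes) from '+RTS -s' output."""
    match = RESIDENCY_RE.search(rts_summary)
    if match is None:
        raise ValueError("no 'maximum residency' line found")
    return int(match.group(1).replace(",", ""))


def run_once(binary, arg, nursery_mb):
    """Run the binary with one generation and the given nursery size."""
    proc = subprocess.run(
        [binary, arg, "+RTS", "-s", "-G1", f"-A{nursery_mb}M"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: the summary goes to stderr, as it does by default.
    return max_residency(proc.stderr)


def sweep(binary, arg, sizes_mb):
    """Map each nursery size (in MB) to the sampled maximum residency."""
    return {mb: run_once(binary, arg, mb) for mb in sizes_mb}


def worst_case(results):
    """Return the (nursery_mb, residency) pair with the highest residency."""
    return max(results.items(), key=lambda item: item[1])


def count_baseline_worse(baseline, lifted):
    """Count nursery sizes at which the baseline has higher residency."""
    return sum(1 for mb in baseline if baseline[mb] > lifted[mb])
```

With these helpers, `sweep("./default", "19", range(1, 201))` and `sweep("./allow-cg", "19", range(1, 201))` would produce the two result maps; `worst_case` picks the nursery size with the highest sampled residency for each binary, and `count_baseline_worse` gives the kind of "n of 200 cases" tally quoted above.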
The question is, how do I sell this in the paper? I guess I could increase
nursery size even further (I'm currently benchmarking with `-A128M -H1G`),
but that's not very realistic, either...
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9476#comment:65>