[GHC] #9476: Implement late lambda-lifting
GHC
ghc-devs at haskell.org
Fri Jan 11 14:11:14 UTC 2019
#9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: sgraf
Type: feature request | Status: closed
Priority: normal | Milestone: 8.8.1
Component: Compiler | Version: 7.8.2
Resolution: fixed | Keywords: LateLamLift
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #8763 #13286 | Differential Rev(s): Phab:D5224
Wiki Page: LateLamLift |
-------------------------------------+-------------------------------------
Comment (by sgraf):
OK, it turns out that `paraffins` regresses by 8% in counted instructions
when allowing arbitrary closure growth because of the very same function I
pinned down as the reason for the alleged speed-up in comment:56. It seems
like we don't want to lift the inner loop of such a list comprehension.
On the other hand, there's `gen_regexp` which improves by 2-5% in runtime
when we lift the next-to-inner function of a list comprehension (`go_s7zy`
below):
{{{
let {
go_s7zy =
sat-only \r [x1_s7zz]
case ># [x1_s7zz y_s7zx] of {
__DEFAULT ->
case chr# [x1_s7zz] of ds13_s7zB {
__DEFAULT ->
let { ds14_s7zC = CCCS C#! [ds13_s7zB]; } in
let { z_s7zD = \u [] case +# [x1_s7zz 1#] of
sat_s7zE { __DEFAULT -> go_s7zy sat_s7zE; }; } in
let {
go1_s7zF =
sat-only \r [ds15_s7zG]
case ds15_s7zG of {
[] -> z_s7zD;
: y1_s7zI ys_s7zJ ->
let { sat_s7zL = \u [] go1_s7zF
ys_s7zJ; } in
let { sat_s7zK = CCCS :! [ds14_s7zC
y1_s7zI]; } in : [sat_s7zK sat_s7zL];
};
} in go1_s7zF lvl13_s7zw;
};
1# -> [] [];
};
} in case ord# [c1_s7za] of sat_s7zM { __DEFAULT -> go_s7zy sat_s7zM; };
}}}
Lifting `go_s7zy` leads to a very slight increase in allocation (so would
be disallowed from the perspective of closure growth), but a decrease in
runtime. Maybe this is some low-level thing again, I don't know. The
difference in Cmm is as expected and I wouldn't have expected a speed-up.
So, to sum this up: Deactivating closure growth checks, thus allowing
arbitrary closure growth resulting from lambda lifting
- leads to an improvement of ~0.1% in runtime and counted instructions
(though the two don't necessarily coincide).
- also leads to unpredictable regressions in total allocations, like
+28.6% in `wheel-sieve1`. From the regressions I investigated
(`gen_regexp`, for example), the bindings the lifting decisions of which
are responsible for the biggest regression in allocations (+10% due to the
inner loop above, `go1_s7zF`) are not necessarily the same bindings which
contribute the speed-ups (`go_s7zy` above).
I'm not sure how to come up with a criterion that separates the two
classes (beneficial to lift vs. not) of bindings, so I suggest we stick to
the current closure growth heuristic. Which is a little more conservative
than necessary, but avoids arbitrary regressions in allocations on STG
level.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9476#comment:76>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list