[GHC] #9476: Implement late lambda-lifting
GHC
ghc-devs at haskell.org
Wed Jul 11 16:25:01 UTC 2018
#9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: nfrisby
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #8763 | Differential Rev(s):
Wiki Page: LateLamLift |
-------------------------------------+-------------------------------------
Comment (by sgraf):
Replying to [comment:16 simonpj]:
> Thoughts
>
> * There are a handful of spectacular reductions in allocation (queens,
n-body). It'd be good to understand and explain them. Perhaps we can
more closely target LLF on those cases.
It's hard to tell for `n-body`: The `-fno-llf` variant shows the same
reduction in allocations, meaning that the reduction must happen somewhere
in the base libraries rather than in `n-body` itself.
`queens` has its `go1` function within `gen` lifted (arity 1 with 3 free
vars). Non-lifted Core after CorePrep:
{{{
let {
n_s5S3 [Occ=OnceL*] :: [[GHC.Types.Int]]
[LclId]
n_s5S3 = go_s5RY ys_s5S2 } in
case GHC.Prim.># 1# ww1_s5RX of {
__DEFAULT ->
letrec {
go1_s5S5 [Occ=LoopBreaker]
:: GHC.Prim.Int# -> [[GHC.Types.Int]]
[LclId, Arity=1, Str=<S,U>, Unf=OtherCon []]
go1_s5S5
= \ (x1_s5S6 :: GHC.Prim.Int#) ->
case Main.main_$ssafe y_s5S1 1# x1_s5S6 of {
GHC.Types.False ->
case GHC.Prim.==# x1_s5S6 ww1_s5RX of {
__DEFAULT ->
case GHC.Prim.+# x1_s5S6 1#
of sat_s5S9 [Occ=Once]
{ __DEFAULT ->
go1_s5S5 sat_s5S9
};
1# -> n_s5S3
};
GHC.Types.True ->
let {
sat_s5Se [Occ=Once] :: [[GHC.Types.Int]]
[LclId]
sat_s5Se
= case GHC.Prim.==# x1_s5S6 ww1_s5RX
of {
__DEFAULT ->
case GHC.Prim.+# x1_s5S6 1#
of sat_s5Sd [Occ=Once]
{ __DEFAULT ->
go1_s5S5 sat_s5Sd
};
1# -> n_s5S3
} } in
let {
sat_s5Sa [Occ=Once] :: GHC.Types.Int
[LclId]
sat_s5Sa = GHC.Types.I# x1_s5S6 } in
let {
sat_s5Sb [Occ=Once] :: [GHC.Types.Int]
[LclId]
sat_s5Sb
= GHC.Types.:
@ GHC.Types.Int sat_s5Sa y_s5S1 } in
GHC.Types.:
@ [GHC.Types.Int] sat_s5Sb sat_s5Se
}; } in
go1_s5S5 1#;
1# -> n_s5S3
}
}}}
And when `go1` is lifted to top-level, the CorePrep'd call site changes to
{{{
case GHC.Prim.># 1# ww1_s5Ts of {
__DEFAULT ->
let {
sat_s5Tz [Occ=Once, Dmd=<L,1*U>] :: [[GHC.Types.Int]]
[LclId]
sat_s5Tz = go_s5Tt ys_s5Tx } in
llf_go_r5SK y_s5Tw ww1_s5Ts sat_s5Tz 1#;
1# -> go_s5Tt ys_s5Tx
}
}}}
The important detail is that `n_s5S3` corresponds to `sat_s5Tz` in the
lifted version. Both have a 'single' occurrence, but `n_s5S3`'s occurrence
is under a non-oneshot lambda (which appearently and understandably is
enough for OccAnal to give up, as opposed to analyzing parameters of
recursive functions), so `n_s5S3` is updateable and causes an increase in
heap allocation, while `sat_s5Tz` is single-entry.
So, I'm afraid this gain can't entirely claimed by LLF. A late demand
analysis run would probably detect the same. There's `cichelli`, which
shows similar improvements (-10% allocs) due to something happening in
`Prog.hs`, but there's too many lifts to bisect, so I'll postpone that to
a follow-up post.
> * I don't think we should float join points at all, recursive or non-
recursive. Think of them like labels in a control-flow graph.
This was also what I thought, but there can be synergetic effects with the
inliner if we manage to "outline" (that's what the transformation would be
called in an imperative language) huge non-recursive join points, where
the call overhead is neglibigle. At least that's what the wiki page
mentions. But that brings me to the next point...
> * I think of LLF as a code-generation strategy, that we do once all
other transformations are done. (Lambda-lifting ''can'' affect earlier
optimisations. It can make a big function into a small one (by floating
out its guts), and thereby let it be inlined. But that is subtle and
difficult to get consistent gains for. Let's not complicate LLF by
thinking about this.)
Yes, I wasn't sure if having it play nice with the simplifier was intended
here. I take comfort in your confirmation seeing it as a code-gen
strategy.
> * Given that it's a code-gen strategy, doing it on STG makes perfect
sense to me. You've outlined the pros and cons well. Definitely worth a
try.
Will do.
> I'm not sure what you meant by "It's not enough to look at Core alone to
gauge allocation" as a disadvantage.
Maybe it's just me only slowly understanding every layer of compilation in
GHC, but I felt like I could have this mental model of roughly where and
how much things allocate after CorePrep, but that's probably an illusion
considering things like let-no-escapes (that story got better with join
points, I suppose). Having another pass transforming heap allocation into
stack allocation *after* CorePrep isn't exactly helping that intuition.
Or were you thrown off by me using words I maybe mistranslated ("gauge"
for "estimate", roughly)?
> When you say "Much less involved analysis that doesn't need to stay in
sync with CorePrep", I think it would be v helpful to lay out "the
analysis". I have vague memories, but I don't know what this "stay in
sync" stuff is about.
There is
[https://github.com/sgraf812/ghc/blob/f5c160c98830fdba83faa9a0634eeab38dbe04d4/compiler/simplCore/SetLevels.hs#L2169-L2214
this note] that explains why it's necessary to approximate CorePrep. What
follows is
[https://github.com/sgraf812/ghc/blob/f5c160c98830fdba83faa9a0634eeab38dbe04d4/compiler/simplCore/SetLevels.hs#L2478
this new analysis] that interleaves a free variable analysis with
computing use information for id's which are free in the floating
bindings.
> If you do it in STG you don't need to explain "stay in sync", but
explaining the analysis would be excellent.
Yes! I'm not really sure I understand all of it yet. (Re-)implementing it
on STG should help me figure things out.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9476#comment:17>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list