[GHC] #9476: Implement late lambda-lifting

Fri Sep 7 09:08:55 UTC 2018

#9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  sgraf
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.8.2
      Resolution:                    |             Keywords:  LateLamLift
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #8763 #13286      |  Differential Rev(s):
       Wiki Page:  LateLamLift       |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 Replying to [comment:28 simonpj]:
 > > -fstg-lift-lams-known: Allow turning known into unknown calls.
 >
 > In your example, you say that we might float `f` but not `g`.  But why
 don't we float `g`?

 Because at the time we handled `g`, it (hypothetically) was deemed non-
 beneficial to lift. It might lead to more allocation or hurt some other
 heuristic.

 Of course one could argue that lifting ''both'' bindings could be a win.
 Our decision strategy is greedy in that regard; There's no backtracking
 involved. We transform expressions going from outer to inner, at each step
 taking into account the lifting decisions we already made (and which we
 don't question or undo). So, for that example in particular, the
 assumption was that we examined `g`, decided not to lift it and then look
 at `f` knowing that `g` won't be lifted.

 > > lifting go to top-level will result in abstracting over `$warg`,
 >
 > Why didn't we float `$warg`?

 Same reasoning. Because we decided earlier that the lift might not be
 worthwhile. I could look at some debug output to see what was the
 reason(s) for `$warg` if you are really interested.

 Replying to [comment:29 simonpj]:
 > > Ah, but one of them (the last, in particular) is a 'void' parameter
 (so, slow call pattern pppppv), which is completely erased from the
 resulting Cmm!
 >
 > OK, so the conclusion is: don't count void args when counting args and
 comparing with the threshold?  I agree!

 Yes, exactly. See
 https://github.com/sgraf812/ghc/blob/9b9260c1d45d127edf9ebdfe04a3daaff24a9dea/compiler/simplStg/StgLiftLams/Analysis.hs#L310

 Replying to [comment:30 simonpj]:
 > The non-recursive case
 >
 > So it's a bit surprising to me that even with a threshold of 10 we have
 > such a small hit on instruction count.

 I was wondering the same thing. I ''think'' the other heuristics
 (https://github.com/sgraf812/ghc/blob/9b9260c1d45d127edf9ebdfe04a3daaff24a9dea/compiler/simplStg/StgLiftLams/Analysis.hs#L228-L235,
 for completeness) fire way too often to allow non-recursive calls with 20
 arguments. Consider the impact on allocations: When we lift such a
 function, all 20ish free variables would have to be made available at all
 call sites, leading to a massive increase in closure allocation for
 closures around the actual call sites. Example

 {{{
 let f = [x1...x20] \[a b c] -> ...
     g = [f y] \z -> f y y z
 in map g [1,2,3]
 }}}

 Assuming for the sake of the example that `g` wouldn't be lifted to top-
 level itself (it's just a PAP of `f`), lifting `f` would just shift the
 whole closure into `g`'s closure. In general, there could be arbitrary
 more closures between `f`s binding site and its call sites, each of which
 would have to carry `f`s former free variables, should we decide to lift
 it.

 I could try and measure with the allocation heuristic and the max args
 heuristic turned off, to see if we get more chaotic results.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9476#comment:33>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler