[GHC] #12808: For closures, Loop Invariant Code Flow related to captured free values not lifted outside the loop...
GHC
ghc-devs at haskell.org
Wed Jan 11 12:18:03 UTC 2017
#12808: For closures, Loop Invariant Code Flow related to captured free values not
lifted outside the loop...
-------------------------------------+-------------------------------------
Reporter: GordonBGood | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 8.2.1
Component: Compiler | Version: 8.0.1
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by GordonBGood):
Replying to [comment:19 simonpj]:
> > As I have stated, I believe there are many use cases, even to the
common worker/wrapper pattern that seeks to reduce the amount of parameter
passing by enclosing a recursive worker closure inside a wrapper; any
gains from this pattern could be cancelled and more if the wrapped
enclosure is less efficient.
>
> Let me say what I ''think'' you are saying. Consider
> {{{
> f x p q = let g y = ...g y'...x...
> in (g p, g q)
> }}}
> Left as-is we allocate a closure for `g` every time we call `f`. But
instead we could lambda-lift `g`:
> {{{
> g2 x y = ...g2 x y'...x...
>
> f x p q = (g2 x p, g2 x q)
> }}}
> Now we don't allocate a closure for `g`. That is good. Is this what
you mean?
>
> We don't want to do this in general, early in optimisation, because we
get huge benefits from being able to "see" the binding site of `g`'s free
variable `x`. But these benefits are over when it comes to code
generation.
>
> So we have experimented with so-called "late lambda lifting" (LLF).
There's a whole wiki page about it: [wiki:LateLamLift]. It can be a real
win.
>
> One obstacle to LLF is, ironically, that it can destroy join points (see
the wiki page). A second benefit of Luke's new join-point work is that it
becomes much easier to ensure that LLF doesn't destroy join points, and
thus renders it much more practical. I think Luke will turn his attention
to it once join points are solidly in.
Yes, Simon, I manually lifted the closure by turning all the otherwise
free variables into arguments to the function; although I didn't actually
move the code to the top level it could have been, as it is no longer a
closure capturing free variables but rather just a free function.
However, my point in doing so was not to show that even this early manual
lambda lifting is effective but that the code generated inside the
function became so much more efficient due to seeing the Loop Invariant
Code Flow (LICF) and lifting the register loading outside of the loop; in
my mind the compiler should have done this whether the code was a closure
or not.
As the article on lambda lifting says, this lambda lifting at an early
stage can prevent some optimizations in some cases, and definitely there
will be some cost in this case in run time overhead in passing all of
those extra arguments to the function each and every time the function is
called; however, as the recursive calls are eliminated by the tail call
optimization of making a loop internal to the function, the function only
gets called very few times so as to have a negligible overall impact here.
There may be other negative effects of the very early lambda lifting, but
again they are negligible compared to the gains made in efficiency of the
internal loop due to properly using LICF lifting.
So the question I pose is "Why isn't LICF lifting used for the code
internal to closures when it is used for non-closures?".
Your Lambda Lifting and Join Point Analysis can serve to reduce this
problem by eliminating closures, but the problem is still there for cases
where the closures can't be eliminated and/or shouldn't be lifted.
I sometimes wonder whether this is the problem that makes LL and JPA
appear to be so effective: this problem makes the code that doesn't use
JPA/LL much slower than it would otherwise be so that if this problem were
not there, the gains made from LL/JPA in eliminating some/many closures
would not likely be so great.
I brought up the wrapper/worker pattern because its whole point is to
reduce the number of passed parameters that need to be tail call optimized
away, but the "worker" then must needs be a closure; with the problem of
this issue, in many cases the resulting wrapper/worker will be slower than
if we didn't factor in the closure "worker". In the sieve code of this
thread, the "nxtc" closure as originally written is essentially a "worker"
to the "nxtp" wrapper function and manually lifting it as I did means it
is no longer a "worker". I should have gotten slightly worse performance
in doing this but instead got much better performance because the compiler
now did LICF optimization on the non-closure, where it currently doesn't
seem to be able to do it on a non-closure.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12808#comment:20>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list