Basic Block Layout in the NCG
Andreas Klebinger
klebinger.andreas at gmx.at
Sat May 5 19:23:47 UTC 2018
Does anyone have good hints for literature on basic block layout algorithms?
I've run into a few examples where the current algorithm falls apart
while working on Cmm.
There is a trac ticket https://ghc.haskell.org/trac/ghc/ticket/15124#ticket
where I tracked some of the issues I ran into.
As it stands some cmm optimizations are far out weighted by
accidental changes they cause in the layout of basic blocks.
The main problem seems to be that the current codegen only considers the
last jump
in a basic block as relevant for code layout.
This works well for linear chains of control flow but behaves badly and
somewhat
unpredictable when dealing with branch heavy code where blocks have more
than
one successor or calls.
In particular if we have a loop
A jmp B call C call D
which we enter into at block B from Block E
we would like something like:
E,B,C,D,A
Which means with some luck C/D might be still in cache if we return from
the call.
However we can currently get:
E,B,A,X,D,X,C
where X are other unrelated blocks. This happens since call edges are
invisible to the layout algorithm.
It even happens when we have (conditional) jumps from B to C and C to D
since these are invisible as well!
I came across cases where inverting conditions lead to big performance
losses since suddenly block layout
got all messed up. (~4% slowdown for the worst offenders).
So I'm looking for solutions there.
More information about the ghc-devs
mailing list