[GHC] #13851: Change in specialisation(?) behaviour since 8.0.2 causes 6x slowdown
ghc-devs at haskell.org
Thu Jun 22 15:37:49 UTC 2017
#13851: Change in specialisation(?) behaviour since 8.0.2 causes 6x slowdown
Reporter: mpickering | Owner: (none)
Type: bug | Status: new
Priority: high | Milestone: 8.2.1
Component: Compiler | Version: 8.2.1-rc2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
Comment (by simonpj):
Here is what is happening:
* Before float-out we have
$stest1mtl = \eta. ...foldr (\x k z. blah) z e...
Since the first arg of the foldr has no free vars, we float it out to
lvl = \x k z. blah
$stest1mtl = \eta. ...foldr lvl z e...
* That makes `$stest1mtl` small, so it is inlined at its two call sites
(the first two test cases in `main`).
* So now there are two calls to `lvl`, and it is quite big, so it doesn't
get inlined.
* But actually it is much better ''not'' to inline `$stest1mtl`, and
instead (after the foldr/build stuff has happened) to inline `lvl` back
into it.
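The float-out step described above can be sketched at source level. This is only an illustration, not the code from the ticket: `sumInline`, `sumFloated`, and the body of `lvl` are hypothetical stand-ins for `$stest1mtl` and `blah`, and the `NOINLINE` pragma is an assumption used to mimic "quite big, so it doesn't get inlined".

```haskell
-- Before float-out: the closed lambda sits inside the function as
-- foldr's first argument, so the function body stays relatively large.
sumInline :: [Int] -> Int -> Int
sumInline xs z0 = foldr (\x k z -> k (z + x)) id xs z0

-- After float-out (done by hand here): the lambda has no free
-- variables, so it can live at top level under its own name.
lvl :: Int -> (Int -> Int) -> Int -> Int
lvl x k z = k (z + x)
{-# NOINLINE lvl #-}  -- assumption: stands in for "too big to inline"

-- The caller is now tiny, which makes it an attractive inlining candidate,
-- while every inlined copy still goes through a call to `lvl`.
sumFloated :: [Int] -> Int -> Int
sumFloated xs z0 = foldr lvl id xs z0
```

Both definitions compute the same result; the difference lies only in which binding the inliner later sees as small.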
This kind of thing is not new; I trip over it quite often. Generally, given
f = e
g = ...f...
h = ...g...g..f...
should we inline `f` into `g`, thereby making `g` big, so it doesn't
inline into `h`? Or should we instead inline `g` into `h`? Sometimes one
is better, sometimes the other; I don't know any systematic way of doing
The Right Thing all the time. It turned out that the early-inline patch
changed the choice, which resulted in the changed performance.
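When the heuristic picks the wrong alternative for a particular program, GHC's `INLINE` and `NOINLINE` pragmas let you force one choice per binding. The sketch below shows the shape of the `f`/`g`/`h` situation with one of the two choices pinned down; the function bodies are made up for illustration.

```haskell
f :: Int -> Int
f x = x * x + 1
{-# INLINE f #-}    -- choice A: inline f into g (and h) ...

g :: Int -> Int
g y = f y + f (y + 1)
{-# NOINLINE g #-}  -- ... and keep g as a shared, out-of-line function

-- h calls g twice and f once, as in the schematic above.
h :: Int -> Int
h n = g n + g (n + 1) + f n

-- Choice B would swap the pragmas: NOINLINE on f and INLINE on g,
-- so that g's body is copied into h and f stays a call.
```

Which of the two is faster depends on the program, so this is a way to experiment rather than a general fix.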
However I did spot several things worth trying out:
* In `CoreArity.rhsEtaExpandArity` we carefully do not eta-expand thunks.
But I saw some thunks like
= case z_a4NJ of wild_a4OF { GHC.Types.I# x1_a4OH ->
case x_a4NH of wild1_a4OJ { GHC.Types.I# y1_a4OL ->
case GHC.Prim.<=# x1_a4OH y1_a4OL of {
__DEFAULT -> (\ _ (eta_B1 :: Int) -> (wild_a4OF, eta_B1))
1# -> (\ _ (eta_B1 :: Int) -> (wild1_a4OJ, eta_B1))
Here it really would be good to eta-expand; then that particular `lvl`
could be inlined at its call sites. Here's a change to
`CoreArity.rhsEtaExpandArity` that did the job:
- | isOneShotInfo os || has_lam e -> 1 + length oss
+ | isOneShotInfo os || not (is_app e) -> 1 + length oss
- has_lam (Tick _ e) = has_lam e
- has_lam (Lam b e) = isId b || has_lam e
- has_lam _ = False
+ is_app (Tick _ e) = is_app e
+ is_app (App f _) = is_app f
+ is_app (Var _) = True
+ is_app _ = False
Worth trying.
* Now the offending top-level `lvl` function is still not inlined; but it
has a function argument that is applied, so the call sites look like
lvl ... (\ab. blah) ...
When considering inlining we do get a discount for the application of the
argument inside `lvl`'s rhs, but it was only a discount of 60, which seems
small considering how great it is to inline a function. Boosting it to
150 with `-funfolding-fun-discount=150` makes the function inline, and we
get good code all round. Maybe we should just up the default.
* All the trouble is caused by the early float-out. I think we could try
just eliminating it.
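Returning to the first suggestion, the difference between the old `has_lam` test and the proposed `not (is_app e)` test can be seen on a toy model. This is a sketch under heavy assumptions: `Expr` is a hypothetical, drastically simplified stand-in for GHC's Core, every binder is treated as a value binder (so the `isId b` check disappears), and the `isOneShotInfo os` half of the guard is left out.

```haskell
-- Toy expression type, for illustration only (not GHC's Core).
data Expr
  = Var String
  | App Expr Expr
  | Lam String Expr
  | Case Expr [Expr]   -- scrutinee and the right-hand sides of the alternatives
  | Tick String Expr
  deriving Show

-- Old test: is the expression a lambda, looking through ticks?
hasLam :: Expr -> Bool
hasLam (Tick _ e) = hasLam e
hasLam (Lam _ _)  = True
hasLam _          = False

-- New test: is the expression a variable applied to zero or more arguments?
isApp :: Expr -> Bool
isApp (Tick _ e) = isApp e
isApp (App fn _) = isApp fn
isApp (Var _)    = True
isApp _          = False

-- The two versions of the guard, minus the one-shot check.
oldGuard, newGuard :: Expr -> Bool
oldGuard = hasLam
newGuard = not . isApp

-- A thunk shaped like the one quoted above: a case expression whose
-- alternatives each return a lambda.
thunk :: Expr
thunk =
  Case (Var "z")
    [ Lam "_" (Lam "eta" (Var "pair1"))
    , Lam "_" (Lam "eta" (Var "pair2"))
    ]
```

In this model `oldGuard thunk` is `False` and `newGuard thunk` is `True`, so only the new guard accepts the case-headed thunk, while a plain application such as `App (Var "f") (Var "x")` is rejected by both.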
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13851#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler