[GHC] #15519: Minor code refactoring leads to drastic performance degradation
GHC
ghc-devs at haskell.org
Wed Aug 29 10:23:31 UTC 2018
#15519: Minor code refactoring leads to drastic performance degradation
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.8.1
Component: Compiler | Version: 8.4.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by simonpj):
(NB: this comment does not respond to `test2`, which I'm still
investigating.)
I can see what is happening in `test0` vs `test1`.
In `test0` we get
{{{
lvl_sdWb :: Char -> Bool
[LclId, Arity=1]
lvl_sdWb
= \ (c_X402 :: Char) ->
case c_X402 of { GHC.Types.C# ipv_sbcB [Dmd=<S,U>] ->
case ipv_sbcB of {
__DEFAULT -> GHC.Types.False;
'e'# -> GHC.Types.True;
's'# -> GHC.Types.True;
't'# -> GHC.Types.True
} }
$wtest0_sdEs
= \ (ww_sdEo :: GHC.Prim.ByteArray#)
(ww_sdEp :: GHC.Prim.Int#)
(ww_sdEq :: GHC.Prim.Int#) ->
case $sunion_scrz lvl_scJr lvl_sdWa of dt_X4yi [Dmd=<S,U>]
{ __DEFAULT ->
case $sunion_scrz lvl_scJp dt_X4yi of dt_X4yq { __DEFAULT ->
$wrunTokenParser_sdDw
(Main.Many @ Char (Main.Tokens @ Char dt_X4yq lvl_sdWb))
ww_sdEo
ww_sdEp
ww_sdEq
}
}
}}}
So `SpecConstr` will specialise `$wrunTokenParser` thus:
{{{
RULES "SC:$wrunTokenParser1" [2]
forall (sc_se8A :: GHC.Prim.Int#)
(sc_se8z :: GHC.Prim.Int#)
(sc_se8y :: GHC.Prim.ByteArray#)
(sc_se8x :: Set Char).
$wrunTokenParser_sdDw (Main.Many
@ Char (Main.Tokens @ Char sc_se8x
lvl_sdWb))
sc_se8y
sc_se8z
sc_se8A
= $s$wrunTokenParser_se8G sc_se8A sc_se8z sc_se8y sc_se8x]
}}}
Notice, in particular, `lvl_sdWb`, which is a top-level constant (not
forall'd by the RULE): the specialised `runTokenParser` knows exactly what
that function is, and that makes the inner loop fast. The fact that it is
specialised for `Many` and `Tokens` is incidental, because
`runTokenParser` is not recursive; it's the loop inside (which comes from
`span`) that is rendered fast by knowing the function given to `span`.
In contrast, `test1` doesn't get any specialisation.
{{{
test1 = runTokenParser testGrammar1
}}}
Even if you manually eta-expand it, by writing
{{{
test3 src = runTokenParser testGrammar1 src
}}}
we still get nothing useful
{{{
Main.$wtest3
= \ (ww_sdIT :: GHC.Prim.ByteArray#)
(ww1_sdIU :: GHC.Prim.Int#)
(ww2_sdIV :: GHC.Prim.Int#) ->
$wrunTokenParser_resz testGrammar1_r23E ww_sdIT ww1_sdIU ww2_sdIV
}}}
What happened in `test0` (the fast case) is that the programmer manually
inlined `testGrammar1_r23E`:
{{{
testGrammar1_r23E
= case Main.$sunion lvl17_resE lvl19_resH of dt_X4B6 { __DEFAULT ->
case Main.$sunion lvl18_resG dt_X4B6 of dt1_X4Be { __DEFAULT ->
Main.Many @ Char (Main.Tokens @ Char dt1_X4Be lvl10_ress)
}
}
}}}
But GHC is super-cautious about doing so in `test3`, in case we duplicate
the work of computing `testGrammar1`:
Those `union`s might be expensive! And `test3` might be applied to many
different `src` arguments. In contrast, in `test1` you manually put the
grammar inside the `\src`.
---------------
'''Analysis'''
There are two problems:
1. In `test3`, GHC's caution about inlining `testGrammar1` is (in
general) reasonable. Perhaps the Right Thing is to ignore the problem of
work-duplication if the user supplies an INLINE pragma, which you do in
this case, presumably for that exact reason.
Let's see: danilo2, what led you to the INLINE pragma on
`testGrammar1`?
2. `test1` is not eta-expanded, because it's a partial application: see
`Note [Do not eta-expand PAPs]` in `SimplUtils`. And because it is not
eta-expanded, `runTokenParser` doesn't get enough arguments and
`SpecConstr` doesn't consider under-saturated calls.
Moreover, even `test0` is fragile: it's entirely possible that the full-
laziness pass will float out all those let-bindings (for `s1`, `s2` etc)
to top level, since they are independent of `src`
-------------
'''Workarounds'''
A robust improvement is to use `oneShot`:
{{{
test3 = oneShot (runTokenParser testGrammar1)
}}}
The magic `oneShot` function eta-expands its argument to a one-shot
lambda:
{{{
test3 = \src[one-shot] -> runTokenParser testGrammar1 src
}}}
Now at least, if `testGrammar1` is inlined, it won't get floated out
again.
Next thing: make this the ''only'' occurrence of `testGrammar1`; then
there is no duplication issue when inlining it, so we get
{{{
test3 = \src[one-shot] -> runTokenParser (...code for the grammar...) src
}}}
In the test case, `testGrammar1` is used in two different tests, so it
currently won't get inlined (lest we have work duplication); but in your
real code it is probably used only once, I guess.
Last thing: remove the INLINE pragma on `testGrammar1`. It may seem silly
but if a binding has an INLINE pragma, even if it is used exactly once it
is not inlined. Why not? See `Note [Stable unfoldings and
preInlineUnconditionally]`. This really is silly: see Fix 1 below.
--------------
'''Fixing it properly'''
* '''Fix 1''': concerning `Note [Stable unfoldings and
preInlineUnconditionally]`, perhaps we should not do this for arity-0
bindings.
* '''Fix 2''': perhaps we should honour INLINE pragmas on 0-arity
bindings, and simply inline them at every usage site regardless.
It's not so clear how to solve the eta-expansion problem. Ideas
* we could try to make `SpecConstr` deal with under-saturated calls;
and/or
* we could eta-expand PAPs provided no coercions were involved.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15519#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list