[GHC] #15519: Minor code refactoring leads to drastic performance degradation

GHC ghc-devs at haskell.org
Wed Aug 29 10:23:31 UTC 2018


#15519: Minor code refactoring leads to drastic performance degradation
-------------------------------------+-------------------------------------
        Reporter:  danilo2           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 (NB: this comment does not respond to `test2`, which I'm still
 investigating.)

 I can see what is happening in `test0` vs `test1`.

 In `test0` we get
 {{{
 lvl_sdWb :: Char -> Bool
 [LclId, Arity=1]
 lvl_sdWb
   = \ (c_X402 :: Char) ->
       case c_X402 of { GHC.Types.C# ipv_sbcB [Dmd=<S,U>] ->
       case ipv_sbcB of {
         __DEFAULT -> GHC.Types.False;
         'e'# -> GHC.Types.True;
         's'# -> GHC.Types.True;
         't'# -> GHC.Types.True
       } }

 $wtest0_sdEs
   = \ (ww_sdEo :: GHC.Prim.ByteArray#)
       (ww_sdEp :: GHC.Prim.Int#)
       (ww_sdEq :: GHC.Prim.Int#) ->
       case $sunion_scrz lvl_scJr lvl_sdWa of dt_X4yi [Dmd=<S,U>]
       { __DEFAULT ->
       case $sunion_scrz lvl_scJp dt_X4yi of dt_X4yq { __DEFAULT ->
       $wrunTokenParser_sdDw
         (Main.Many @ Char (Main.Tokens @ Char dt_X4yq lvl_sdWb))
         ww_sdEo
         ww_sdEp
         ww_sdEq
       }
       }
 }}}
 So `SpecConstr` will specialise `$wrunTokenParser` thus:
 {{{
 RULES     "SC:$wrunTokenParser1" [2]
             forall (sc_se8A :: GHC.Prim.Int#)
                    (sc_se8z :: GHC.Prim.Int#)
                    (sc_se8y :: GHC.Prim.ByteArray#)
                    (sc_se8x :: Set Char).
               $wrunTokenParser_sdDw (Main.Many
                                        @ Char (Main.Tokens @ Char sc_se8x
 lvl_sdWb))
                                     sc_se8y
                                     sc_se8z
                                     sc_se8A
               = $s$wrunTokenParser_se8G sc_se8A sc_se8z sc_se8y sc_se8x]
 }}}
 Notice, in particular, `lvl_sdWb`, which is a top-level constant (not
 forall'd by the RULE): the specialised `runTokenParser` knows exactly what
 that function is, and that makes the inner loop fast.  The fact that it is
 specialised for `Many` and `Tokens` is incidental, because
 `runTokenParser` is not recursive; it's the loop inside (which comes from
 `span`) that is rendered fast by knowing the function given to `span`.

 In contrast, `test1` doesn't get any specialisation.
 {{{
 test1 = runTokenParser testGrammar1
 }}}
 Even if you manually eta-expand it, by writing
 {{{
 test3 src = runTokenParser testGrammar1 src
 }}}
 we still get nothing useful
 {{{
 Main.$wtest3
   = \ (ww_sdIT :: GHC.Prim.ByteArray#)
       (ww1_sdIU :: GHC.Prim.Int#)
       (ww2_sdIV :: GHC.Prim.Int#) ->
       $wrunTokenParser_resz testGrammar1_r23E ww_sdIT ww1_sdIU ww2_sdIV
 }}}
 What happened in `test0` (the fast case) is that the programmer manually
 inlined `testGrammar1_r23E`:
 {{{
 testGrammar1_r23E
   = case Main.$sunion lvl17_resE lvl19_resH of dt_X4B6 { __DEFAULT ->
     case Main.$sunion lvl18_resG dt_X4B6 of dt1_X4Be { __DEFAULT ->
     Main.Many @ Char (Main.Tokens @ Char dt1_X4Be lvl10_ress)
     }
     }
 }}}
 But GHC is super-cautious about doing so in `test3`, in case we duplicate
 the work of computing `testGrammar1`:
 Those `union`s might be expensive!  And `test3` might be applied to many
 different `src` arguments.  In contrast, in `test1` you manually put the
 grammar inside the `\src`.

 ---------------
 '''Analysis'''

 There are two problems:

 1.  In `test3`, GHC's caution about inlining `testGrammar1` is (in
 general) reasonable.  Perhaps the Right Thing is to ignore the problem of
 work-duplication if the user supplies an INLINE pragma, which you do in
 this case, presumably for that exact reason.

     Let's see: danilo2, what led you to the INLINE pragma on
 `testGrammar1`?

 2. `test1` is not eta-expanded, because it's a partial application: see
 `Note [Do not eta-expand PAPs]` in `SimplUtils`.  And because it is not
 eta-expanded, `runTokenParser` doesn't get enough arguments and
 `SpecConstr` doesn't consider under-saturated calls.

 Moreover, even `test0` is fragile: it's entirely possible that the full-
 laziness pass will float out all those let-bindings (for `s1`, `s2` etc)
 to top level, since they are independent of `src`


 -------------
 '''Workarounds'''

 A robust improvement is to use `oneShot`:
 {{{
 test3 = oneShot (runTokenParser testGrammar1)
 }}}
 The magic `oneShot` function eta-expands its argument to a one-shot
 lambda:
 {{{
 test3 = \src[one-shot] -> runTokenParser testGrammar1 src
 }}}
 Now at least, if `testGrammar1` is inlined, it won't get floated out
 again.

 Next thing: make this the ''only'' occurrence of `testGrammar1`; then
 there is no duplication issue when inlining it, so we get
 {{{
 test3 = \src[one-shot] -> runTokenParser (...code for the grammar...) src
 }}}
 In the test case, `testGrammar1` is used in two different tests, so it
 currently won't get inlined (lest we have work duplication); but in your
 real code it is probably used only once, I guess.

 Last thing: remove the INLINE pragma on `testGrammar1`.  It may seem silly
 but if a binding has an INLINE pragma, even if it is used exactly once it
 is not inlined. Why not? See `Note [Stable unfoldings and
 preInlineUnconditionally]`.  This really is silly: see Fix 1 below.

 --------------
 '''Fixing it properly'''

 * '''Fix 1''': concerning `Note [Stable unfoldings and
 preInlineUnconditionally]`, perhaps we should not do this for arity-0
 bindings.

 * '''Fix 2''': perhaps we should honour INLINE pragmas on 0-arity
 bindings, and simply inline them at every usage site regardless.

 It's not so clear how to solve the eta-expansion problem. Ideas
 * we could try to make `SpecConstr` deal with under-saturated calls;
 and/or
 * we could eta-expand PAPs provided no coercions were involved.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15519#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list