[GHC] #15519: Minor code refactoring leads to drastic performance degradation

Mon Aug 27 19:37:05 UTC 2018

#15519: Minor code refactoring leads to drastic performance degradation
-------------------------------------+-------------------------------------
        Reporter:  danilo2           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 Thanks, that's much easier to reproduce.

 So, it seems that the performance gap stems from the fact that the call to
 `runTokenParser` in `test0` is specialised to the specific grammar. That
 is not the case for `test1`, because its definition doesn't get eta-
 expanded for some reason. Note that the correct arity 1 is detected:

 {{{
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 test1 [InlPrag=NOINLINE] :: Text -> Result
 [LclIdX,
  Arity=1,
  Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
          WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 60}]
 test1 = runTokenParser testGrammar1
 }}}

 If `test1` were eta-expanded, the call to `runTokenParser` would become
 saturated and could (in theory) be specialised to `testGrammar1`. However,
 manual eta-expansion (e.g. `test1 t = runTokenParser testGrammar1 t`) shows
 that this alone is not enough for SpecConstr to pick it up.

 Ironically, the problem seems to be related to the `INLINE` pragma on
 `testGrammar1`. If you omit it, the call in `test1` specialises properly.
 CSE can then even unify `testGrammar1` and the floated-out grammar binding
 from `test0`, which wasn't possible before, I suppose because of the
 different pragmas.

 So, the fix to apply in your situation seems to be to eta-expand `test1`
 and omit the `INLINE` pragma.
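
 A minimal sketch of that shape follows. `Grammar`, `Result` and the body of
 `runTokenParser` are hypothetical stand-ins (the real definitions are not
 shown in this comment), and `String` is used instead of `Text` to keep the
 example free of dependencies. Only the form of `testGrammar1` and `test1`
 is the point.

 {{{
 module Main (main) where

 -- Hypothetical stand-ins for the ticket's types (assumption).
 data Grammar = Literal Char | Many Grammar
 type Result = Int

 -- Toy parser (assumption): counts how many leading characters of the
 -- input the grammar accepts.
 runTokenParser :: Grammar -> String -> Result
 runTokenParser (Literal c) (x:_) | c == x = 1
 runTokenParser (Literal _) _ = 0
 runTokenParser (Many g) s = go 0 s
   where
     go n xs = case runTokenParser g xs of
       0 -> n
       k -> go (n + k) (drop k xs)

 -- Before: this binding carried an INLINE pragma.
 -- After: no pragma at all.
 testGrammar1 :: Grammar
 testGrammar1 = Many (Literal 'a')

 -- Before: test1 = runTokenParser testGrammar1   (eta-reduced)
 -- After: eta-expanded, so the call to runTokenParser is saturated.
 test1 :: String -> Result
 test1 t = runTokenParser testGrammar1 t
 {-# NOINLINE test1 #-}

 main :: IO ()
 main = print (test1 "aaab")
 }}}

 Whether SpecConstr actually fires on this toy was not checked. Compiling
 with `-O2 -ddump-simpl` and looking at the Core for `test1` is one way to
 see whether the call was specialised.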

 As to /why/ that fixes performance, I'm really at a loss. It's probably
 related to the fact that the unfolding attached to `testGrammar1` isn't
 `CONLIKE`, whereas the RHS at the time when SpecConstr runs is. I can't
 find any relevant code in SpecConstr that looks at unfoldings of /local/
 ids, though. Perhaps I'll find time to look into this some more tomorrow.

 P.S.: `test2` fails to specialise completely in a similar way.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15519#comment:10>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler