[GHC] #12150: Compile time performance degradation on code that uses undefined/error with CallStacks

GHC ghc-devs at haskell.org
Sun Jul 31 18:28:48 UTC 2016


#12150: Compile time performance degradation on code that uses undefined/error with
CallStacks
-------------------------------------+-------------------------------------
        Reporter:  thomie            |                Owner:
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:  8.0.2
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #10844            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by gridaphobe):

 So, although my no-inline patch is now producing better results for the
 original CallStack-inlining issue, it doesn't seem to be doing much for
 this issue. Upon closer inspection of the Core I noticed something quite
 interesting. When we remove the CallStacks from `undefined`, GHC manages
 to optimize away the entire set of guards! In stark contrast, leaving the
 CallStacks gives us the following Core

 {{{
 -- RHS size: {terms: 4, types: 7, coercions: 0}
 Test.$fFunctorResult_$cfmap [InlPrag=INLINE (sat-args=0)]
   :: forall a_aFQ b_aFR.
      (a_aFQ -> b_aFR) -> Result a_aFQ -> Result b_aFR
 [GblId,
  Str=b,
  Unf=Unf{Src=InlineStable, TopLvl=True, Value=False, ConLike=False,
          WorkFree=False, Expandable=False,
          Guidance=ALWAYS_IF(arity=0,unsat_ok=False,boring_ok=False)
          Tmpl= \ (@ a_aFS) (@ b_aFT) ->
                  let {
                    $dIP_s1QH :: GHC.Stack.Types.CallStack
                    [LclId]
                    $dIP_s1QH =
                      GHC.Stack.Types.pushCallStack
                        (Test.$fFunctorResult10, Test.$fFunctorResult8)
                        GHC.Stack.Types.emptyCallStack } in
                  let {
                    bool1_aqS :: forall a1_a1Dw. a1_a1Dw
                    [LclId]
                    bool1_aqS =
                      \ (@ a1_a1Dw) ->
                        undefined
                          @ 'GHC.Types.PtrRepLifted
                          @ a1_a1Dw
                          ($dIP_s1QH
                           `cast` (Sym
                                     (GHC.Classes.N:IP[0]
                                        <"callStack">_N
 <GHC.Stack.Types.CallStack>_N)
                                   :: (GHC.Stack.Types.CallStack :: *)
                                      ~R#
 ((?callStack::GHC.Stack.Types.CallStack) :: Constraint))) } in
                  case bool1_aqS @ Bool of {
                    False ->
                      case bool1_aqS @ Bool of {
                        False ->
                          case bool1_aqS @ Bool of {
                            False ->
                              case bool1_aqS @ Bool of {
                                False ->
                                  case bool1_aqS @ Bool of {
                                    False ->
                                      case bool1_aqS @ Bool of {
                                        False ->
                                          case bool1_aqS @ Bool of {
                                            False ->
                                              case bool1_aqS @ Bool of {
 ...]
 Test.$fFunctorResult_$cfmap =
   \ (@ a_aFS) (@ b_aFT) -> case bool1_aqS of wild_00 { }
 }}}

 So GHC is optimizing the actual definition, but not the unfolding, whereas
 with the old `undefined` it also optimized the unfolding.

 I also noticed that I can reproduce both behaviors with a simple wrapper
 around `undefined`, by choosing whether the wrapper should take a
 CallStack.

 {{{
 -- myUndef :: HasCallStack => a
 -- myUndef :: a
 myUndef = undefined
 }}}

 The HasCallStack variant is slow, but the CallStack-free variant behaves
 just like the old undefined. I suppose this is to be expected, but it
 seems to suggest that the issue is connected to the fact that a use of
 `undefined` is now really an application `undefined $call_stack` rather
 than a simple variable. Indeed, if I make one final tweak to the program
 and use the **old `error`** instead of `undefined`, I get a similar
 behavior to the **new** `undefined`.

 {{{
 N clauses : time (s)
 10        : 0.44
 20        : 0.48
 40        : 0.54
 80        : 0.76
 160       : 1.13
 320       : 1.93
 640       : 3.52
 1280      : 6.97
 }}}

 I'm not really sure where to go from here. It seems like the solution
 would be to convince GHC to rewrite the unfolding for the new undefined
 (or error for that matter), but rewriting the unfolding is not something
 we're supposed to do, right? Which makes me wonder, why was GHC rewriting
 the unfolding for the old undefined in the first place?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12150#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list