[GHC] #14737: Improve performance of Simplify.simplCast

GHC ghc-devs at haskell.org
Tue Apr 3 10:59:57 UTC 2018


#14737: Improve performance of Simplify.simplCast
-------------------------------------+-------------------------------------
        Reporter:  tdammers          |                Owner:  (none)
            Type:  bug               |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #11735 #14683     |  Differential Rev(s):  Phab:D4385
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tdammers):

 Replying to [comment:10 simonpj]:
 > Try getting rid of the first equation for `pushCoTyArg`
 > {{{
 > pushCoTyArg co ty
 >   | tyL `eqType` tyR
 >   = Just (ty, mkRepReflCo (piResultTy tyR ty))
 > }}}
 > This is another big pile of type-equalities, rather like calling
 `isReflexiveCo` at the wrong moment.
 >
 > Claim: if it happens that `tyL` = `tyR`, but we go ahead with all that
 `mkCoherenceLeftCo` stuff anyway, then the coercion optimiser will get rid
 of it later.  '''Richard''': will it?
 >
 > But try that change anyway.  NO WAY should `pushCoTyArg` take 54% of
 compile time!

 Removing that case branch outright saves another 4 seconds of compile time:

 {{{
         Tue Apr  3 11:09 2018 Time and Allocation Profiling Report  (Final)

            ghc-stage2 +RTS -p -RTS -B/home/tobias/well-typed/devel/ghc/T14737/inplace/lib ./cases/Grammar.hs -o ./a -fforce-recomp

         total time  =        7.86 secs   (7864 ticks @ 1000 us, 1 processor)
         total alloc = 10,150,661,432 bytes  (excludes profiling overheads)

 COST CENTRE     MODULE     SRC                                                 %time  %alloc

 mkInstCo        CoreOpt    compiler/coreSyn/CoreOpt.hs:982:33-84                31.7    40.6
 tc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7)     20.6    20.4
 CoreTidy        HscMain    compiler/main/HscMain.hs:1253:27-67                   7.2     5.5
 SimplTopBinds   SimplCore  compiler/simplCore/SimplCore.hs:770:39-74             6.6     4.6
 simplCast       Simplify   compiler/simplCore/Simplify.hs:(1213,5)-(1215,37)     3.7     3.5
 zonkTopDecls    TcRnDriver compiler/typecheck/TcRnDriver.hs:(445,16)-(446,43)    3.5     3.1
 deSugar         HscMain    compiler/main/HscMain.hs:511:7-44                     2.4     1.9
 coercionKind    Coercion   compiler/types/Coercion.hs:1716:3-7                   1.9     4.6
 isReflexiveCo   Simplify   compiler/simplCore/Simplify.hs:1260:40-55             1.8     1.4
 Parser          HscMain    compiler/main/HscMain.hs:(316,5)-(384,20)             1.8     2.3
 StgCmm          HscMain    compiler/main/HscMain.hs:(1428,13)-(1429,62)          1.6     0.7
 }}}

 I've added a few more SCCs to trace more deeply into `simplCast`, which
 is why `simplCast` itself appears to have dropped to 3.7%. That figure is
 misleading: `mkInstCo` is called from within `simplCast` and accounts for
 most of the rest of its cost.
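
 For readers unfamiliar with manual cost centres, here is a minimal sketch
 of how an added SCC moves cost out of its caller in the flat profile. The
 functions `expensive` and `wrapper` are hypothetical and have nothing to do
 with GHC's own source; only the `SCC` pragma itself is real.

```haskell
module Main where

-- Hypothetical inner function. Its work is reported under the
-- "expensive_inner" cost centre rather than under its caller.
expensive :: Int -> Int
expensive n = {-# SCC "expensive_inner" #-} sum [1 .. n]

-- Hypothetical outer function. Once the inner SCC exists, the flat
-- profile lists only the cost that "wrapper" incurs by itself.
wrapper :: Int -> Int
wrapper n = {-# SCC "wrapper" #-} (expensive n + 1)

main :: IO ()
main = print (wrapper 1000)
```

 Built with `ghc -prof` and run with `+RTS -p`, the flat listing at the top
 of the `.prof` file shows individual costs, so `wrapper` looks cheap even
 though the inherited cost (shown in the call tree further down) still
 includes `expensive_inner`.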

 So I suggest committing the branch deletion (assuming that it won't break
 anything).
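
 The reasoning behind the deletion can be sketched with a toy model. This
 is only an illustration under stated assumptions: `Ty`, `Co`, `pushEager`,
 `pushLazy` and `optCo` are made-up stand-ins, not GHC's `Type`, `Coercion`,
 `pushCoTyArg` or coercion optimiser, and the sketch shows only that the
 results agree after a clean-up pass, not how much time is saved.

```haskell
module Main where

-- Toy stand-ins; these are NOT GHC's Type or Coercion.
data Ty = TyCon String | Fun Ty Ty
  deriving (Eq, Show)

-- A coercion between two types: reflexive, or a general witness.
data Co = Refl Ty | General Ty Ty
  deriving (Eq, Show)

-- With the eager fast path: every call pays for a full structural
-- comparison (the analogue of the `eqType` guard).
pushEager :: Ty -> Ty -> Co
pushEager l r
  | l == r    = Refl r
  | otherwise = General l r

-- Without the fast path: always build the general coercion.
pushLazy :: Ty -> Ty -> Co
pushLazy = General

-- A later clean-up pass, standing in for the coercion optimiser,
-- that collapses a general coercion between equal types.
optCo :: Co -> Co
optCo (General l r) | l == r = Refl r
optCo co = co

main :: IO ()
main = do
  let t = Fun (TyCon "Int") (TyCon "Bool")
  -- Both routes end at the same coercion once optCo has run.
  print (optCo (pushLazy t t) == pushEager t t)
```

 Whether the real coercion optimiser performs the corresponding clean-up is
 exactly the open question raised in the quoted comment above.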

 From here, I'm not 100% sure which is more promising: digging into
 `mkInstCo` to see if we can make it more efficient, or looking at
 `simplCast` to see if we can make it call `mkInstCo` less often.

 Also:

 > Note that Phab:D4395 currently removes the piResultTy from that case,
 but it's quite possible that the eqType call is what's taking up the time.

 The full profile from before the deletion (which, unfortunately, I no
 longer have around) clearly shows that `eqType` is what consumes all that
 time, not `piResultTy`.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14737#comment:12>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

