[GHC] #7258: Compiling DynFlags is jolly slow

GHC ghc-devs at haskell.org
Thu Oct 26 10:37:22 UTC 2017


#7258: Compiling DynFlags is jolly slow
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  simonpj
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.6.1
      Resolution:                    |             Keywords:  deriving-perf
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tdammers):

 Replying to [comment:57 simonpj]:
 > > read, read-appl and show "explode" while all the other examples,
 including getline-appl and getline, behave properly.
 >
 > Very good!
 >
 > {{{
 > sink            CmmPipeline
 compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)        77.8   23.6
 > SimplTopBinds   SimplCore   compiler/simplCore/SimplCore.hs:761:39-74
 9.6   30.0
 > FloatOutwards   SimplCore   compiler/simplCore/SimplCore.hs:471:40-66
 3.0   10.4
 > }}}
 >
 > What does `-dshow-passes` show about the program size?  I.e. are we
 getting very large Core?

 For a 100-line read example:
 {{{
 Glasgow Haskell Compiler, Version 8.3.20171016, stage 2 booted by GHC
 version 8.0.1
 Using binary package database: /home/tobias/well-
 typed/devel/ghc/inplace/lib/package.conf.d/package.cache
 package flags []
 loading package database /home/tobias/well-
 typed/devel/ghc/inplace/lib/package.conf.d
 wired-in package ghc-prim mapped to ghc-prim-0.5.2.0
 wired-in package integer-gmp mapped to integer-gmp-1.0.1.0
 wired-in package base mapped to base-4.11.0.0
 wired-in package rts mapped to rts
 wired-in package template-haskell mapped to template-haskell-2.13.0.0
 wired-in package ghc mapped to ghc-8.3
 wired-in package dph-seq not found.
 wired-in package dph-par not found.
 *** Checking old interface for D (use -ddump-hi-diffs for more details):
 *** Parser [D]:
 !!! Parser [D]: finished in 3.22 milliseconds, allocated 3.170 megabytes
 *** Renamer/typechecker [D]:
 !!! Renamer/typechecker [D]: finished in 124.33 milliseconds, allocated
 72.473 megabytes
 *** Desugar [D]:
 Result size of Desugar (after optimization)
   = {terms: 2,667, types: 11,534, coercions: 0, joins: 0/0}
 !!! Desugar [D]: finished in 174.52 milliseconds, allocated 59.456
 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 7,375, types: 18,990, coercions: 6,881, joins: 0/913}
 Result size of Simplifier iteration=2
   = {terms: 3,722, types: 13,604, coercions: 1,351, joins: 0/198}
 Result size of Simplifier iteration=3
   = {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1}
 Result size of Simplifier
   = {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1}
 !!! Simplifier [D]: finished in 1306.06 milliseconds, allocated 749.648
 megabytes
 *** Specialise [D]:
 Result size of Specialise
   = {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1}
 !!! Specialise [D]: finished in 19.96 milliseconds, allocated 17.006
 megabytes
 *** Float out(FOS {Lam = Just 0,
                    Consts = True,
                    OverSatApps = False}) [D]:
 Result size of Float out(FOS {Lam = Just 0,
                               Consts = True,
                               OverSatApps = False})
   = {terms: 4,923, types: 15,504, coercions: 1,342, joins: 0/0}
 !!! Float out(FOS {Lam = Just 0,
                    Consts = True,
                    OverSatApps = False}) [D]: finished in 122.82
 milliseconds, allocated 103.551 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 4,115, types: 13,888, coercions: 1,342, joins: 0/0}
 Result size of Simplifier
   = {terms: 4,115, types: 13,888, coercions: 1,342, joins: 0/0}
 !!! Simplifier [D]: finished in 216.40 milliseconds, allocated 122.592
 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 4,115, types: 13,282, coercions: 1,342, joins: 0/0}
 Result size of Simplifier
   = {terms: 4,115, types: 13,282, coercions: 1,342, joins: 0/0}
 !!! Simplifier [D]: finished in 183.50 milliseconds, allocated 131.494
 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
 Result size of Simplifier
   = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
 !!! Simplifier [D]: finished in 183.89 milliseconds, allocated 126.382
 megabytes
 *** Float inwards [D]:
 Result size of Float inwards
   = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
 !!! Float inwards [D]: finished in 8.67 milliseconds, allocated 4.238
 megabytes
 *** Called arity analysis [D]:
 Result size of Called arity analysis
   = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
 !!! Called arity analysis [D]: finished in 5.66 milliseconds, allocated
 5.974 megabytes
 *** Simplifier [D]:
 Result size of Simplifier
   = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
 !!! Simplifier [D]: finished in 67.09 milliseconds, allocated 62.910
 megabytes
 *** Demand analysis [D]:
 Result size of Demand analysis
   = {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
 !!! Demand analysis [D]: finished in 170.27 milliseconds, allocated 91.193
 megabytes
 *** Worker Wrapper binds [D]:
 Result size of Worker Wrapper binds
   = {terms: 4,541, types: 14,225, coercions: 1,342, joins: 0/3}
 !!! Worker Wrapper binds [D]: finished in 5.09 milliseconds, allocated
 3.614 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0}
 Result size of Simplifier
   = {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0}
 !!! Simplifier [D]: finished in 351.93 milliseconds, allocated 219.389
 megabytes
 *** Float out(FOS {Lam = Just 0,
                    Consts = True,
                    OverSatApps = True}) [D]:
 Result size of Float out(FOS {Lam = Just 0,
                               Consts = True,
                               OverSatApps = True})
   = {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0}
 !!! Float out(FOS {Lam = Just 0,
                    Consts = True,
                    OverSatApps = True}) [D]: finished in 268.13
 milliseconds, allocated 280.555 megabytes
 *** Common sub-expression [D]:
 Result size of Common sub-expression
   = {terms: 3,740, types: 13,522, coercions: 1,342, joins: 0/0}
 !!! Common sub-expression [D]: finished in 38.50 milliseconds, allocated
 48.593 megabytes
 *** Float inwards [D]:
 Result size of Float inwards
   = {terms: 3,740, types: 13,522, coercions: 1,342, joins: 0/0}
 !!! Float inwards [D]: finished in 5.37 milliseconds, allocated 3.369
 megabytes
 *** Simplifier [D]:
 Result size of Simplifier
   = {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0}
 !!! Simplifier [D]: finished in 140.01 milliseconds, allocated 98.855
 megabytes
 *** Demand analysis [D]:
 Result size of Demand analysis
   = {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0}
 !!! Demand analysis [D]: finished in 77.63 milliseconds, allocated 86.076
 megabytes
 *** CoreTidy [D]:
 Result size of Tidy Core
   = {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0}
 !!! CoreTidy [D]: finished in 54.47 milliseconds, allocated 69.051
 megabytes
 Created temporary directory: /tmp/ghc27776_0
 *** CorePrep [D]:
 Result size of CorePrep
   = {terms: 3,569, types: 13,155, coercions: 1,342, joins: 0/204}
 !!! CorePrep [D]: finished in 25.73 milliseconds, allocated 32.097
 megabytes
 *** Stg2Stg:
 *** CodeGen [D]:
 !!! CodeGen [D]: finished in 2109.26 milliseconds, allocated 1630.642
 megabytes
 *** Assembler:
 *** Deleting temp files:
 Warning: deleting non-existent /tmp/ghc27776_0/ghc_2.c
 *** Deleting temp dirs:
 }}}

 At 200 lines, we're looking at:
 {{{
 Glasgow Haskell Compiler, Version 8.3.20171016, stage 2 booted by GHC
 version 8.0.1
 Using binary package database: /home/tobias/well-
 typed/devel/ghc/inplace/lib/package.conf.d/package.cache
 package flags []
 loading package database /home/tobias/well-
 typed/devel/ghc/inplace/lib/package.conf.d
 wired-in package ghc-prim mapped to ghc-prim-0.5.2.0
 wired-in package integer-gmp mapped to integer-gmp-1.0.1.0
 wired-in package base mapped to base-4.11.0.0
 wired-in package rts mapped to rts
 wired-in package template-haskell mapped to template-haskell-2.13.0.0
 wired-in package ghc mapped to ghc-8.3
 wired-in package dph-seq not found.
 wired-in package dph-par not found.
 *** Checking old interface for D (use -ddump-hi-diffs for more details):
 *** Parser [D]:
 !!! Parser [D]: finished in 3.80 milliseconds, allocated 6.207 megabytes
 *** Renamer/typechecker [D]:
 !!! Renamer/typechecker [D]: finished in 230.12 milliseconds, allocated
 138.058 megabytes
 *** Desugar [D]:
 Result size of Desugar (after optimization)
   = {terms: 5,267, types: 43,034, coercions: 0, joins: 0/0}
 !!! Desugar [D]: finished in 289.66 milliseconds, allocated 201.767
 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 14,575, types: 57,790, coercions: 13,681, joins: 0/1,813}
 Result size of Simplifier iteration=2
   = {terms: 7,322, types: 47,104, coercions: 2,651, joins: 0/398}
 Result size of Simplifier iteration=3
   = {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1}
 Result size of Simplifier
   = {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1}
 !!! Simplifier [D]: finished in 5652.80 milliseconds, allocated 2700.163
 megabytes
 *** Specialise [D]:
 Result size of Specialise
   = {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1}
 !!! Specialise [D]: finished in 62.79 milliseconds, allocated 58.261
 megabytes
 *** Float out(FOS {Lam = Just 0,
                    Consts = True,
                    OverSatApps = False}) [D]:
 Result size of Float out(FOS {Lam = Just 0,
                               Consts = True,
                               OverSatApps = False})
   = {terms: 9,723, types: 50,904, coercions: 2,642, joins: 0/0}
 !!! Float out(FOS {Lam = Just 0,
                    Consts = True,
                    OverSatApps = False}) [D]: finished in 401.80
 milliseconds, allocated 374.441 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 8,115, types: 47,688, coercions: 2,642, joins: 0/0}
 Result size of Simplifier
   = {terms: 8,115, types: 47,688, coercions: 2,642, joins: 0/0}
 !!! Simplifier [D]: finished in 754.47 milliseconds, allocated 393.159
 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 8,115, types: 46,482, coercions: 2,642, joins: 0/0}
 Result size of Simplifier
   = {terms: 8,115, types: 46,482, coercions: 2,642, joins: 0/0}
 !!! Simplifier [D]: finished in 579.29 milliseconds, allocated 411.892
 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
 Result size of Simplifier
   = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
 !!! Simplifier [D]: finished in 630.25 milliseconds, allocated 382.851
 megabytes
 *** Float inwards [D]:
 Result size of Float inwards
   = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
 !!! Float inwards [D]: finished in 8.17 milliseconds, allocated 8.300
 megabytes
 *** Called arity analysis [D]:
 Result size of Called arity analysis
   = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
 !!! Called arity analysis [D]: finished in 21.61 milliseconds, allocated
 11.860 megabytes
 *** Simplifier [D]:
 Result size of Simplifier
   = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
 !!! Simplifier [D]: finished in 362.68 milliseconds, allocated 190.967
 megabytes
 *** Demand analysis [D]:
 Result size of Demand analysis
   = {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
 !!! Demand analysis [D]: finished in 551.36 milliseconds, allocated
 339.529 megabytes
 *** Worker Wrapper binds [D]:
 Result size of Worker Wrapper binds
   = {terms: 8,941, types: 48,325, coercions: 2,642, joins: 0/3}
 !!! Worker Wrapper binds [D]: finished in 7.70 milliseconds, allocated
 11.542 megabytes
 *** Simplifier [D]:
 Result size of Simplifier iteration=1
   = {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0}
 Result size of Simplifier
   = {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0}
 !!! Simplifier [D]: finished in 1307.95 milliseconds, allocated 727.784
 megabytes
 *** Float out(FOS {Lam = Just 0,
                    Consts = True,
                    OverSatApps = True}) [D]:
 Result size of Float out(FOS {Lam = Just 0,
                               Consts = True,
                               OverSatApps = True})
   = {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0}
 !!! Float out(FOS {Lam = Just 0,
                    Consts = True,
                    OverSatApps = True}) [D]: finished in 1393.21
 milliseconds, allocated 1078.391 megabytes
 *** Common sub-expression [D]:
 Result size of Common sub-expression
   = {terms: 7,340, types: 46,922, coercions: 2,642, joins: 0/0}
 !!! Common sub-expression [D]: finished in 227.12 milliseconds, allocated
 194.184 megabytes
 *** Float inwards [D]:
 Result size of Float inwards
   = {terms: 7,340, types: 46,922, coercions: 2,642, joins: 0/0}
 !!! Float inwards [D]: finished in 11.63 milliseconds, allocated 6.545
 megabytes
 *** Simplifier [D]:
 Result size of Simplifier
   = {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0}
 !!! Simplifier [D]: finished in 590.60 milliseconds, allocated 349.746
 megabytes
 *** Demand analysis [D]:
 Result size of Demand analysis
   = {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0}
 !!! Demand analysis [D]: finished in 485.19 milliseconds, allocated
 330.909 megabytes
 *** CoreTidy [D]:
 Result size of Tidy Core
   = {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0}
 !!! CoreTidy [D]: finished in 293.13 milliseconds, allocated 258.898
 megabytes
 Created temporary directory: /tmp/ghc18725_0
 *** CorePrep [D]:
 Result size of CorePrep
   = {terms: 6,969, types: 46,155, coercions: 2,642, joins: 0/404}
 !!! CorePrep [D]: finished in 163.49 milliseconds, allocated 113.842
 megabytes
 *** Stg2Stg:
 *** CodeGen [D]:
 !!! CodeGen [D]: finished in 23041.19 milliseconds, allocated 12665.144
 megabytes
 *** Assembler:
 *** Deleting temp files:
 Warning: deleting non-existent /tmp/ghc18725_0/ghc_2.c
 *** Deleting temp dirs:
 }}}

 Not sure whether this qualifies as "unusually large". `-ddump-cmm` shows
 that the C-- for a 100-field read instance is more than twice as large as
 that for the 100-field getline example (39234 lines vs 14895), but size
 alone doesn't explain the huge difference.

 However, at 300 fields, the read example produces 237541 lines of C--,
 while getline grows roughly linearly, to 47312 lines, meaning that there
 is definitely some sort of nonlinearity going on wrt C-- code size.

 > So `sink` in C-- seems very slow.  Nearly 4 bytes in 5 of all allocation
 is in tis pass alone. Is this a non-linear algorithmic effect.  e.g if you
 double the size of the input, does the `sink` time go up non-linearly
 while `simplTopBinds` goes up linearly?   That's my guess.

 Yes, in fact it does, quite clearly so:

 10-field example:
 {{{
         Wed Oct 25 14:57 2017 Time and Allocation Profiling Report
 (Final)

            ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-
 typed/devel/ghc/inplace/lib -B/home/tobias/well-
 typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-10-read.hs

         total time  =        0.19 secs   (188 ticks @ 1000 us, 1
 processor)
         total alloc = 134,109,112 bytes  (excludes profiling overheads)

 COST CENTRE        MODULE      SRC
 %time %alloc

 SimplTopBinds      SimplCore   compiler/simplCore/SimplCore.hs:761:39-74
 35.6   28.7
 tc_rn_src_decls    TcRnDriver
 compiler/typecheck/TcRnDriver.hs:(493,4)-(555,7)      9.0    9.0
 OccAnal            SimplCore
 compiler/simplCore/SimplCore.hs:(739,22)-(740,67)     4.8    5.5
 FloatOutwards      SimplCore   compiler/simplCore/SimplCore.hs:471:40-66
 4.3    5.1
 pprNativeCode      AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)    3.7    3.5
 tcRnImports        TcRnDriver  compiler/typecheck/TcRnDriver.hs:240:20-50
 3.7    3.6
 StgCmm             HscMain
 compiler/main/HscMain.hs:(1426,13)-(1427,62)          3.7    2.1
 sink               CmmPipeline
 compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)         3.2    1.5
 setSessionDynFlags GHC         compiler/main/GHC.hs:(578,1)-(584,16)
 3.2    4.7
 NativeCodeGen      CodeOutput  compiler/main/CodeOutput.hs:171:18-78
 2.1    0.9
 NewStranal         SimplCore   compiler/simplCore/SimplCore.hs:480:40-63
 2.1    2.9
 bin_tycldecls      HscTypes    compiler/main/HscTypes.hs:1085:52-57
 1.6    1.0
 initGhcMonad       GHC         compiler/main/GHC.hs:(493,1)-(503,25)
 1.6    4.7
 MAIN               MAIN        <built-in>
 1.1    0.3
 fixStgRegisters    AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:566:17-42
 1.1    0.2
 genMachCode        AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)    1.1    1.3
 shortcutBranches   AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:694:17-54
 1.1    0.2
 elimCommonBlocks   CmmPipeline compiler/cmm/CmmPipeline.hs:(71,13)-(72,76)
 1.1    1.0
 CAF                PrimOp      <entire-module>
 1.1    0.5
 CAF                PrelInfo    <entire-module>
 1.1    1.0
 deSugar            HscMain     compiler/main/HscMain.hs:511:7-44
 1.1    0.9
 CorePrep           HscMain
 compiler/main/HscMain.hs:(1313,24)-(1314,57)          1.1    0.6
 withCleanupSession GHC         compiler/main/GHC.hs:(466,1)-(475,37)
 1.1    0.8
 regLiveness        AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)    0.5    2.0
 RegAlloc-linear    AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)    0.5    2.4
 layoutStack        CmmPipeline compiler/cmm/CmmPipeline.hs:(97,13)-(99,40)
 0.5    1.0
 CoreTidy           HscMain     compiler/main/HscMain.hs:1253:27-67
 0.5    1.0
 }}}

 100 fields:
 {{{
         Wed Oct 25 15:00 2017 Time and Allocation Profiling Report
 (Final)

            ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-
 typed/devel/ghc/inplace/lib -B/home/tobias/well-
 typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-100-read.hs

         total time  =        3.10 secs   (3103 ticks @ 1000 us, 1
 processor)
         total alloc = 2,397,379,664 bytes  (excludes profiling overheads)

 COST CENTRE     MODULE      SRC
 %time %alloc

 SimplTopBinds   SimplCore   compiler/simplCore/SimplCore.hs:761:39-74
 38.9   36.6
 sink            CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)
 16.0    7.3
 FloatOutwards   SimplCore   compiler/simplCore/SimplCore.hs:471:40-66
 7.9   10.8
 OccAnal         SimplCore
 compiler/simplCore/SimplCore.hs:(739,22)-(740,67)     4.3    5.8
 pprNativeCode   AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)    4.1    5.3
 RegAlloc-linear AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)    3.8    4.6
 StgCmm          HscMain     compiler/main/HscMain.hs:(1426,13)-(1427,62)
 2.8    2.5
 NewStranal      SimplCore   compiler/simplCore/SimplCore.hs:480:40-63
 2.8    4.8
 regLiveness     AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)    1.8    1.9
 CoreTidy        HscMain     compiler/main/HscMain.hs:1253:27-67
 1.8    2.9
 genMachCode     AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)    1.8    1.6
 tc_rn_src_decls TcRnDriver
 compiler/typecheck/TcRnDriver.hs:(493,4)-(555,7)      1.3    1.3
 NativeCodeGen   CodeOutput  compiler/main/CodeOutput.hs:171:18-78
 1.0    1.7
 CommonSubExpr   SimplCore   compiler/simplCore/SimplCore.hs:462:40-56
 0.8    1.2
 deSugar         HscMain     compiler/main/HscMain.hs:511:7-44
 0.8    1.3
 }}}

 400 fields:
 {{{
         Thu Oct 26 01:25 2017 Time and Allocation Profiling Report
 (Final)

            ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-
 typed/devel/ghc/inplace/lib -B/home/tobias/well-
 typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-400-read.hs

         total time  =      128.62 secs   (128624 ticks @ 1000 us, 1
 processor)
         total alloc = 37,659,351,320 bytes  (excludes profiling overheads)

 COST CENTRE     MODULE      SRC
 %time %alloc

 sink            CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)
 77.8   23.6
 SimplTopBinds   SimplCore   compiler/simplCore/SimplCore.hs:761:39-74
 9.6   30.0
 FloatOutwards   SimplCore   compiler/simplCore/SimplCore.hs:471:40-66
 3.0   10.4
 RegAlloc-linear AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)    1.6    5.0
 OccAnal         SimplCore
 compiler/simplCore/SimplCore.hs:(739,22)-(740,67)     1.1    4.2
 pprNativeCode   AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)    1.0    4.7
 NewStranal      SimplCore   compiler/simplCore/SimplCore.hs:480:40-63
 0.9    4.2
 CoreTidy        HscMain     compiler/main/HscMain.hs:1253:27-67
 0.7    2.9
 StgCmm          HscMain     compiler/main/HscMain.hs:(1426,13)-(1427,62)
 0.6    1.9
 regLiveness     AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)    0.5    1.4
 genMachCode     AsmCodeGen
 compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)    0.4    1.3
 NativeCodeGen   CodeOutput  compiler/main/CodeOutput.hs:171:18-78
 0.4    1.6
 CommonSubExpr   SimplCore   compiler/simplCore/SimplCore.hs:462:40-56
 0.2    1.1
 deSugar         HscMain     compiler/main/HscMain.hs:511:7-44
 0.2    1.1
 }}}

 > (Related question: does `sink` show up a a big item when compiling
 "normal" programs?)

 It does, but nowhere near this dominantly. E.g. getline, 400 fields:
 {{{
         Thu Oct 26 02:54 2017 Time and Allocation Profiling Report
 (Final)

            ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-
 typed/devel/ghc/inplace/lib -B/home/tobias/well-
 typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-400-getline.hs

         total time  =       14.13 secs   (14126 ticks @ 1000 us, 1
 processor)
         total alloc = 12,479,030,504 bytes  (excludes profiling overheads)

 COST CENTRE   MODULE      SRC
 %time %alloc

 sink          CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)
 31.7    3.3
 SimplTopBinds SimplCore   compiler/simplCore/SimplCore.hs:761:39-74
 24.7   39.0
 OccAnal       SimplCore
 compiler/simplCore/SimplCore.hs:(739,22)-(740,67)    7.2    9.7
 NewStranal    SimplCore   compiler/simplCore/SimplCore.hs:480:40-63
 6.5   10.4
 CoreTidy      HscMain     compiler/main/HscMain.hs:1253:27-67
 5.2    7.6
 FloatOutwards SimplCore   compiler/simplCore/SimplCore.hs:471:40-66
 4.1    6.2
 StgCmm        HscMain     compiler/main/HscMain.hs:(1426,13)-(1427,62)
 2.0    1.6
 deSugar       HscMain     compiler/main/HscMain.hs:511:7-44
 2.0    3.3
 CommonSubExpr SimplCore   compiler/simplCore/SimplCore.hs:462:40-56
 1.9    3.2
 CorePrep      HscMain     compiler/main/HscMain.hs:(1313,24)-(1314,57)
 1.4    2.5
 Specialise    SimplCore   compiler/simplCore/SimplCore.hs:486:40-50
 0.8    1.3
 StgCse        SimplStg    compiler/simplStg/SimplStg.hs:(88,14)-(91,44)
 0.5    1.3
 }}}

 > Guessing: the "shape" of the C-- code makes `sink` behave very badly.
 Can you characterise what that shape is, and where the algorithmic badness
 comes from?

 Not yet, but I have a hunch that it's simply a matter of producing too
 much of it. Maybe there is quadratic behavior there, where the number of
 lines of C-- grows proportionally with the square of the number of fields?
 That seems like a plausible explanation so far.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/7258#comment:58>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list