[GHC] #7258: Compiling DynFlags is jolly slow
GHC
ghc-devs at haskell.org
Thu Oct 26 10:37:22 UTC 2017
#7258: Compiling DynFlags is jolly slow
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: simonpj
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.6.1
Resolution: | Keywords: deriving-perf
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by tdammers):
Replying to [comment:57 simonpj]:
> > read, read-appl and show "explode" while all the other examples,
including getline-appl and getline, behave properly.
>
> Very good!
>
> {{{
> sink CmmPipeline
compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 77.8 23.6
> SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74
9.6 30.0
> FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66
3.0 10.4
> }}}
>
> What does `-dshow-passes` show about the program size? I.e. are we
getting very large Core?
For a 100-line read example:
{{{
Glasgow Haskell Compiler, Version 8.3.20171016, stage 2 booted by GHC
version 8.0.1
Using binary package database: /home/tobias/well-
typed/devel/ghc/inplace/lib/package.conf.d/package.cache
package flags []
loading package database /home/tobias/well-
typed/devel/ghc/inplace/lib/package.conf.d
wired-in package ghc-prim mapped to ghc-prim-0.5.2.0
wired-in package integer-gmp mapped to integer-gmp-1.0.1.0
wired-in package base mapped to base-4.11.0.0
wired-in package rts mapped to rts
wired-in package template-haskell mapped to template-haskell-2.13.0.0
wired-in package ghc mapped to ghc-8.3
wired-in package dph-seq not found.
wired-in package dph-par not found.
*** Checking old interface for D (use -ddump-hi-diffs for more details):
*** Parser [D]:
!!! Parser [D]: finished in 3.22 milliseconds, allocated 3.170 megabytes
*** Renamer/typechecker [D]:
!!! Renamer/typechecker [D]: finished in 124.33 milliseconds, allocated
72.473 megabytes
*** Desugar [D]:
Result size of Desugar (after optimization)
= {terms: 2,667, types: 11,534, coercions: 0, joins: 0/0}
!!! Desugar [D]: finished in 174.52 milliseconds, allocated 59.456
megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 7,375, types: 18,990, coercions: 6,881, joins: 0/913}
Result size of Simplifier iteration=2
= {terms: 3,722, types: 13,604, coercions: 1,351, joins: 0/198}
Result size of Simplifier iteration=3
= {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1}
Result size of Simplifier
= {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1}
!!! Simplifier [D]: finished in 1306.06 milliseconds, allocated 749.648
megabytes
*** Specialise [D]:
Result size of Specialise
= {terms: 3,319, types: 12,598, coercions: 1,342, joins: 0/1}
!!! Specialise [D]: finished in 19.96 milliseconds, allocated 17.006
megabytes
*** Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = False}) [D]:
Result size of Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = False})
= {terms: 4,923, types: 15,504, coercions: 1,342, joins: 0/0}
!!! Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = False}) [D]: finished in 122.82
milliseconds, allocated 103.551 megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 4,115, types: 13,888, coercions: 1,342, joins: 0/0}
Result size of Simplifier
= {terms: 4,115, types: 13,888, coercions: 1,342, joins: 0/0}
!!! Simplifier [D]: finished in 216.40 milliseconds, allocated 122.592
megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 4,115, types: 13,282, coercions: 1,342, joins: 0/0}
Result size of Simplifier
= {terms: 4,115, types: 13,282, coercions: 1,342, joins: 0/0}
!!! Simplifier [D]: finished in 183.50 milliseconds, allocated 131.494
megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
Result size of Simplifier
= {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
!!! Simplifier [D]: finished in 183.89 milliseconds, allocated 126.382
megabytes
*** Float inwards [D]:
Result size of Float inwards
= {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
!!! Float inwards [D]: finished in 8.67 milliseconds, allocated 4.238
megabytes
*** Called arity analysis [D]:
Result size of Called arity analysis
= {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
!!! Called arity analysis [D]: finished in 5.66 milliseconds, allocated
5.974 megabytes
*** Simplifier [D]:
Result size of Simplifier
= {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
!!! Simplifier [D]: finished in 67.09 milliseconds, allocated 62.910
megabytes
*** Demand analysis [D]:
Result size of Demand analysis
= {terms: 4,523, types: 14,200, coercions: 1,342, joins: 0/0}
!!! Demand analysis [D]: finished in 170.27 milliseconds, allocated 91.193
megabytes
*** Worker Wrapper binds [D]:
Result size of Worker Wrapper binds
= {terms: 4,541, types: 14,225, coercions: 1,342, joins: 0/3}
!!! Worker Wrapper binds [D]: finished in 5.09 milliseconds, allocated
3.614 megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0}
Result size of Simplifier
= {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0}
!!! Simplifier [D]: finished in 351.93 milliseconds, allocated 219.389
megabytes
*** Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = True}) [D]:
Result size of Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = True})
= {terms: 4,530, types: 14,215, coercions: 1,342, joins: 0/0}
!!! Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = True}) [D]: finished in 268.13
milliseconds, allocated 280.555 megabytes
*** Common sub-expression [D]:
Result size of Common sub-expression
= {terms: 3,740, types: 13,522, coercions: 1,342, joins: 0/0}
!!! Common sub-expression [D]: finished in 38.50 milliseconds, allocated
48.593 megabytes
*** Float inwards [D]:
Result size of Float inwards
= {terms: 3,740, types: 13,522, coercions: 1,342, joins: 0/0}
!!! Float inwards [D]: finished in 5.37 milliseconds, allocated 3.369
megabytes
*** Simplifier [D]:
Result size of Simplifier
= {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0}
!!! Simplifier [D]: finished in 140.01 milliseconds, allocated 98.855
megabytes
*** Demand analysis [D]:
Result size of Demand analysis
= {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0}
!!! Demand analysis [D]: finished in 77.63 milliseconds, allocated 86.076
megabytes
*** CoreTidy [D]:
Result size of Tidy Core
= {terms: 2,952, types: 12,337, coercions: 1,342, joins: 0/0}
!!! CoreTidy [D]: finished in 54.47 milliseconds, allocated 69.051
megabytes
Created temporary directory: /tmp/ghc27776_0
*** CorePrep [D]:
Result size of CorePrep
= {terms: 3,569, types: 13,155, coercions: 1,342, joins: 0/204}
!!! CorePrep [D]: finished in 25.73 milliseconds, allocated 32.097
megabytes
*** Stg2Stg:
*** CodeGen [D]:
!!! CodeGen [D]: finished in 2109.26 milliseconds, allocated 1630.642
megabytes
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc27776_0/ghc_2.c
*** Deleting temp dirs:
}}}
At 200 lines, we're looking at:
{{{
Glasgow Haskell Compiler, Version 8.3.20171016, stage 2 booted by GHC
version 8.0.1
Using binary package database: /home/tobias/well-
typed/devel/ghc/inplace/lib/package.conf.d/package.cache
package flags []
loading package database /home/tobias/well-
typed/devel/ghc/inplace/lib/package.conf.d
wired-in package ghc-prim mapped to ghc-prim-0.5.2.0
wired-in package integer-gmp mapped to integer-gmp-1.0.1.0
wired-in package base mapped to base-4.11.0.0
wired-in package rts mapped to rts
wired-in package template-haskell mapped to template-haskell-2.13.0.0
wired-in package ghc mapped to ghc-8.3
wired-in package dph-seq not found.
wired-in package dph-par not found.
*** Checking old interface for D (use -ddump-hi-diffs for more details):
*** Parser [D]:
!!! Parser [D]: finished in 3.80 milliseconds, allocated 6.207 megabytes
*** Renamer/typechecker [D]:
!!! Renamer/typechecker [D]: finished in 230.12 milliseconds, allocated
138.058 megabytes
*** Desugar [D]:
Result size of Desugar (after optimization)
= {terms: 5,267, types: 43,034, coercions: 0, joins: 0/0}
!!! Desugar [D]: finished in 289.66 milliseconds, allocated 201.767
megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 14,575, types: 57,790, coercions: 13,681, joins: 0/1,813}
Result size of Simplifier iteration=2
= {terms: 7,322, types: 47,104, coercions: 2,651, joins: 0/398}
Result size of Simplifier iteration=3
= {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1}
Result size of Simplifier
= {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1}
!!! Simplifier [D]: finished in 5652.80 milliseconds, allocated 2700.163
megabytes
*** Specialise [D]:
Result size of Specialise
= {terms: 6,519, types: 45,098, coercions: 2,642, joins: 0/1}
!!! Specialise [D]: finished in 62.79 milliseconds, allocated 58.261
megabytes
*** Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = False}) [D]:
Result size of Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = False})
= {terms: 9,723, types: 50,904, coercions: 2,642, joins: 0/0}
!!! Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = False}) [D]: finished in 401.80
milliseconds, allocated 374.441 megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 8,115, types: 47,688, coercions: 2,642, joins: 0/0}
Result size of Simplifier
= {terms: 8,115, types: 47,688, coercions: 2,642, joins: 0/0}
!!! Simplifier [D]: finished in 754.47 milliseconds, allocated 393.159
megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 8,115, types: 46,482, coercions: 2,642, joins: 0/0}
Result size of Simplifier
= {terms: 8,115, types: 46,482, coercions: 2,642, joins: 0/0}
!!! Simplifier [D]: finished in 579.29 milliseconds, allocated 411.892
megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
Result size of Simplifier
= {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
!!! Simplifier [D]: finished in 630.25 milliseconds, allocated 382.851
megabytes
*** Float inwards [D]:
Result size of Float inwards
= {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
!!! Float inwards [D]: finished in 8.17 milliseconds, allocated 8.300
megabytes
*** Called arity analysis [D]:
Result size of Called arity analysis
= {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
!!! Called arity analysis [D]: finished in 21.61 milliseconds, allocated
11.860 megabytes
*** Simplifier [D]:
Result size of Simplifier
= {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
!!! Simplifier [D]: finished in 362.68 milliseconds, allocated 190.967
megabytes
*** Demand analysis [D]:
Result size of Demand analysis
= {terms: 8,923, types: 48,300, coercions: 2,642, joins: 0/0}
!!! Demand analysis [D]: finished in 551.36 milliseconds, allocated
339.529 megabytes
*** Worker Wrapper binds [D]:
Result size of Worker Wrapper binds
= {terms: 8,941, types: 48,325, coercions: 2,642, joins: 0/3}
!!! Worker Wrapper binds [D]: finished in 7.70 milliseconds, allocated
11.542 megabytes
*** Simplifier [D]:
Result size of Simplifier iteration=1
= {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0}
Result size of Simplifier
= {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0}
!!! Simplifier [D]: finished in 1307.95 milliseconds, allocated 727.784
megabytes
*** Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = True}) [D]:
Result size of Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = True})
= {terms: 8,930, types: 48,315, coercions: 2,642, joins: 0/0}
!!! Float out(FOS {Lam = Just 0,
Consts = True,
OverSatApps = True}) [D]: finished in 1393.21
milliseconds, allocated 1078.391 megabytes
*** Common sub-expression [D]:
Result size of Common sub-expression
= {terms: 7,340, types: 46,922, coercions: 2,642, joins: 0/0}
!!! Common sub-expression [D]: finished in 227.12 milliseconds, allocated
194.184 megabytes
*** Float inwards [D]:
Result size of Float inwards
= {terms: 7,340, types: 46,922, coercions: 2,642, joins: 0/0}
!!! Float inwards [D]: finished in 11.63 milliseconds, allocated 6.545
megabytes
*** Simplifier [D]:
Result size of Simplifier
= {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0}
!!! Simplifier [D]: finished in 590.60 milliseconds, allocated 349.746
megabytes
*** Demand analysis [D]:
Result size of Demand analysis
= {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0}
!!! Demand analysis [D]: finished in 485.19 milliseconds, allocated
330.909 megabytes
*** CoreTidy [D]:
Result size of Tidy Core
= {terms: 5,752, types: 44,537, coercions: 2,642, joins: 0/0}
!!! CoreTidy [D]: finished in 293.13 milliseconds, allocated 258.898
megabytes
Created temporary directory: /tmp/ghc18725_0
*** CorePrep [D]:
Result size of CorePrep
= {terms: 6,969, types: 46,155, coercions: 2,642, joins: 0/404}
!!! CorePrep [D]: finished in 163.49 milliseconds, allocated 113.842
megabytes
*** Stg2Stg:
*** CodeGen [D]:
!!! CodeGen [D]: finished in 23041.19 milliseconds, allocated 12665.144
megabytes
*** Assembler:
*** Deleting temp files:
Warning: deleting non-existent /tmp/ghc18725_0/ghc_2.c
*** Deleting temp dirs:
}}}
Not sure whether this qualifies as "unusually large". `-ddump-cmm` shows
that the C-- for a 100-field read instance is more than twice as large as
that for the 100-field getline example (39234 lines vs 14895), but size
alone doesn't explain the huge difference.
However, at 300 fields, the read example produces 237541 lines of C--,
while getline grows roughly linearly, to 47312 lines, meaning that there
is definitely some sort of nonlinearity going on wrt C-- code size.
> So `sink` in C-- seems very slow. Nearly 4 bytes in 5 of all allocation
is in tis pass alone. Is this a non-linear algorithmic effect. e.g if you
double the size of the input, does the `sink` time go up non-linearly
while `simplTopBinds` goes up linearly? That's my guess.
Yes, in fact it does, quite clearly so:
10-field example:
{{{
Wed Oct 25 14:57 2017 Time and Allocation Profiling Report
(Final)
ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-
typed/devel/ghc/inplace/lib -B/home/tobias/well-
typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-10-read.hs
total time = 0.19 secs (188 ticks @ 1000 us, 1
processor)
total alloc = 134,109,112 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC
%time %alloc
SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74
35.6 28.7
tc_rn_src_decls TcRnDriver
compiler/typecheck/TcRnDriver.hs:(493,4)-(555,7) 9.0 9.0
OccAnal SimplCore
compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 4.8 5.5
FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66
4.3 5.1
pprNativeCode AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 3.7 3.5
tcRnImports TcRnDriver compiler/typecheck/TcRnDriver.hs:240:20-50
3.7 3.6
StgCmm HscMain
compiler/main/HscMain.hs:(1426,13)-(1427,62) 3.7 2.1
sink CmmPipeline
compiler/cmm/CmmPipeline.hs:(104,13)-(105,59) 3.2 1.5
setSessionDynFlags GHC compiler/main/GHC.hs:(578,1)-(584,16)
3.2 4.7
NativeCodeGen CodeOutput compiler/main/CodeOutput.hs:171:18-78
2.1 0.9
NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63
2.1 2.9
bin_tycldecls HscTypes compiler/main/HscTypes.hs:1085:52-57
1.6 1.0
initGhcMonad GHC compiler/main/GHC.hs:(493,1)-(503,25)
1.6 4.7
MAIN MAIN <built-in>
1.1 0.3
fixStgRegisters AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:566:17-42
1.1 0.2
genMachCode AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62) 1.1 1.3
shortcutBranches AsmCodeGen compiler/nativeGen/AsmCodeGen.hs:694:17-54
1.1 0.2
elimCommonBlocks CmmPipeline compiler/cmm/CmmPipeline.hs:(71,13)-(72,76)
1.1 1.0
CAF PrimOp <entire-module>
1.1 0.5
CAF PrelInfo <entire-module>
1.1 1.0
deSugar HscMain compiler/main/HscMain.hs:511:7-44
1.1 0.9
CorePrep HscMain
compiler/main/HscMain.hs:(1313,24)-(1314,57) 1.1 0.6
withCleanupSession GHC compiler/main/GHC.hs:(466,1)-(475,37)
1.1 0.8
regLiveness AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52) 0.5 2.0
RegAlloc-linear AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 0.5 2.4
layoutStack CmmPipeline compiler/cmm/CmmPipeline.hs:(97,13)-(99,40)
0.5 1.0
CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67
0.5 1.0
}}}
100 fields:
{{{
Wed Oct 25 15:00 2017 Time and Allocation Profiling Report
(Final)
ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-
typed/devel/ghc/inplace/lib -B/home/tobias/well-
typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-100-read.hs
total time = 3.10 secs (3103 ticks @ 1000 us, 1
processor)
total alloc = 2,397,379,664 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC
%time %alloc
SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74
38.9 36.6
sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)
16.0 7.3
FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66
7.9 10.8
OccAnal SimplCore
compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 4.3 5.8
pprNativeCode AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 4.1 5.3
RegAlloc-linear AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 3.8 4.6
StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62)
2.8 2.5
NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63
2.8 4.8
regLiveness AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52) 1.8 1.9
CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67
1.8 2.9
genMachCode AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62) 1.8 1.6
tc_rn_src_decls TcRnDriver
compiler/typecheck/TcRnDriver.hs:(493,4)-(555,7) 1.3 1.3
NativeCodeGen CodeOutput compiler/main/CodeOutput.hs:171:18-78
1.0 1.7
CommonSubExpr SimplCore compiler/simplCore/SimplCore.hs:462:40-56
0.8 1.2
deSugar HscMain compiler/main/HscMain.hs:511:7-44
0.8 1.3
}}}
400 fields:
{{{
Thu Oct 26 01:25 2017 Time and Allocation Profiling Report
(Final)
ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-
typed/devel/ghc/inplace/lib -B/home/tobias/well-
typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-400-read.hs
total time = 128.62 secs (128624 ticks @ 1000 us, 1
processor)
total alloc = 37,659,351,320 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC
%time %alloc
sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)
77.8 23.6
SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74
9.6 30.0
FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66
3.0 10.4
RegAlloc-linear AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55) 1.6 5.0
OccAnal SimplCore
compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 1.1 4.2
pprNativeCode AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65) 1.0 4.7
NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63
0.9 4.2
CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67
0.7 2.9
StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62)
0.6 1.9
regLiveness AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52) 0.5 1.4
genMachCode AsmCodeGen
compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62) 0.4 1.3
NativeCodeGen CodeOutput compiler/main/CodeOutput.hs:171:18-78
0.4 1.6
CommonSubExpr SimplCore compiler/simplCore/SimplCore.hs:462:40-56
0.2 1.1
deSugar HscMain compiler/main/HscMain.hs:511:7-44
0.2 1.1
}}}
> (Related question: does `sink` show up a a big item when compiling
"normal" programs?)
It does, but nowhere near this dominantly. E.g. getline, 400 fields:
{{{
Thu Oct 26 02:54 2017 Time and Allocation Profiling Report
(Final)
ghc-stage2 +RTS -p -h -RTS -B/home/tobias/well-
typed/devel/ghc/inplace/lib -B/home/tobias/well-
typed/devel/ghc/inplace/lib -O -fforce-recomp -c examples/t-400-getline.hs
total time = 14.13 secs (14126 ticks @ 1000 us, 1
processor)
total alloc = 12,479,030,504 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC
%time %alloc
sink CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)
31.7 3.3
SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:761:39-74
24.7 39.0
OccAnal SimplCore
compiler/simplCore/SimplCore.hs:(739,22)-(740,67) 7.2 9.7
NewStranal SimplCore compiler/simplCore/SimplCore.hs:480:40-63
6.5 10.4
CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67
5.2 7.6
FloatOutwards SimplCore compiler/simplCore/SimplCore.hs:471:40-66
4.1 6.2
StgCmm HscMain compiler/main/HscMain.hs:(1426,13)-(1427,62)
2.0 1.6
deSugar HscMain compiler/main/HscMain.hs:511:7-44
2.0 3.3
CommonSubExpr SimplCore compiler/simplCore/SimplCore.hs:462:40-56
1.9 3.2
CorePrep HscMain compiler/main/HscMain.hs:(1313,24)-(1314,57)
1.4 2.5
Specialise SimplCore compiler/simplCore/SimplCore.hs:486:40-50
0.8 1.3
StgCse SimplStg compiler/simplStg/SimplStg.hs:(88,14)-(91,44)
0.5 1.3
}}}
> Guessing: the "shape" of the C-- code makes `sink` behave very badly.
Can you characterise what that shape is, and where the algorithmic badness
comes from?
Not yet, but I have a hunch that it's simply a matter of producing too
much of it. Maybe there is quadratic behavior there, where the number of
lines of C-- grows proportionally with the square of the number of fields?
That seems like a plausible explanation so far.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/7258#comment:58>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list