[GHC] #11565: Restore code to handle '-fmax-worker-args' flag
GHC
ghc-devs at haskell.org
Mon Aug 29 13:35:06 UTC 2016
#11565: Restore code to handle '-fmax-worker-args' flag
-------------------------------------+-------------------------------------
Reporter: slyfox | Owner:
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by slyfox):
The current motivating example for fixing this is DynFlags itself.
I was profiling a perf build of GHC and noticed a function that pushes
the whole DynFlags from the stack to the heap. This small piece of code emits
10 pages of mov instructions.
https://git.haskell.org/ghc.git/blob/HEAD:/compiler/nativeGen/AsmCodeGen.hs#l1109
{{{#!hs
cmmExprNative :: ReferenceKind -> CmmExpr -> CmmOptM CmmExpr
cmmExprNative referenceKind expr = do
     dflags <- getDynFlags
     let platform = targetPlatform dflags
         arch = platformArch platform
     case expr of
        ...
        CmmLit (CmmLabel lbl)
           -> do
                cmmMakeDynamicReference dflags referenceKind lbl
        ...
}}}
{{{
│ cmmExprNative :: ReferenceKind -> CmmExpr -> CmmOptM CmmExpr
│ cmmExprNative referenceKind expr = do
0,11 │ cmp $0x3,%rax
│ ↑ jb 3ceb930 <cFO7_info+0x8b0>
│ -- we must convert block Ids to CLabels here, because we
│ -- might have to do the PIC transformation. Hence we must
│ -- not modify BlockIds beyond this point.
│
│ CmmLit (CmmLabel lbl)
│ -> do
2,02 │ add $0x890,%r12
│ cmp 0x358(%r13),%r12
│ ↑ ja 3cf456f <cFIc_info+0x7df>
0,16 │ mov 0x7(%rbx),%rax
0,59 │ lea ghc_DynFlags_DynFlags_con_info,%rbx
0,05 │ mov %rbx,-0x888(%r12)
3,41 │18e9: mov 0x50(%rsp),%rbx
0,05 │ mov %rbx,-0x880(%r12)
0,32 │ mov 0x58(%rsp),%r14
│ mov %r14,-0x878(%r12)
│ mov 0x60(%rsp),%rbx
│ mov %rbx,-0x870(%r12)
0,05 │ mov 0x68(%rsp),%r14
│ mov %r14,-0x868(%r12)
│ mov 0x70(%rsp),%rbx
│ mov %rbx,-0x860(%r12)
│ mov 0x78(%rsp),%r14
0,11 │ mov %r14,-0x858(%r12)
0,05 │ mov 0x80(%rsp),%rbx
│ mov %rbx,-0x850(%r12)
0,05 │ mov 0x88(%rsp),%r14
│ mov %r14,-0x848(%r12)
│ mov 0x90(%rsp),%rbx
│ mov %rbx,-0x840(%r12)
0,05 │ mov 0x98(%rsp),%r14
0,05 │ mov %r14,-0x838(%r12)
0,11 │ mov 0xa0(%rsp),%rbx
│ mov %rbx,-0x830(%r12)
│ mov 0xa8(%rsp),%r14
│ mov %r14,-0x828(%r12)
0,05 │ mov 0xb0(%rsp),%rbx
│ mov %rbx,-0x820(%r12)
│ mov 0xb8(%rsp),%r14
... <a few more pages of it>
}}}
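On x86_64 the register mapping is: '''%r12''' - heap pointer, '''%rsp''' -
machine SP.
My suspicion is that the worker/wrapper optimisation unpacks the huge
140-field '''DynFlags''' record from the heap onto the stack even though it is
not modified, so the record has to be rebuilt on the heap when it is passed on.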
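The suspected shape can be sketched by hand on a small record. This is only an illustration, not actual GHC output: '''Config''', '''describe''', '''report''', '''reportWrapper''' and '''reportWorker''' are hypothetical names, and whether GHC really performs this split for a given function depends on its demand analysis.
{{{#!hs
module WorkerWrapperSketch where

-- Hypothetical stand-in for DynFlags (the real record has far more fields).
data Config = Config
  { verbosity :: Int
  , optLevel  :: Int
  , target    :: String
  }

-- Hypothetical callee that needs the record as a whole.
describe :: Config -> String
describe c = target c ++ "-O" ++ show (optLevel c)

-- Source-level function: it inspects one field (so it is strict in cfg)
-- and also passes the whole record on.
report :: Config -> String
report cfg
  | verbosity cfg > 0 = describe cfg
  | otherwise         = ""

-- Hand-written version of a worker/wrapper split: the wrapper unpacks
-- the record and hands every field to the worker separately.
reportWrapper :: Config -> String
reportWrapper (Config v o t) = reportWorker v o t

-- The worker only has the loose fields, so it must rebuild (rebox) the
-- record before calling describe. With a 140-field record this rebuild
-- is one store per field, which is what the mov sequence looks like.
reportWorker :: Int -> Int -> String -> String
reportWorker v o t
  | v > 0     = describe (Config v o t)
  | otherwise = ""
}}}
In this sketch '''report''' and '''reportWrapper''' compute the same result; the only difference is that the split version allocates a fresh '''Config''' instead of reusing the one it was given.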
Looking at AsmCodeGen.hs with -ddump-worker-wrapper
{{{
"inplace/bin/ghc-stage1" -hisuf hi -osuf o -hcsuf hc -static -O -H64m -g
-Wall -this-unit-id ghc-8.1 -hide-all-packages -i
-icompiler/basicTypes -icompiler/cmm -icompiler/codeGen -icompiler/coreSyn
-icompiler/deSugar -icompiler/ghci -icompiler/hsSyn -icompiler/iface
-icompiler/llvmGen -icompiler/main -icompiler/nativeGen -icompiler/parser
-icompiler/prelude -icompiler/profiling -icompiler/rename
-icompiler/simplCore -icompiler/simplStg -icompiler/specialise
-icompiler/stgSyn -icompiler/stranal -icompiler/typecheck -icompiler/types
-icompiler/utils -icompiler/vectorise -icompiler/stage2/build
-Icompiler/stage2/build -icompiler/stage2/build/./autogen
-Icompiler/stage2/build/./autogen -Icompiler/. -Icompiler/parser
-Icompiler/utils -Icompiler/../rts/dist/build -Icompiler/stage2 -optP-DGHCI
-optP-include -optPcompiler/stage2/build/./autogen/cabal_macros.h
-package-id array-0.5.1.1 -package-id base-4.9.0.0 -package-id
binary-0.8.3.0 -package-id bytestring-0.10.8.1 -package-id
containers-0.5.7.1 -package-id deepseq-1.4.2.0 -package-id
directory-1.2.6.2 -package-id filepath-1.4.1.0 -package-id ghc-boot-8.1
-package-id ghci-8.1 -package-id hoopl-3.10.2.1 -package-id hpc-0.6.0.3
-package-id process-1.4.2.0 -package-id template-haskell-2.11.0.0
-package-id time-1.6.0.1 -package-id transformers-0.5.2.0 -package-id
unix-2.7.2.0 -Wall -fno-warn-name-shadowing -this-unit-id ghc
-XHaskell2010 -optc-DTHREADED_RTS -DGHCI_TABLES_NEXT_TO_CODE -DSTAGE=2
-Rghc-timing -O2 -no-user-package-db -rtsopts
-Wnoncanonical-monad-instances -odir compiler/stage2/build -hidir compiler/stage2/build
-stubdir compiler/stage2/build -dynamic-too -c
compiler/nativeGen/AsmCodeGen.hs -o compiler/stage2/build/AsmCodeGen.o
-dyno compiler/stage2/build/AsmCodeGen.dyn_o -ddump-worker-wrapper
}}}
there are a few places where functions have a huge arity (about 140).
One of the first places, picked at random: the worker for '''dumpIfSet_dyn'''
takes every field as a separate argument.
{{{
case dflags_ab5I of
{ DynFlags ww1_al11 ww2_al12 ww3_al13 ww4_al14 ww5_al15
           ww6_al16 [Dmd=<L,U(U)>] ww7_al17 ww8_al18 ww9_al19 ww10_al1a
           ww11_al1b ww12_al1c ww13_al1d ww14_al1e ww15_al1f ww16_al1g
           ww17_al1h ww18_al1i ww19_al1j ww20_al1k ww21_al1l ww22_al1m
           ww23_al1n ww24_al1o ww25_al1p ww26_al1q ww27_al1r ww28_al1s
           ww29_al1t ww30_al1u ww31_al1v ww32_al1w ww33_al1x ww34_al1y
           ww35_al1z ww36_al1A ww37_al1B ww38_al1C ww39_al1D ww40_al1E
           ww41_al1F ww42_al1G ww43_al1H ww44_al1I ww45_al1J ww46_al1K
           ww47_al1L ww48_al1M ww49_al1N ww50_al1O ww51_al1P ww52_al1Q
           ww53_al1R ww54_al1S ww55_al1T ww56_al1U ww57_al1V ww58_al1W
           ww59_al1X ww60_al1Y ww61_al1Z ww62_al20 ww63_al21 ww64_al22
           ww65_al23 ww66_al24 ww67_al25 ww68_al26 ww69_al27 ww70_al28
           ww71_al29 ww72_al2a ww73_al2b ww74_al2c ww75_al2d ww76_al2e
           ww77_al2f ww78_al2g ww79_al2h ww80_al2i ww81_al2j ww82_al2k
           ww83_al2l ww84_al2m [Dmd=<L,U(U)>] ww85_al2n [Dmd=<S,U>] ww86_al2o
           ww87_al2p ww88_al2q ww89_al2r ww90_al2s ww91_al2t ww92_al2u
           ww93_al2v ww94_al2w ww95_al2x ww96_al2y ww97_al2z ww98_al2A
           ww99_al2B ww100_al2C ww101_al2D ww102_al2E ww103_al2F ww104_al2G
           ww105_al2H ww106_al2I ww107_al2J ww108_al2K ww109_al2L ww110_al2M
           ww111_al2N ww112_al2O ww113_al2P ww114_al2Q ww115_al2R
           ww116_al2S [Dmd=<L,U(U)>] ww117_al2T ww118_al2U ww119_al2V
           ww120_al2W ww121_al2X ww122_al2Y ww123_al2Z ww124_al30 ww125_al31
           ww126_al32 ww127_al33 ww128_al34 ww129_al35 ww130_al36 ww131_al37
           ww132_al38 ww133_al39 ww134_al3a ww135_al3b ww136_al3c
->
ErrUtils.$wdumpIfSet_dyn
ww1_al11
ww2_al12
ww3_al13
ww4_al14
ww5_al15
ww6_al16
ww7_al17
ww8_al18
ww9_al19
ww10_al1a
ww11_al1b
ww12_al1c
ww13_al1d
ww14_al1e
ww15_al1f
ww16_al1g
ww17_al1h
ww18_al1i
ww19_al1j
ww20_al1k
...
}}}
I'll try to craft a small example that demonstrates the blowup.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/11565#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler