[GHC] #11565: Restore code to handle '-fmax-worker-args' flag

GHC ghc-devs at haskell.org
Mon Aug 29 13:35:06 UTC 2016


#11565: Restore code to handle '-fmax-worker-args' flag
-------------------------------------+-------------------------------------
        Reporter:  slyfox            |                Owner:
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.10.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by slyfox):

 The current motivating example for fixing this is DynFlags itself.
 I was profiling a perf build of GHC and noticed a function that copies
 the whole DynFlags record from the stack to the heap. This small piece
 of code emits 10 pages of mov instructions.

 https://git.haskell.org/ghc.git/blob/HEAD:/compiler/nativeGen/AsmCodeGen.hs#l1109

 {{{#!hs
 1086 cmmExprNative :: ReferenceKind -> CmmExpr -> CmmOptM CmmExpr
 1087 cmmExprNative referenceKind expr = do
 1088      dflags <- getDynFlags
 1089      let platform = targetPlatform dflags
 1090          arch = platformArch platform
 1091      case expr of
 ...
 1106         CmmLit (CmmLabel lbl)
 1107            -> do
 1108                 cmmMakeDynamicReference dflags referenceKind lbl
 ...
 }}}

 {{{
        │      cmmExprNative :: ReferenceKind -> CmmExpr -> CmmOptM CmmExpr
        │      cmmExprNative referenceKind expr = do
   0,11 │        cmp    $0x3,%rax
        │      ↑ jb     3ceb930 <cFO7_info+0x8b0>
        │                 -- we must convert block Ids to CLabels here, because we
        │                 -- might have to do the PIC transformation. Hence we must
        │                 -- not modify BlockIds beyond this point.
        │
        │              CmmLit (CmmLabel lbl)
        │                 -> do
   2,02 │        add    $0x890,%r12
        │        cmp    0x358(%r13),%r12
        │      ↑ ja     3cf456f <cFIc_info+0x7df>
   0,16 │        mov    0x7(%rbx),%rax
   0,59 │        lea    ghc_DynFlags_DynFlags_con_info,%rbx
   0,05 │        mov    %rbx,-0x888(%r12)
   3,41 │18e9:   mov    0x50(%rsp),%rbx
   0,05 │        mov    %rbx,-0x880(%r12)
   0,32 │        mov    0x58(%rsp),%r14
        │        mov    %r14,-0x878(%r12)
        │        mov    0x60(%rsp),%rbx
        │        mov    %rbx,-0x870(%r12)
   0,05 │        mov    0x68(%rsp),%r14
        │        mov    %r14,-0x868(%r12)
        │        mov    0x70(%rsp),%rbx
        │        mov    %rbx,-0x860(%r12)
        │        mov    0x78(%rsp),%r14
   0,11 │        mov    %r14,-0x858(%r12)
   0,05 │        mov    0x80(%rsp),%rbx
        │        mov    %rbx,-0x850(%r12)
   0,05 │        mov    0x88(%rsp),%r14
        │        mov    %r14,-0x848(%r12)
        │        mov    0x90(%rsp),%rbx
        │        mov    %rbx,-0x840(%r12)
   0,05 │        mov    0x98(%rsp),%r14
   0,05 │        mov    %r14,-0x838(%r12)
   0,11 │        mov    0xa0(%rsp),%rbx
        │        mov    %rbx,-0x830(%r12)
        │        mov    0xa8(%rsp),%r14
        │        mov    %r14,-0x828(%r12)
   0,05 │        mov    0xb0(%rsp),%rbx
        │        mov    %rbx,-0x820(%r12)
        │        mov    0xb8(%rsp),%r14
 ... <a few more pages of it>
 }}}

 On x86_64 the register mapping is: '''%r12''' - heap pointer, '''%rsp''' -
 machine SP.
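
 As a rough aid for reading the listing, the sizes involved can be worked
 out by hand. This is only a sketch: the heap bump '''0x890''' is taken from
 the '''add $0x890,%r12''' line above, while the field count of 136 is an
 assumption based on the '''ww1'''..'''ww136''' binders in the Core dump later
 in this comment. The names '''heapBump''', '''fieldCount''', '''recordBytes'''
 and '''fieldMovs''' are made up for the illustration.

 {{{#!hs
 module Main (main) where

 import Numeric (showHex)

 -- Size of one machine word on x86_64, in bytes.
 wordBytes :: Int
 wordBytes = 8

 -- Heap pointer bump taken from 'add $0x890,%r12' in the listing.
 heapBump :: Int
 heapBump = 0x890

 -- Assumption: the constructor has 136 fields (ww1..ww136 in the Core dump).
 fieldCount :: Int
 fieldCount = 136

 -- One info-pointer word plus one word per field.
 recordBytes :: Int
 recordBytes = (1 + fieldCount) * wordBytes

 -- One load from the stack and one store to the heap per field.
 fieldMovs :: Int
 fieldMovs = 2 * fieldCount

 main :: IO ()
 main = do
   putStrLn ("heap bump:    " ++ show heapBump ++ " bytes")
   putStrLn ("record size:  " ++ show recordBytes ++ " bytes (0x"
             ++ showHex recordBytes "" ++ ")")
   putStrLn ("movs to copy: " ++ show fieldMovs)
 }}}

 Under those assumptions a single rebuilt record takes 1096 bytes, half of
 the 2192-byte heap bump, and needs 272 mov instructions for its fields.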

 The suspicion is that the worker/wrapper optimisation moves the huge
 140-field record '''DynFlags''' from the heap to the stack even though it
 is not mutated.
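
 To make the suspected shape concrete, here is a hand-written sketch of what
 a worker/wrapper split looks like when the worker still needs the whole
 record. It is not GHC output and does not reproduce the blowup; '''Flags''',
 '''logWith''', '''describe''' and '''wDescribe''' are hypothetical names, and
 the three-field record only stands in for the much larger DynFlags.

 {{{#!hs
 module Main (main) where

 -- A small stand-in for the much larger DynFlags record.
 data Flags = Flags
   { verbosity :: Int
   , optLevel  :: Int
   , target    :: String
   }

 -- A callee that needs the whole record.
 logWith :: Flags -> String -> String
 logWith fl msg = "[" ++ target fl ++ "] " ++ msg

 -- Wrapper: unpacks the record and passes each field separately.
 describe :: Flags -> String
 describe (Flags v o t) = wDescribe v o t

 -- Worker: receives the fields as separate arguments. Because it has to
 -- hand a whole 'Flags' to 'logWith', it rebuilds (reboxes) the record,
 -- which allocates a fresh heap object and copies every field into it.
 wDescribe :: Int -> Int -> String -> String
 wDescribe v o t
   | v > 0     = logWith (Flags v o t) ("opt level " ++ show o)
   | otherwise = ""

 main :: IO ()
 main = putStrLn (describe (Flags 1 2 "x86_64"))
 }}}

 With three fields the rebuild is cheap. With a record the size of DynFlags
 the same pattern would mean one stack load and one heap store per field,
 which matches the shape of the mov sequence above.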


 Looking at AsmCodeGen.hs with -ddump-worker-wrapper:
 {{{
 "inplace/bin/ghc-stage1" -hisuf hi -osuf  o -hcsuf hc -static  -O -H64m -g
 -Wall      -this-unit-id ghc-8.1 -hide-all-packages -i
 -icompiler/basicTypes -icompiler/cmm -icompiler/codeGen -icompiler/coreSyn
 -icompiler/deSugar -icompiler/ghci -icompiler/hsSyn -icompiler/iface
 -icompiler/llvmGen -icompiler/main -icompiler/nativeGen -icompiler/parser
 -icompiler/prelude -icompiler/profiling -icompiler/rename
 -icompiler/simplCore -icompiler/simplStg -icompiler/specialise
 -icompiler/stgSyn -icompiler/stranal -icompiler/typecheck -icompiler/types
 -icompiler/utils -icompiler/vectorise -icompiler/stage2/build
 -Icompiler/stage2/build -icompiler/stage2/build/./autogen
 -Icompiler/stage2/build/./autogen -Icompiler/. -Icompiler/parser
 -Icompiler/utils -Icompiler/../rts/dist/build -Icompiler/stage2
 -optP-DGHCI -optP-include -optPcompiler/stage2/build/./autogen/cabal_macros.h
 -package-id array-0.5.1.1 -package-id base-4.9.0.0 -package-id
 binary-0.8.3.0 -package-id bytestring-0.10.8.1 -package-id
 containers-0.5.7.1 -package-id deepseq-1.4.2.0 -package-id
 directory-1.2.6.2 -package-id filepath-1.4.1.0 -package-id ghc-boot-8.1
 -package-id ghci-8.1 -package-id hoopl-3.10.2.1 -package-id hpc-0.6.0.3
 -package-id process-1.4.2.0 -package-id template-haskell-2.11.0.0
 -package-id time-1.6.0.1 -package-id transformers-0.5.2.0 -package-id
 unix-2.7.2.0 -Wall -fno-warn-name-shadowing -this-unit-id ghc
 -XHaskell2010 -optc-DTHREADED_RTS -DGHCI_TABLES_NEXT_TO_CODE -DSTAGE=2
 -Rghc-timing -O2  -no-user-package-db -rtsopts
 -Wnoncanonical-monad-instances  -odir compiler/stage2/build -hidir compiler/stage2/build
 -stubdir compiler/stage2/build   -dynamic-too -c
 compiler/nativeGen/AsmCodeGen.hs -o compiler/stage2/build/AsmCodeGen.o
 -dyno compiler/stage2/build/AsmCodeGen.dyn_o -ddump-worker-wrapper
 }}}

 there are a few places where functions have a huge arity of roughly 140.
 One of the first places, picked at random: the worker for
 '''dumpIfSet_dyn''' accepts every DynFlags field as a separate argument.

 {{{
        case dflags_ab5I of
        { DynFlags ww1_al11 ww2_al12 ww3_al13 ww4_al14 ww5_al15
                   ww6_al16 [Dmd=<L,U(U)>] ww7_al17 ww8_al18 ww9_al19 ww10_al1a
                   ww11_al1b ww12_al1c ww13_al1d ww14_al1e ww15_al1f ww16_al1g
                   ww17_al1h ww18_al1i ww19_al1j ww20_al1k ww21_al1l ww22_al1m
                   ww23_al1n ww24_al1o ww25_al1p ww26_al1q ww27_al1r ww28_al1s
                   ww29_al1t ww30_al1u ww31_al1v ww32_al1w ww33_al1x ww34_al1y
                   ww35_al1z ww36_al1A ww37_al1B ww38_al1C ww39_al1D ww40_al1E
                   ww41_al1F ww42_al1G ww43_al1H ww44_al1I ww45_al1J ww46_al1K
                   ww47_al1L ww48_al1M ww49_al1N ww50_al1O ww51_al1P ww52_al1Q
                   ww53_al1R ww54_al1S ww55_al1T ww56_al1U ww57_al1V ww58_al1W
                   ww59_al1X ww60_al1Y ww61_al1Z ww62_al20 ww63_al21 ww64_al22
                   ww65_al23 ww66_al24 ww67_al25 ww68_al26 ww69_al27 ww70_al28
                   ww71_al29 ww72_al2a ww73_al2b ww74_al2c ww75_al2d ww76_al2e
                   ww77_al2f ww78_al2g ww79_al2h ww80_al2i ww81_al2j ww82_al2k
                   ww83_al2l ww84_al2m [Dmd=<L,U(U)>] ww85_al2n [Dmd=<S,U>] ww86_al2o
                   ww87_al2p ww88_al2q ww89_al2r ww90_al2s ww91_al2t ww92_al2u
                   ww93_al2v ww94_al2w ww95_al2x ww96_al2y ww97_al2z ww98_al2A
                   ww99_al2B ww100_al2C ww101_al2D ww102_al2E ww103_al2F ww104_al2G
                   ww105_al2H ww106_al2I ww107_al2J ww108_al2K ww109_al2L ww110_al2M
                   ww111_al2N ww112_al2O ww113_al2P ww114_al2Q ww115_al2R
                   ww116_al2S [Dmd=<L,U(U)>] ww117_al2T ww118_al2U ww119_al2V
                   ww120_al2W ww121_al2X ww122_al2Y ww123_al2Z ww124_al30 ww125_al31
                   ww126_al32 ww127_al33 ww128_al34 ww129_al35 ww130_al36 ww131_al37
                   ww132_al38 ww133_al39 ww134_al3a ww135_al3b ww136_al3c ->
        ErrUtils.$wdumpIfSet_dyn
          ww1_al11
          ww2_al12
          ww3_al13
          ww4_al14
          ww5_al15
          ww6_al16
          ww7_al17
          ww8_al18
          ww9_al19
          ww10_al1a
          ww11_al1b
          ww12_al1c
          ww13_al1d
          ww14_al1e
          ww15_al1f
          ww16_al1g
          ww17_al1h
          ww18_al1i
          ww19_al1j
          ww20_al1k
 ...
 }}}

 I'll try to craft a small example that demonstrates the blowup.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/11565#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

