[GHC] #9221: (super!) linear slowdown of parallel builds on 40 core machine
GHC
ghc-devs at haskell.org
Wed Aug 31 16:51:07 UTC 2016
#9221: (super!) linear slowdown of parallel builds on 40 core machine
-------------------------------------+-------------------------------------
Reporter: carter | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 8.2.1
Component: Compiler | Version: 7.8.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Compile-time performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #910, #8224 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by slyfox):
24-core VM.
CPU topology:
{{{
$ lstopo-no-graphics
Machine (118GB)
Package L#0 + L3 L#0 (30MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#1)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
PU L#2 (P#2)
PU L#3 (P#3)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
PU L#4 (P#4)
PU L#5 (P#5)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
PU L#6 (P#6)
PU L#7 (P#7)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
PU L#8 (P#8)
PU L#9 (P#9)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
PU L#10 (P#10)
PU L#11 (P#11)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
PU L#12 (P#12)
PU L#13 (P#13)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
PU L#14 (P#14)
PU L#15 (P#15)
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
PU L#16 (P#16)
PU L#17 (P#17)
L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
PU L#18 (P#18)
PU L#19 (P#19)
L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
PU L#20 (P#20)
PU L#21 (P#21)
L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
PU L#22 (P#22)
PU L#23 (P#23)
$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 120881 MB
node 0 free: 120192 MB
node distances:
node 0
0: 10
}}}
(I would not trust the numactl output.)
Separate processes:
{{{
$ make clean; time make -j1
real 1m33.147s
user 1m20.836s
sys 0m11.556s
$ make clean; time make -j10
real 0m11.275s
user 1m29.800s
sys 0m12.856s
$ make clean; time make -j12
real 0m10.537s
user 1m36.276s
sys 0m16.948s
$ make clean; time make -j14
real 0m9.117s
user 1m39.132s
sys 0m18.332s
$ make clean; time make -j20
real 0m8.498s
user 2m7.064s
sys 0m17.912s
$ make clean; time make -j22
real 0m7.468s
user 2m9.808s
sys 0m18.592s
$ make clean; time make -j24
real 0m7.336s
user 2m15.936s
sys 0m19.004s
$ make clean; time make -j26
real 0m7.433s
user 2m17.612s
sys 0m19.648s
$ make clean; time make -j28
real 0m7.554s
user 2m17.760s
sys 0m19.564s
$ make clean; time make -j30
real 0m7.563s
user 2m16.776s
sys 0m21.104s
}}}
The numbers jump slightly from run to run, but the gist is that the best
performance is around -j24 (the hardware-thread count), not -j12 (the
physical-core count).
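(For anyone reproducing this: a sweep along the following lines, with a
few trials per job count, smooths out the run-to-run jitter. The job-count
list and the trial count are my own choices, not what was run above.)
{{{
#!/bin/bash
# Sketch of the sweep above: several trials per -j value so that
# run-to-run jitter averages out. Uses GNU time's -f formatting;
# job counts and trial count are illustrative.
for j in 1 10 12 14 20 22 24 26 28 30; do
  for trial in 1 2 3; do
    make clean >/dev/null
    /usr/bin/time -f "-j$j trial $trial: real %es" make -j"$j" >/dev/null
  done
done
}}}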
Single process:
{{{
$ ./synth.bash -j1 +RTS -sstderr -A256M -qb0 -RTS
real 1m15.214s
user 1m14.060s
sys 0m0.984s
$ ./synth.bash -j8 +RTS -sstderr -A256M -qb0 -RTS
real 0m11.275s
user 1m21.708s
sys 0m2.912s
$ ./synth.bash -j10 +RTS -sstderr -A256M -qb0 -RTS
real 0m10.279s
user 1m25.184s
sys 0m3.664s
$ ./synth.bash -j12 +RTS -sstderr -A256M -qb0 -RTS
real 0m9.605s
user 1m32.688s
sys 0m4.292s
$ ./synth.bash -j14 +RTS -sstderr -A256M -qb0 -RTS
real 0m9.144s
user 1m40.288s
sys 0m4.964s
$ ./synth.bash -j16 +RTS -sstderr -A256M -qb0 -RTS
real 0m10.003s
user 1m51.916s
sys 0m6.604s
$ ./synth.bash -j20 +RTS -sstderr -A256M -qb0 -RTS
real 0m10.215s
user 2m7.924s
sys 0m8.208s
$ ./synth.bash -j22 +RTS -sstderr -A256M -qb0 -RTS
real 0m10.483s
user 2m13.440s
sys 0m10.456s
$ ./synth.bash -j24 +RTS -sstderr -A256M -qb0 -RTS
real 0m10.985s
user 2m18.028s
sys 0m10.780s
$ ./synth.bash -j32 +RTS -sstderr -A256M -qb0 -RTS
real 0m12.636s
user 2m32.312s
sys 0m14.508s
}}}
Here the best numbers are around -j12, and even those are worse than the
multi-process run.
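One follow-up worth trying (my suggestion, not something measured above):
the RTS flag -qa asks the OS to pin GHC's threads to cores, which would
show whether thread migration across the 24 PUs contributes to the
single-process penalty.
{{{
# Same benchmark plus -qa (pin OS threads to cores via OS affinity),
# to test whether thread migration contributes to the slowdown.
$ ./synth.bash -j12 +RTS -sstderr -A256M -qb0 -qa -RTS
}}}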
From the '''perf record''' output it's not very clear what is happening.
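For reference, the profile was gathered roughly like this (the exact
invocation is an assumption on my part):
{{{
# Roughly the profiling setup (exact flags are a guess):
$ perf record -g -- ./synth.bash -j12 +RTS -A256M -qb0 -RTS
$ perf report
}}}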
I'll try to get a 64-core VM next week to see whether the effect is more
pronounced there.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9221#comment:66>