[GHC] #9221: (super!) linear slowdown of parallel builds on 40 core machine

GHC ghc-devs at haskell.org
Sat Sep 13 23:13:15 UTC 2014


#9221: (super!) linear slowdown of parallel builds on 40 core machine
-------------------------------------+-------------------------------------
              Reporter:  carter      |            Owner:
                  Type:  bug         |           Status:  new
              Priority:  high        |        Milestone:  7.10.1
             Component:  Compiler    |          Version:  7.8.2
            Resolution:              |         Keywords:
      Operating System:              |     Architecture:  Unknown/Multiple
  Unknown/Multiple                   |       Difficulty:  Unknown
       Type of failure:  Compile-    |       Blocked By:
  time performance bug               |  Related Tickets:  #910
             Test Case:              |
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------

Comment (by gintas):

 I think I know what's going on here. If you look at parUpsweep in
 compiler/main/GhcMake.hs, its argument n_jobs is used in two places: one
 is the initial value of the par_sem semaphore used to limit
 parallelization, and the other is a call to setNumCapabilities. The latter
 seems to be the cause of the slowdown.
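
To make the two roles of n_jobs concrete, here is a minimal sketch of the pattern described above. It is not the actual parUpsweep code: runLimited is a hypothetical stand-in that uses only base, and the rtsSupportsBoundThreads guard is an addition of this sketch so that it also runs on the non-threaded RTS. It assumes nJobs >= 1.

```haskell
import Control.Concurrent
  (forkIO, getNumCapabilities, rtsSupportsBoundThreads, setNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import Control.Monad (forM, when)

-- | Hypothetical stand-in for parUpsweep: run every action, with at most
-- nJobs of them in flight at once. Assumes nJobs >= 1.
runLimited :: Int -> [IO a] -> IO [a]
runLimited nJobs actions = do
  -- Use 1: raise the capability count, but only if it is still 1.
  -- The threaded-RTS check is an addition of this sketch.
  caps <- getNumCapabilities
  when (caps == 1 && rtsSupportsBoundThreads) $ setNumCapabilities nJobs

  -- Use 2: the initial value of the semaphore that limits parallelism.
  sem <- newQSem nJobs

  -- Fork one thread per action; each holds a semaphore unit while it runs.
  -- Exceptions in an action are not handled here (sketch only).
  vars <- forM actions $ \act -> do
    var <- newEmptyMVar
    _ <- forkIO $ bracket_ (waitQSem sem) (signalQSem sem) act >>= putMVar var
    return var

  -- Collect the results in the original order.
  mapM takeMVar vars
```

Starting such a program with an explicit +RTS -N value other than 1 leaves the capability count untouched, so only the semaphore is driven by nJobs.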

 Note that setNumCapabilities is only invoked if the previous count of
 capabilities was 1. I used that to vary the two settings independently,
 and it turns out that the runtime overhead is mostly independent of the
 semaphore value and strongly influenced by the capability count.

 I ran some experiments on a 16-CPU VM (picked a larger one deliberately to
 make the differences more pronounced). Running with jobs=4 & caps=4, a
 test took 37s walltime, jobs=4 & caps=16 took 51s, jobs=4 & caps=32 took
 114s (344s of MUT and 1021s of GC!). The figures are very similar for
 jobs=16 and jobs=64. See attached log for more details (-sstderr output).

 It looks like the runtime GC is just inefficient when running with many
 capabilities, even if many physical cores are available. I'll try a few
 experiments to verify that this is a general pattern that is not specific
 to the GhcMake implementation.
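
One possible shape for such an experiment, outside of GhcMake, is sketched below. Everything here is an assumption made for illustration: work is an arbitrary allocation-heavy function, timeRun is a hypothetical helper, and the task sizes are not tuned to reproduce the numbers above. The idea is only to keep the semaphore value fixed while the capability count varies.

```haskell
import Control.Concurrent (forkIO, setNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_, evaluate)
import Control.Monad (forM)
import GHC.Clock (getMonotonicTime)

-- | Arbitrary allocation-heavy busy work, chosen only to exercise the GC.
work :: Int -> Int
work n = length (filter even (reverse [1 .. n]))

-- | Run nTasks pieces of work with at most `jobs` in flight on `caps`
-- capabilities, and return the elapsed wall-clock time in seconds.
timeRun :: Int -> Int -> Int -> IO Double
timeRun caps jobs nTasks = do
  setNumCapabilities caps          -- capability count, varied per run
  sem <- newQSem jobs              -- semaphore value, held fixed across runs
  start <- getMonotonicTime
  vars <- forM [1 .. nTasks] $ \i -> do
    var <- newEmptyMVar
    _ <- forkIO $
      bracket_ (waitQSem sem) (signalQSem sem)
               (evaluate (work (200000 + i))) >>= putMVar var
    return var
  mapM_ takeMVar vars              -- wait for every task to finish
  end <- getMonotonicTime
  return (end - start)
```

Built with -threaded -rtsopts and run with +RTS -s, calls such as timeRun 4 4 64 and timeRun 16 4 64 could be compared on both walltime and the MUT/GC split reported by the runtime.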

 Logic and a few experiments indicate that it does not help walltime to set
 the number of jobs (semaphore value) higher than the number of
 capabilities, so there's not much we can do about those two parameters in
 the parUpsweep implementation other than capping n_jobs at some constant
 (probably <= 8).
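
A cap of that kind could be as small as the following sketch. The names maxUsefulJobs and cappedJobs are hypothetical, and the value 8 is taken only from the "probably <= 8" guess above, not from any measurement.

```haskell
-- | Hypothetical upper bound on the job count passed to parUpsweep.
-- The value 8 is a guess, not a measured optimum.
maxUsefulJobs :: Int
maxUsefulJobs = 8

-- | Clamp a requested job count to the range [1, maxUsefulJobs].
cappedJobs :: Int -> Int
cappedJobs requested = max 1 (min maxUsefulJobs requested)
```

With this helper, a request for 40 jobs would be reduced to 8, while a request for 4 would pass through unchanged.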

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9221#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

