[GHC] #9221: (super!) linear slowdown of parallel builds on 40 core machine

Fri Jul 31 10:53:15 UTC 2015

#9221: (super!) linear slowdown of parallel builds on 40 core machine
-------------------------------------+-------------------------------------
        Reporter:  carter            |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  high              |               Milestone:  7.12.1
       Component:  Compiler          |                 Version:  7.8.2
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:  #910              |  Differential Revisions:
-------------------------------------+-------------------------------------

Comment (by nh2):

 Replying to [comment:23 ezyang]:
 > I was chatting with one of my colleagues about this problem recently,
 and they said something very provocative: if GHC is not scaling because
 there is some global mutable state (e.g. the NameCache) ...
 >
 > Do people agree with this viewpoint? Disagree?

 I disagree. Threads should almost always be the more efficient choice,
 because they let us share resources cheaply and easily where sharing makes
 things faster, but that doesn't mean we have to share everything.
 Processes force us to share nothing. If building is faster with separate
 processes, then we should be able to reach the same speed with threads
 simply by not sharing whatever makes it slow, which is exactly the thing
 that processes force us not to share.
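
 To make the "just don't share it" point concrete, here is a minimal
 sketch using only `base`. It is not GHC code: `sharedCount` and
 `privateCount` are hypothetical names, and the counter merely stands in
 for whatever global state (e.g. the NameCache) turns out to be contended.
 The first version has every thread go through one `MVar`; the second
 gives each thread private state and combines the results once at the end,
 which is the same isolation that processes would force on us.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM, replicateM_)
import Data.IORef

-- Hypothetical stand-in for contended global state:
-- every worker thread bumps one shared counter behind a single MVar.
sharedCount :: Int -> Int -> IO Int
sharedCount workers steps = do
  counter <- newMVar 0
  dones <- forM [1 .. workers] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ steps (modifyMVar_ counter (\n -> return $! n + 1))
      putMVar done ()
    return done
  mapM_ takeMVar dones        -- wait for all workers to finish
  readMVar counter

-- Same work, but each worker owns a private IORef and reports its
-- total once, so nothing is shared while the work is running.
privateCount :: Int -> Int -> IO Int
privateCount workers steps = do
  results <- forM [1 .. workers] $ \_ -> do
    result <- newEmptyMVar
    _ <- forkIO $ do
      local <- newIORef (0 :: Int)
      replicateM_ steps (modifyIORef' local (+ 1))
      readIORef local >>= putMVar result
    return result
  fmap sum (mapM takeMVar results)

main :: IO ()
main = do
  a <- sharedCount 8 10000
  b <- privateCount 8 10000
  print (a, b)
```

 Both functions compute the same total (80000 with the arguments above);
 only the amount of cross-thread traffic differs. Built with `-threaded`
 and run with `+RTS -N`, the second version should have nothing to contend
 on, though how large the difference is will depend on the machine.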

 However, I wouldn't be surprised if this isn't even the problem here.

 Replying to [comment:28 slyfox]:
 > If perf does not lie most of the time is spent cycling over sleeping
 kernel threads

 This sounds much more like the problem.

 If I had to make a guess (and based on the very limited look I had into
 this issue last year) it feels like we are accidentally busy polling
 something somewhere.
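
 For readers unsure what "busy polling" means here, the sketch below
 contrasts it with a blocking wait. It is purely illustrative and makes no
 claim about where (or whether) GHC does this; `waitPolling` and
 `waitBlocking` are hypothetical names, and the 100 ms delay is an
 arbitrary stand-in for "the thing we are waiting for".

```haskell
import Control.Concurrent (forkIO, threadDelay, yield)
import Control.Concurrent.MVar
import Data.IORef

-- Busy polling: keep re-checking a flag until it is set. The waiter
-- stays runnable the whole time and burns CPU while it waits.
-- Returns how many times the flag was found unset.
waitPolling :: IORef Bool -> IO Int
waitPolling flag = go 0
  where
    go n = do
      ready <- readIORef flag
      if ready
        then return n
        else yield >> (go $! n + 1)

-- Blocking: the waiter is descheduled by the RTS until another
-- thread fills the MVar, so it costs nothing while it waits.
waitBlocking :: MVar () -> IO ()
waitBlocking = takeMVar

main :: IO ()
main = do
  flag <- newIORef False
  _ <- forkIO (threadDelay 100000 >> writeIORef flag True)
  spins <- waitPolling flag
  putStrLn ("polling waiter re-checked the flag " ++ show spins ++ " times")

  signal <- newEmptyMVar
  _ <- forkIO (threadDelay 100000 >> putMVar signal ())
  waitBlocking signal
  putStrLn "blocking waiter was woken once"
```

 The number of re-checks printed by the polling variant varies from run to
 run and machine to machine; the point is only that it grows with the wait
 time, whereas the blocking variant does no work until it is woken.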

 When I run some non-build Haskell code with `+RTS -N18` on the current
 generation of 18 core AWS instances, it scales pretty nicely, much better
 than ghc's `--make` scales here. That code uses many more Haskell threads
 than building a 200 module project needs, and its threads are
 shorter-lived than the ones in this case (where, let's say, building a
 module takes around 0.5 seconds). This makes me think that we might
 simply be doing something wrong in the parallel upsweep code, and that
 the rest (compiler, runtime etc.) is doing quite OK.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9221#comment:30>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler