[GHC] #9221: (super!) linear slowdown of parallel builds on 40 core machine

Sun Dec 31 03:07:54 UTC 2017

#9221: (super!) linear slowdown of parallel builds on 40 core machine
-------------------------------------+-------------------------------------
        Reporter:  carter            |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.4.1
       Component:  Compiler          |              Version:  7.8.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Compile-time      |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:  #910, #8224       |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by nh2):

 Hey, a question about `sched_yield()`:

 I just read the [http://man7.org/linux/man-pages/man2/sched_yield.2.html
 man page] for `sched_yield()` again. It says:

 > `sched_yield()` is intended for use with real-time scheduling policies
 (i.e., `SCHED_FIFO` or `SCHED_RR`).  Use of `sched_yield()` with
 nondeterministic scheduling policies such as `SCHED_OTHER` is unspecified
 and very likely means your application design is broken.

 Does GHC set the `FIFO` or `RR` policy? If not, then according to the man
 page, our "application design is broken".
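 A minimal sketch for checking this, assuming Linux/glibc: `sched_getscheduler(0)` reports the calling thread's policy, and the hypothetical helpers `is_realtime_policy` and `current_policy_name` (not part of GHC or its RTS) compare it against the real-time policies the man page names. It only inspects the process it runs in; to check a GHC program it would have to be called from inside that program, e.g. through the FFI.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: is `policy` one of the real-time policies for which,
 * per the sched_yield(2) man page, sched_yield() is intended? */
static int is_realtime_policy(int policy)
{
#ifdef SCHED_RESET_ON_FORK
    policy &= ~SCHED_RESET_ON_FORK;   /* strip the optional flag bit */
#endif
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Hypothetical helper: human-readable name of the calling thread's policy.
 * On Linux, pid 0 in sched_getscheduler() means "the calling thread". */
static const char *current_policy_name(void)
{
    int policy = sched_getscheduler(0);
    if (policy == -1)
        return "error";
#ifdef SCHED_RESET_ON_FORK
    policy &= ~SCHED_RESET_ON_FORK;   /* the flag may be ORed into the result */
#endif
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
#ifdef SCHED_BATCH
    case SCHED_BATCH: return "SCHED_BATCH";
#endif
#ifdef SCHED_IDLE
    case SCHED_IDLE:  return "SCHED_IDLE";
#endif
    default:          return "unknown";
    }
}
```

 Called as `printf("%s\n", current_policy_name());` from a normally started process, it would typically report `SCHED_OTHER`, the Linux default unless something like `chrt` or `sched_setscheduler()` changes it.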

 I also found some interesting info on
 http://www.informit.com/articles/article.aspx?p=101760&seqNum=5 (emphasis
 mine):

 > Linux provides the `sched_yield()` system call as a mechanism for a
 process to explicitly yield the processor to other waiting processes. It
 works by removing the process from the active array (where it currently
 is, because it is running) and inserting it into the expired array. This
 has the effect of not only preempting the process and putting it at the
 end of its priority list, but putting it on the expired list —
 **guaranteeing it will not run for a while**. Because real-time tasks
 never expire, they are a special case. Therefore, they are merely moved to
 the end of their priority list (and not inserted into the expired array).
 **In earlier versions of Linux, the semantics of the `sched_yield()` call
 were quite different; at best, the task was only moved to the end of their
 priority list**. The yielding was often not for a very long time.
 Nowadays, applications and even kernel code should be certain they truly
 want to give up the processor before calling `sched_yield()`.

 A similar article on LWN: https://lwn.net/Articles/31462/

 > This call used to simply move the process to the end of the run queue;
 now it moves the process to the "expired" queue, effectively cancelling
 the rest of the process's time slice. So a process calling `sched_yield()`
 now must wait until all other runnable processes in the system have used
 up their time slices before it will get the processor again.

 The article goes on to explain that this resulted in bad performance,
 especially for

 > threaded applications [that] implement busy-wait loops with
 `sched_yield()`

 Might this be relevant here?
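 A minimal sketch of the kind of busy-wait loop the LWN article warns about, assuming C11 atomics; the names `workers_done`, `report_done` and `wait_all_done_yielding` are hypothetical and this is not taken from GHC's RTS. The waiter polls a counter and calls `sched_yield()` between checks.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical counter that worker threads bump when they finish. */
static atomic_int workers_done = 0;

/* Worker side: announce completion. */
static void report_done(void)
{
    atomic_fetch_add(&workers_done, 1);
}

/* Waiter side: spin until `expected` workers have reported, calling
 * sched_yield() between checks. Under the kernel behaviour the LWN article
 * describes, each call can forfeit the rest of the time slice, so the waiter
 * may notice completion late while still competing for CPU time. */
static long wait_all_done_yielding(int expected)
{
    long yields = 0;
    while (atomic_load(&workers_done) < expected) {
        sched_yield();
        yields++;
    }
    return yields;  /* number of sched_yield() calls made */
}
```

 The returned count is only there to make the cost of the spin visible; with many threads per core, it is the number of these yields (and how long each one takes) that would be expected to grow.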

 Also, can someone explain to me why GHC is using `sched_yield()` at all? If
 the purpose is to wait until other GC threads are done, wouldn't `futex()`
 be enough? Or is that what's explained in
 https://ghcmutterings.wordpress.com/2010/01/25/yielding-more-improvements-in-parallel-performance/
 ?
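 For comparison, a minimal sketch of a `futex()`-based wait, assuming Linux and C11 atomics: the waiter sleeps in the kernel until the last worker wakes it, instead of looping on `sched_yield()`. glibc has no `futex()` wrapper, so the hypothetical helper `futex_call` goes through `syscall(SYS_futex, ...)`; `remaining`, `init_remaining`, `report_done` and `wait_all_done` are also illustrative names. This is not GHC's RTS code, and it leaves out any short spin phase before blocking.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper around the raw futex syscall. Assumes atomic_int
 * has the same layout as int, as it does on mainstream Linux targets. */
static long futex_call(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, (int *)uaddr, op, val, NULL, NULL, 0);
}

/* Hypothetical count of workers that have not finished yet. */
static atomic_int remaining;

static void init_remaining(int n)
{
    atomic_store(&remaining, n);
}

/* Worker side: the last worker to finish wakes every waiter. */
static void report_done(void)
{
    if (atomic_fetch_sub(&remaining, 1) == 1)
        futex_call(&remaining, FUTEX_WAKE_PRIVATE, INT_MAX);
}

/* Waiter side: sleep instead of spinning. FUTEX_WAIT only blocks while
 * `remaining` still equals `seen`, so a wake-up racing with the load is not
 * lost; the loop re-checks after EAGAIN, EINTR or spurious wake-ups. */
static void wait_all_done(void)
{
    int seen;
    while ((seen = atomic_load(&remaining)) != 0)
        futex_call(&remaining, FUTEX_WAIT_PRIVATE, seen);
}
```

 The trade-off is that every blocking wait and every final wake costs a system call, whereas a spin can finish without entering the kernel when the other threads are about to finish anyway.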

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9221#comment:84>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler