[GHC] #10229: setThreadAffinity assumes a certain CPU virtual core layout

Thu Apr 2 17:05:11 UTC 2015

#10229: setThreadAffinity assumes a certain CPU virtual core layout
------------------------------------------+--------------------------------------------
              Reporter:  nh2              |             Owner:  simonmar
                  Type:  bug              |            Status:  new
              Priority:  normal           |         Milestone:
             Component:  Runtime System   |           Version:  7.10.1
              Keywords:                   |  Operating System:  Unknown/Multiple
          Architecture:  Unknown/Multiple |   Type of failure:  Runtime performance bug
             Test Case:                   |        Blocked By:
              Blocking:                   |   Related Tickets:
Differential Revisions:                   |
------------------------------------------+--------------------------------------------
 The {{{+RTS -qa}}} option, which sets thread affinity, was implemented in
 https://git.haskell.org/ghc.git/commitdiff/31caec794c3978d55d79f715f21fb72948c9f300

 {{{
 // Schedules the thread to run on CPU n of m.  m may be less than the
 // number of physical CPUs, in which case, the thread will be allowed
 // to run on CPU n, n+m, n+2m etc.
 void
 setThreadAffinity (nat n, nat m)
 }}}

 Today I discovered that on some machines this option greatly improves
 parallel performance (e.g. with {{{+RTS -N4}}}), while on others it does not.

 Together with thomie on #ghc, I found out the reason:

 Let's assume I have 4 physical cores with hyperthreading, i.e. 8 virtual cores.

 The mapping of hyperthreading cores to physical cores is different across
 machines.

 On one of my machines (Intel i5), the layout is 11223344, meaning that the
 first two vCPUs (hyperthreads) that the OS reports (visible e.g. in
 htop) map to the first physical core in the system, and so on.

 On my other machine (Intel Xeon), the layout is 12341234; here the 1st and
 the 5th vCPU map to the same physical core.

 On Linux, this layout can be observed by running:

 {{{
 grep -E 'processor|physical id|core id' /proc/cpuinfo | sed 's/^processor/\nprocessor/'
 }}}

 I do not know whether this layout is dictated by the processor, chosen by
 the OS, or even changes across reboots; what is clear is that it can vary
 across machines.

 Now, as explained by thomie:

 {{{
 -qa will set your 4 capabilities to cores [(1,5), (2,6), (3,7), (4,8)],
 and then the os randomly chooses out of those tuples
 }}}

 This strategy is optimal for the 12341234 layout: when running with
 {{{+RTS -N4}}}, for example, it ensures that no two threads are scheduled
 onto vCPUs that share a physical core. The possible {{{+RTS -qa}}} choice
 {{{1__4_23_}}} is a great assignment in this case, as is {{{1234____}}}
 (the digit at position i is the capability running on vCPU i; {{{_}}}
 means the vCPU is not chosen).

 But for the 11223344 layout, the choice {{{1234____}}} is bad: it uses
 only 2 of our 4 physical cores, and our program now takes twice as long
 to run.

 ----

 It seems likely to me that {{{setThreadAffinity}}} was written on a
 machine with the 12341234 layout, under the assumption that all machines
 share this layout.

 It would be great if we could change it to take the actual layout into
 account.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10229>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler