[GHC] #10229: setThreadAffinity assumes a certain CPU virtual core layout
GHC
ghc-devs at haskell.org
Thu Apr 2 17:05:11 UTC 2015
#10229: setThreadAffinity assumes a certain CPU virtual core layout
-------------------------------------+-------------------------------------
Reporter: nh2 | Owner: simonmar
Type: bug | Status: new
Priority: normal | Milestone:
Component: Runtime System | Version: 7.10.1
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: Runtime performance bug
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Revisions: |
-------------------------------------+-------------------------------------
The {{{+RTS -qa}}} option, which sets thread affinity, was implemented in
https://git.haskell.org/ghc.git/commitdiff/31caec794c3978d55d79f715f21fb72948c9f300
{{{
// Schedules the thread to run on CPU n of m. m may be less than the
// number of physical CPUs, in which case, the thread will be allowed
// to run on CPU n, n+m, n+2m etc.
void
setThreadAffinity (nat n, nat m)
}}}
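To make the comment above concrete, here is a minimal sketch of the CPU set it describes. This is not the RTS code: the name {{{affinity_mask}}}, the bitmask representation and the 64-CPU limit are assumptions made for illustration, and CPUs are numbered from 0 here, whereas the rest of this ticket counts vCPUs from 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the GHC RTS.
 * Returns a bitmask with bit k set for every CPU k in
 * n, n+m, n+2m, ... that is below n_cpus (0-based CPU numbers). */
uint64_t affinity_mask(unsigned n, unsigned m, unsigned n_cpus)
{
    assert(m > 0 && n_cpus <= 64);   /* assumption: at most 64 CPUs fit the mask */
    uint64_t mask = 0;
    for (unsigned cpu = n; cpu < n_cpus; cpu += m) {
        mask |= UINT64_C(1) << cpu;
    }
    return mask;
}
```

With 8 vCPUs and {{{-N4}}}, {{{affinity_mask(0, 4, 8)}}} selects CPUs 0 and 4, i.e. the 1st and 5th vCPU.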
Today I discovered that on some machines, this option helps parallel
performance (e.g. {{{+RTS -N4}}}) a lot, while on others it doesn't.
Together with thomie on #ghc, I found out the reason:
Let's assume I have 4 physical cores with hyperthreading, so 8 virtual cores
(vCPUs). The mapping of vCPUs to physical cores differs across machines.
On one of my machines (Intel i5), the layout is 11223344, meaning that the
first two vCPUs (hyperthreads) that the OS announces (visible e.g. in
htop) map to the first physical core in the system, and so on.
On my other machine (Intel Xeon), the layout is 12341234; here the 1st and
the 5th vCPU map to the same physical core.
This layout can be (on Linux) observed by running:
{{{
grep -E "processor|physical id|core id" /proc/cpuinfo | sed 's/^processor/\nprocessor/g'
}}}
I do not know whether this layout is dictated by the processor, chosen by
the OS, or even changing across reboots; what is clear is that the layout
can vary across machines.
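On Linux the same information is also exposed per vCPU under sysfs, in files such as {{{/sys/devices/system/cpu/cpu0/topology/core_id}}} and {{{physical_package_id}}}. The following is only a sketch of how a program could read it; the function name {{{read_topology_attr}}} and its parameters are hypothetical, and it is not something the RTS currently does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer topology attribute of a vCPU,
 * e.g. attr = "core_id" or "physical_package_id".
 * cpu_root is normally "/sys/devices/system/cpu" (Linux only).
 * Returns -1 if the file is missing or cannot be parsed. */
int read_topology_attr(const char *cpu_root, int cpu, const char *attr)
{
    assert(cpu_root != NULL && attr != NULL && cpu >= 0);
    char path[256];
    int len = snprintf(path, sizeof path, "%s/cpu%d/topology/%s",
                       cpu_root, cpu, attr);
    if (len < 0 || (size_t)len >= sizeof path) {
        return -1;                      /* path did not fit the buffer */
    }
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;                      /* not Linux, or no such vCPU */
    }
    int value = -1;
    if (fscanf(f, "%d", &value) != 1) {
        value = -1;                     /* unexpected file contents */
    }
    fclose(f);
    return value;
}
```

Two vCPUs whose {{{physical_package_id}}} and {{{core_id}}} both match are hyperthreads of the same physical core.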
Now, as explained by thomie:
{{{
-qa will set your 4 capabilities to cores [(1,5), (2,6), (3,7), (4,8)],
and then the os randomly chooses out of those tuples
}}}
This strategy is optimal for the 12341234 layout; for example, when
running with -N4, it ensures that no two threads are scheduled onto vCPUs
that share a physical core. The possible {{{+RTS -qa}}} choice
{{{1__4_23_}}} is a great assignment in this case, as is {{{1234____}}}
({{{_}}} means the vCPU is not chosen).
But for the 11223344 layout, the choice {{{1234____}}} isn't good, because it
uses only 2 of our 4 physical cores; our program now takes twice as long
to run.
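The counting behind this can be checked with a small sketch that uses the same string notation as above. The function {{{physical_cores_used}}} is a hypothetical helper written for this illustration: character i of {{{layout}}} names the physical core of vCPU i, and character i of {{{choice}}} is {{{_}}} when vCPU i is not chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the distinct physical cores covered by a
 * choice of vCPUs. Both strings must have the same length. */
unsigned physical_cores_used(const char *layout, const char *choice)
{
    assert(strlen(layout) == strlen(choice));
    bool seen[256] = { false };         /* indexed by core label character */
    unsigned used = 0;
    for (size_t i = 0; layout[i] != '\0'; i++) {
        unsigned char core = (unsigned char)layout[i];
        if (choice[i] != '_' && !seen[core]) {
            seen[core] = true;
            used++;
        }
    }
    return used;
}
```

For the layout {{{12341234}}} both choices {{{1234____}}} and {{{1__4_23_}}} cover 4 physical cores, while for {{{11223344}}} the choice {{{1234____}}} covers only 2.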
----
It seems likely to me that {{{setThreadAffinity}}} was written on a
machine with 12341234 layout, and with the assumption that all machines
have this layout.
It would be great if we could change it to take the actual layout into
account.
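One possible way to do that, given purely as a sketch and not as a proposed patch, is to hand out vCPUs so that the first hyperthread of every physical core is used before any second hyperthread. The function {{{pick_vcpu}}} below is hypothetical; it takes the layout in the string notation used above and numbers capabilities and vCPUs from 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the 0-based vCPU for capability `cap`,
 * visiting first the vCPUs that are the first hyperthread of their
 * physical core, then the second hyperthreads, and so on.
 * Returns -1 if cap is not smaller than the number of vCPUs. */
int pick_vcpu(const char *layout, size_t cap)
{
    assert(layout != NULL);
    size_t n = strlen(layout);
    size_t placed = 0;
    for (size_t rank = 0; rank < n; rank++) {
        for (size_t i = 0; i < n; i++) {
            /* rank of vCPU i = number of earlier vCPUs on the same core */
            size_t earlier = 0;
            for (size_t j = 0; j < i; j++) {
                if (layout[j] == layout[i]) {
                    earlier++;
                }
            }
            if (earlier != rank) {
                continue;
            }
            if (placed == cap) {
                return (int)i;
            }
            placed++;
        }
    }
    return -1;
}
```

With {{{-N4}}} this picks vCPUs 0, 2, 4, 6 on the 11223344 layout and vCPUs 0, 1, 2, 3 on the 12341234 layout, so each capability gets its own physical core in both cases. Unlike the current scheme it pins each capability to a single vCPU rather than a set, which is a simplification of this sketch.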
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10229>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler