[GHC] #14407: rts: Threads/caps affinity
GHC
ghc-devs at haskell.org
Tue Oct 31 01:29:12 UTC 2017
#14407: rts: Threads/caps affinity
-------------------------------------+-------------------------------------
Reporter: pacak | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Runtime System | Version: 8.3
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: Runtime performance bug
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
Currently GHC supports two kinds of threads with respect to thread
migration: those pinned to a specific capability and those the RTS can
migrate between any capabilities. To achieve lower latency in Haskell
applications it would be nice to have something in between: threads that
the GHC RTS can migrate, but only within a certain subset of
capabilities.
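For reference, the two existing kinds can be told apart with stock `Control.Concurrent`. The sketch below is not part of the proposed patch; the name `threadKinds` is made up for illustration. It forks one thread of each kind and asks the RTS where each one lives and whether it is locked there.

```haskell
import Control.Concurrent

-- | Fork one migratable and one pinned thread, then report for each the
-- capability it is on and whether it is locked to that capability.
-- ('threadKinds' is a hypothetical helper name.)
threadKinds :: IO ((Int, Bool), (Int, Bool))
threadKinds = do
  gate <- newEmptyMVar
  -- Both threads block on the empty MVar so they stay alive while queried.
  free   <- forkIO   (takeMVar gate)  -- the RTS may migrate this thread
  pinned <- forkOn 0 (takeMVar gate)  -- locked to capability 0
  freeInfo   <- threadCapability free
  pinnedInfo <- threadCapability pinned
  mapM_ killThread [free, pinned]
  pure (freeInfo, pinnedInfo)
```

The second component of each pair is the "locked" flag: it should be `False` for the `forkIO` thread and `True` for the `forkOn` thread. Nothing in between is expressible with these two calls.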
I'm developing a program that contains several kinds of threads: those
that do little work and are sensitive to latency, and those that can spend
more CPU time and are less latency sensitive. I looked into several cases
of increased latency in the sensitive threads (using the GHC eventlog), and
in all cases the sensitive threads were waiting for non-sensitive threads
to finish working. I was able to reduce worst-case latency by a factor of
10 by pinning all the threads in the program to specific capabilities, but
manually distributing threads (60+ of them) between capabilities (on
several different machines with different numbers of cores available)
seems very fragile. World-stopping GC is still a problem, but at least in
my case it is a problem much less often.
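To make the fragility concrete, here is a minimal sketch of manual pinning with the existing `forkOn`. The round-robin policy and the names `roundRobin` and `forkPinned` are assumptions for illustration only; the ticket does not say how the threads were actually assigned.

```haskell
import Control.Concurrent (ThreadId, forkOn, getNumCapabilities)

-- | Pair each job with a capability number, cycling through 0 .. n-1.
-- A non-positive n is treated as 1 so that 'cycle' never sees an empty list.
roundRobin :: Int -> [a] -> [(Int, a)]
roundRobin n = zip (cycle [0 .. max 1 n - 1])

-- | Pin every job to the capability chosen by 'roundRobin'.
forkPinned :: [IO ()] -> IO [ThreadId]
forkPinned jobs = do
  n <- getNumCapabilities
  mapM (uncurry forkOn) (roundRobin n jobs)
```

Any such fixed assignment has to be reconsidered whenever the number of capabilities or the mix of threads changes, and a pinned thread cannot move even when its capability is busy and others are idle.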
I have a patch for the RTS that implements this proposal:
{{{#!hs
{- | 'setThreadAffinity' limits the RTS's ability to migrate a thread to
capabilities whose numbers match the set bits of the affinity mask. Thus a
mask of `0b101` (5) allows the RTS to migrate this thread to caps
0 (64, 128, ...) and 2 (64 + 2 = 66, 128 + 2 = 130, ...).
Setting all bits to 0 or 1 disables the restriction.
-}
setThreadAffinity :: ThreadId -> Int -> IO ()
}}}
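The mask semantics can be modelled in pure code. This is only a sketch of one reading of the comment above, not the patch itself: it assumes a 64-bit `Int`, that capability `c` is governed by bit `c mod 64`, and that an all-zero or all-one mask means "no restriction". The name `allowedCaps` is hypothetical.

```haskell
import Data.Bits (complement, testBit)

-- | Capabilities among 0 .. n-1 that a thread with the given affinity
-- mask may run on, under the assumptions stated above.
allowedCaps :: Int -> Int -> [Int]
allowedCaps n mask
  | mask == 0 || mask == complement 0 = [0 .. n - 1]  -- unrestricted
  | otherwise = [c | c <- [0 .. n - 1], testBit mask (c `mod` 64)]
```

Under this model `0b101` has bits 0 and 2 set, so with eight capabilities `allowedCaps 8 5` selects capabilities 0 and 2, while a mask of 2 selects capability 1 only.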
This allows defining up to 64 distinct groups and lets users break down
their threads into a larger number of potentially intersecting groups, by
specifying things like: capability 0 does latency-sensitive things, caps
1..5 less sensitive things, caps 5-7 bulk things.
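The grouping described here could be written down as masks with a small helper. This is a sketch under the assumption that bit `c mod 64` stands for capability `c`; `capMask` and the three mask names are hypothetical and are not part of the patch.

```haskell
import Data.Bits (bit, (.|.))

-- | Build an affinity mask from a list of capability numbers.
-- Note that an empty list gives 0, which the proposed API treats as
-- "no restriction" rather than "no capabilities".
capMask :: [Int] -> Int
capMask = foldr (\c m -> bit (c `mod` 64) .|. m) 0

latencyMask, normalMask, bulkMask :: Int
latencyMask = capMask [0]       -- capability 0 only
normalMask  = capMask [1 .. 5]  -- capabilities 1 to 5
bulkMask    = capMask [5 .. 7]  -- capabilities 5 to 7, sharing 5 with normalMask
```

Each value would then be passed as the second argument of `setThreadAffinity`. The groups intersect at capability 5, which the mask representation handles without any special casing.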
Sample program using this API:
{{{#!hs
{-# LANGUAGE LambdaCase #-}

import Data.Time
import Control.Monad
import Control.Concurrent
import System.Environment (getArgs)
import GHC.Conc

-- NOTE: 'setThreadAffinity' is provided by the proposed patch, not by
-- stock GHC.

-- | Fork a CPU-hungry background thread. With affinity enabled, the mask
-- 255 - 2 (0b11111101) keeps it off capability 1.
wastetime :: Bool -> IO ()
wastetime affine = do
  tid <- forkIO $ do
    myThreadId >>= \tid -> labelThread tid "timewaster"
    forever $ do
      when (sum [1..1000000] < (0 :: Integer)) $
        print "impossible"
      threadDelay 100
      yield
  when affine $ setThreadAffinity tid (255 - 2)

-- | Latency-sensitive thread: sleeps 10 times for 10 ms and prints the
-- total elapsed time. With affinity enabled, mask 2 restricts it to
-- capability 1.
client :: Bool -> IO ()
client affine = do
  myThreadId >>= \tid -> labelThread tid "client"
  when affine $ myThreadId >>= \tid -> setThreadAffinity tid 2
  before <- getCurrentTime
  replicateM_ 10 $ do
    threadDelay 10000
  after <- getCurrentTime
  print $ after `diffUTCTime` before

startClient :: Bool -> IO ()
startClient = {- replicateM_ 10 . -} client

main :: IO ()
main = do
  getArgs >>= \case
    [wno's, aff's] -> do
      let wno = read wno's
          aff = read aff's
      putStrLn $ unwords ["Affinity:", show aff, "Timewasters:",
                          show wno]
      replicateM_ wno (wastetime aff)
      startClient aff
    _ -> putStrLn "Usage: <progname> <number of time wasters> <enable affinity: True/False>"
}}}
Compiled with -threaded and run with the RTS option -N8 on a 6-core
(12-thread) machine.
Results are noisy but repeatable:
{{{
Affinity: False Timewasters: 24
0.42482036s
}}}
{{{
Affinity: True Timewasters: 24
0.111743474s
}}}
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14407>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler