GHC Threads affinity

Mon Sep 11 12:54:07 UTC 2017

>> I'm developing a program that contains several kinds of threads: those
that do little work and are sensitive to latency, and those that can spend
more CPU time and are less latency sensitive. I looked into several cases of
increased latency in the sensitive threads (using the GHC eventlog), and in
all cases the sensitive threads were waiting for non-sensitive threads to
finish working. I was able to reduce worst-case latency by a factor of 10 by
pinning all the threads in the program to specific capabilities, but manually
distributing the threads (60+ of them) between capabilities (on several
different machines with different numbers of cores available) seems very
fragile. World-stopping GC is still a problem, but at least in my case it is
much less frequent.
>
> If you have a fixed set of threads you might just want to use -N<threads>
-qn<cores>, and then pin every thread to a different capability.  This
gives you 1:1 scheduling at the GHC level, delegating the scheduling job to
the OS.  You will also want to use nursery chunks with something like -n2m,
so you don't waste too much nursery space on the idle capabilities.
>
> Even if your set of threads isn't fixed you might be able to use a hybrid
scheme with -N<large> -qn<cores> and pin the high-priority threads on their
own capability, while putting all the low-priority threads on a single
capability, or a few separate ones.
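
To make the hybrid scheme above concrete, here is a minimal sketch using only forkOn and getNumCapabilities from Control.Concurrent. The Role type, capabilityFor, forkRole and the "reserved" count are hypothetical names made up for illustration; the program is assumed to be built with -threaded and run with RTS flags along the lines of -N<large> -qn<cores> -n2m, as suggested above.

```haskell
import Control.Concurrent (ThreadId, forkOn, getNumCapabilities)

-- Hypothetical classification of threads; not part of any GHC API.
data Role
  = Sensitive Int  -- i-th latency-sensitive thread
  | Bulk Int       -- j-th CPU-heavy, low-priority thread
  deriving (Eq, Show)

-- | Pick a capability for a thread, given the total number of capabilities
-- and how many of them (0 .. reserved-1) are set aside for sensitive threads.
-- Sensitive threads get one reserved capability each (wrapping around if
-- there are more sensitive threads than reserved capabilities); bulk threads
-- are spread round-robin over the remaining capabilities.
capabilityFor :: Int -> Int -> Role -> Int
capabilityFor nCaps reserved role = case role of
  Sensitive i -> i `mod` max 1 reserved
  Bulk j
    | nCaps > reserved -> reserved + j `mod` (nCaps - reserved)
    | otherwise        -> j `mod` max 1 nCaps  -- nothing left over: share all

-- | Fork a thread pinned to the capability chosen by 'capabilityFor'.
-- forkOn itself interprets the capability number modulo the number of
-- capabilities, so an out-of-range choice still lands somewhere valid.
forkRole :: Int -> Role -> IO () -> IO ThreadId
forkRole reserved role act = do
  nCaps <- getNumCapabilities
  forkOn (capabilityFor nCaps reserved role) act
```

With 8 capabilities and 2 reserved, the sensitive threads land on capabilities 0 and 1 and the bulk threads cycle over 2..7. This still pins every thread, so it only automates the manual distribution; it does not let the runtime migrate bulk threads between capabilities.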

There are about 80 threads right now, and some of them are very short-lived.
Most of them are low priority and require lots of CPU, which means I would
have to distribute them manually over several capabilities. That is the
process I'd like to avoid.

>> It would be nice to allow the GHC runtime to migrate a thread within a
subset of capabilities, using an interface similar to this one:
>>
>> -- Creates a thread that is allowed to migrate between capabilities
>> -- according to the following rule: GHC is allowed to run this thread on
>> -- the Nth capability if bit (N `mod` size_of_word) of the mask is set.
>> forkOn' :: Int -> IO () -> IO ThreadId
>> forkOn' mask act = undefined
>>
>> This should make it possible to define up to 64 (32 on 32-bit platforms)
distinct groups, and let users break their threads down into a larger number
of potentially intersecting groups by specifying things like: capability 0
does latency-sensitive things, caps 1..5 do less sensitive things, caps 6-7
do bulk things.
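
The mask rule above can be spelled out as a small pure function. This is only a sketch of the proposed semantics, not something GHC implements: allowedOn, allowedCaps and forkOnMaskApprox are hypothetical names, and the last one merely approximates the proposal with the existing forkOn by pinning to the first allowed capability, so no migration takes place.

```haskell
import Control.Concurrent (ThreadId, forkOn, getNumCapabilities)
import Data.Bits (finiteBitSize, testBit)

-- | Proposed rule: capability n may run the thread iff bit
-- (n `mod` word size) of the mask is set.
allowedOn :: Int -> Int -> Bool
allowedOn mask n = testBit mask (n `mod` finiteBitSize mask)

-- | All capabilities in 0 .. nCaps-1 that the mask allows.
allowedCaps :: Int -> Int -> [Int]
allowedCaps mask nCaps = filter (allowedOn mask) [0 .. nCaps - 1]

-- | Approximation with today's API (an assumption, not the proposal itself):
-- pin the thread to the first allowed capability. The real forkOn' would
-- instead let the scheduler move the thread within the allowed set.
forkOnMaskApprox :: Int -> IO () -> IO ThreadId
forkOnMaskApprox mask act = do
  nCaps <- getNumCapabilities
  case allowedCaps mask nCaps of
    (c : _) -> forkOn c act
    []      -> ioError (userError "forkOnMaskApprox: mask allows no capability")
```

For the grouping mentioned above, the masks would be 0x01 for capability 0, 0x3E for caps 1..5 and 0xC0 for caps 6-7. Because of the `mod`, on a 64-bit machine capability 64 shares bit 0 with capability 0.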
>
>
> We could do this, but it would add some complexity to the scheduler and
load balancer (which has already been quite hard to get right; I fixed a
handful of bugs there recently). I'd be happy to review a patch if you want
to try it though.

I guess I'll start by studying the scheduler and load balancer in more
detail. Thank you for your input, Simon!