Extended periods of "waking up thread %d on cap %d"
Simon Marlow
marlowsd at gmail.com
Mon Jan 28 12:40:18 CET 2013
On 26/01/13 00:28, Ben Gamari wrote:
> Recently I've been benchmarking my concurrent Gibbs sampler[1] on a
> largish multicore machine. I've been using GHC HEAD due to various
> GC-related fixes that are present. One thing that I've noticed in
> looking over the event logs is that there are large durations (tens of
> milliseconds) when HECs appear to be constantly bombarded with wake-ups
> from other capabilities.
>
> In these periods, the eventlog will be filled with blocks of messages
> such as this snippet from one benchmark run[2] (long repeated blocks are
> marked with ellipses; a few irrelevant messages have been omitted but
> not marked; see the eventlog for the full log):
>
> 28.320958s HEC 7: running thread 293
> ...
> 28.391802s HEC 6: running thread 209
> 28.392070s HEC 6: waking up thread 284 on cap 7
> 28.392302s HEC 6: waking up thread 284 on cap 7
> 28.392579s HEC 6: waking up thread 284 on cap 7
> 28.392808s HEC 6: waking up thread 284 on cap 7
> .
> .
> .
> 28.405971s HEC 16: waking up thread 284 on cap 7
> 28.406026s HEC 15: waking up thread 284 on cap 7
> 28.406080s HEC 12: waking up thread 284 on cap 7
> 28.406091s HEC 23: waking up thread 284 on cap 7
> 28.406095s HEC 6: waking up thread 284 on cap 7
> 28.406095s HEC 23: waking up thread 284 on cap 7
> 28.406225s HEC 16: waking up thread 284 on cap 7
> .
> .
> .
> 28.490449s HEC 7: stopping thread 293 (thread yielding)
> 28.490454s HEC 7: waking up thread 284 on cap 7
> 28.490454s HEC 7: waking up thread 284 on cap 7
> 28.490455s HEC 7: waking up thread 284 on cap 7
> 28.490456s HEC 7: waking up thread 284 on cap 7
> 28.490456s HEC 7: waking up thread 284 on cap 7
> 28.490457s HEC 7: waking up thread 284 on cap 7
> 28.490458s HEC 7: waking up thread 284 on cap 7
> .
> .
> .
> 28.491841s HEC 7: waking up thread 284 on cap 7
> 28.491850s HEC 7: migrating thread 284 to cap 0
> 28.491968s HEC 7: running thread 293
> .
> .
> .
>
> In other words, it appears that HEC 7 took a vacation of more than a
> millisecond as it tried (ultimately unsuccessfully) to wake up thread
> 284. These blocks are easy to spot in this event log and quite difficult
> to explain. In this case I was being silly and spawned a few too many
> capabilities (30 capabilities whereas the machine has 24 real cores, 2
> threads per core due to hyperthreading; I was running 24 threads).
Thread 284 is blocked in an STM transaction:
28320680000: cap 7: running thread 284
28320708000: cap 7: thread 284 has label "diff worker"
28320716000: cap 7: stopping thread 284 (blocked in STM retry)
Anything that commits a write to one of the TVars that thread 284 read
during its transaction will cause a wakeup, and that wakeup entails
sending a message to cap 7. Sending a message to a cap involves taking a
lock, so it's not cheap and could lead to contention if there are lots
of attempts to wake up the thread.
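
Roughly, the sending side looks like this (a simplified, from-memory
sketch of tryWakeupThread() in rts/Threads.c, not the verbatim source):

    /* Simplified sketch of tryWakeupThread() in rts/Threads.c (from
     * memory, not verbatim).  Each committing transaction that wrote
     * one of the TVars thread 284 is watching ends up here via
     * STM.c:unpark_tso(). */
    void tryWakeupThread (Capability *cap, StgTSO *tso)
    {
        /* This posts the "waking up thread %d on cap %d" event that
         * fills the eventlog. */
        traceEventThreadWakeup(cap, tso, tso->cap->no);

        if (tso->cap != cap) {
            /* The TSO belongs to another capability: allocate a
             * MSG_TRY_WAKEUP message and put it in that capability's
             * inbox.  sendMessage() takes the target capability's lock,
             * which is where the cost and the contention come from. */
            MessageWakeup *msg =
                (MessageWakeup *)allocate(cap, sizeofW(MessageWakeup));
            SET_HDR(msg, &stg_MSG_TRY_WAKEUP_info, CCS_SYSTEM);
            msg->tso = tso;
            sendMessage(cap, tso->cap, (Message *)msg);
            return;
        }

        /* ...otherwise we own the TSO and can unblock it directly... */
    }
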
So what we seem to be looking at is lots of wakeup messages sent to cap
7, and then at 28.490454s we see cap 7 processing all those messages,
which takes a full millisecond.
So a way to improve this would probably be to make STM wakeups atomic,
so we don't need to send repeated wakeup messages if the thread hasn't
woken up yet. This part of the system is very tricky, which is one
reason we simplify things by allowing any number of wakeup messages.
Only the capability that owns a TSO is allowed to modify it; everyone
else has to send a message.
For MVar we do break the rules though: when waking up a thread blocked
on an MVar, the waker removes the TSO from the MVar's queue (holding a
lock on the MVar) and then sends a message to the owner. The owner knows
whether to really wake up the TSO by checking whether it has already
been removed from the queue; the removal can only happen once, because
of the lock on the MVar, so at most one message leads to a real wakeup.
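
On the receiving side the owning capability decides whether to really
unblock the thread; roughly (again a from-memory sketch of the relevant
cases inside tryWakeupThread(), once we know we own the TSO, not the
verbatim source):

    /* Sketch of what the owning capability does with a wakeup it
     * receives for tso (from memory, not verbatim). */
    switch (tso->why_blocked) {

    case BlockedOnMVar:
        /* The waker already unlinked the TSO from the MVar's queue
         * while holding the MVar lock and marked the operation as
         * completed (setting tso->_link to END_TSO_QUEUE).  That can
         * only happen once, so a stale or duplicate message is simply
         * ignored. */
        if (tso->_link == END_TSO_QUEUE) {
            tso->why_blocked = NotBlocked;
            appendToRunQueue(cap, tso);
        }
        break;

    case BlockedOnSTM:
        /* No such check for STM today: any message that arrives while
         * the thread is still blocked unblocks it, and later duplicates
         * are processed and dropped, so any number of wakeup messages
         * is tolerated. */
        tso->why_blocked = NotBlocked;
        appendToRunQueue(cap, tso);
        break;

    default:
        break;
    }
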
So I think we could adopt a similar policy for STM. The mutex is
lockTSO()/unlockTSO(), which is already taken in STM.c:unpark_tso(). To
indicate that a thread has already been sent a wakeup message, we could
set tso->block_info.closure to a special value - normally it would be
END_TSO_QUEUE for a thread blocked in STM. I've been peering at the
code a bit, and I'm pretty sure this is safe and doesn't involve many
changes to the RTS.
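
In code, the change would be something like this (an untested sketch;
STM_AWOKEN is a made-up name for whatever distinguished marker value we
pick, not something that exists today):

    /* Untested sketch of the proposed change to unpark_tso() in
     * rts/STM.c.  block_info.closure is normally END_TSO_QUEUE for a
     * thread blocked in STM, so we can reuse it, under the TSO lock, as
     * a "wakeup message already sent" flag.  STM_AWOKEN is a
     * hypothetical new sentinel value. */
    static void unpark_tso (Capability *cap, StgTSO *tso)
    {
        lockTSO(tso);
        if (tso->why_blocked == BlockedOnSTM
            && tso->block_info.closure == (StgClosure *)END_TSO_QUEUE) {
            /* First wakeup for this blocked transaction: set the flag
             * so that later committers, who also take the TSO lock,
             * skip the message send. */
            tso->block_info.closure = STM_AWOKEN;
            tryWakeupThread(cap, tso);
        }
        /* Otherwise either the thread is no longer blocked in STM, or a
         * wakeup message is already on its way; don't send another. */
        unlockTSO(tso);
    }

    /* The owning capability would reset block_info.closure when it
     * actually wakes the thread, so that the thread can block in STM
     * and be woken again later. */
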
Do you want to try this out and let me know if you get stuck?
Cheers,
Simon
> That being said, even under more reasonable conditions[3] (where I was
> running 24 capabilities with 20 threads on the same 24-core machine) I
> still see wakeup problems. For example, from 32.293455s to 32.327349s I
> see numerous attempts by many HECs to wake up thread 517 on cap 12. In
> this period, cap 12 briefly runs thread 537 (at 32.326781s), but even
> while it is running, HEC 17 attempts to wake up thread 517 again.
>
> All in all, something seems a bit strange. Any ideas what might be going
> on here?
>
> I've CC'd Edward Yang (who I understand has recently been doing a
> rework of the scheduler) and Simon Marlow.
>
> Thanks,
>
> - Ben
>
>
> [1] https://github.com/bgamari/bayes-stack
> [2] http://goldnerlab.physics.umass.edu/~bgamari/Benchmark-wakeup.eventlog
> [3] http://goldnerlab.physics.umass.edu/~bgamari/Benchmark-wakeup-smaller.eventlog
>