Extended periods of "waking up thread %d on cap %d"
Ben Gamari
bgamari.foss at gmail.com
Sat Jan 26 01:28:27 CET 2013
Recently I've been benchmarking my concurrent Gibbs sampler[1] on a
largish multicore machine. I've been using GHC HEAD due to various
GC-related fixes that are present. One thing that I've noticed in
looking over the event logs is that there are large durations (tens of
milliseconds) when HECs appear to be constantly bombarded with wake-ups
from other capabilities.
In these periods, the eventlog will be filled with blocks of messages
such as this snippet from one benchmark run[2] (long repeated blocks
marked with ellipses, a few irrelevant messages have been omitted yet
not marked, see eventlog for full log),
28.320958s HEC 7: running thread 293
...
28.391802s HEC 6: running thread 209
28.392070s HEC 6: waking up thread 284 on cap 7
28.392302s HEC 6: waking up thread 284 on cap 7
28.392579s HEC 6: waking up thread 284 on cap 7
28.392808s HEC 6: waking up thread 284 on cap 7
.
.
.
28.405971s HEC 16: waking up thread 284 on cap 7
28.406026s HEC 15: waking up thread 284 on cap 7
28.406080s HEC 12: waking up thread 284 on cap 7
28.406091s HEC 23: waking up thread 284 on cap 7
28.406095s HEC 6: waking up thread 284 on cap 7
28.406095s HEC 23: waking up thread 284 on cap 7
28.406225s HEC 16: waking up thread 284 on cap 7
.
.
.
28.490449s HEC 7: stopping thread 293 (thread yielding)
28.490454s HEC 7: waking up thread 284 on cap 7
28.490454s HEC 7: waking up thread 284 on cap 7
28.490455s HEC 7: waking up thread 284 on cap 7
28.490456s HEC 7: waking up thread 284 on cap 7
28.490456s HEC 7: waking up thread 284 on cap 7
28.490457s HEC 7: waking up thread 284 on cap 7
28.490458s HEC 7: waking up thread 284 on cap 7
.
.
.
28.491841s HEC 7: waking up thread 284 on cap 7
28.491850s HEC 7: migrating thread 284 to cap 0
28.491968s HEC 7: running thread 293
.
.
.
In other words, it appears that HEC 7 took a vacation of more than a
millisecond as it tried (ultimately unsuccessfully) to wake up thread
284. These blocks are easy to spot in this event log and quite difficult
to explain. In this case I was being silly and spawned a few too many
capabilities (30 capabilities whereas the machine has 24 real cores, 2
threads per core due to hyperthreading; I was running 24 threads).
That being said, even under more reasonable conditions[3] (where I was
running 24 capabilities with 20 threads on the same 24-core machine) I
still see problems waking up. For example, from 32.293455s to 32.327349s
see numerous attempts by many HECs to wake-up thread 517 on cap
12. In this period, cap 12 briefly runs thread 537 (32.326781s)
but while running HEC 17 attempts to wake up 517 again.
All-in-all, something seems a bit strange. Any ideas what might be going
on here?
I've CC'd Edward Yang (who I understand has recently been doing a
rework on the scheduler) and Simon Marlow.
Thanks,
- Ben
[1] https://github.com/bgamari/bayes-stack
[2] http://goldnerlab.physics.umass.edu/~bgamari/Benchmark-wakeup.eventlog
[3] http://goldnerlab.physics.umass.edu/~bgamari/Benchmark-wakeup-smaller.eventlog
More information about the Glasgow-haskell-users
mailing list