[Haskell-cafe] Re: GHC's parallel garbage collector -- what am
I doing wrong?
marlowsd at gmail.com
Sun Mar 7 15:53:28 EST 2010
On 07/03/10 14:41, Jan-Willem Maessen wrote:
> On Mar 3, 2010, at 8:44 AM, Simon Marlow wrote:
>> On 01/03/2010 21:20, Michael Lesniak wrote:
>>> Hello Bryan,
>>>> The parallel GC currently doesn't behave well with concurrent
>>>> programs that use multiple capabilities (aka OS threads), and
>>>> the behaviour you see is the known symptom of this. I believe
>>>> that Simon Marlow has some fixes in hand that may go into
>> It's more correct to say the parallel GC has difficulty when one of
>> its threads is descheduled by the OS, because the other threads
>> just spin waiting for it. Presumably some kernels are more
>> susceptible than others due to differences in scheduling policy; I
>> know they've been fiddling around with this a lot in Linux
>> You typically don't see a problem when there are spare cores; the
>> slowdown manifests when you are trying to use all the cores in your
>> machine, so it affects people on dual-cores quite a lot. This
>> probably explains why I've not been particularly affected by this
>> myself, since I do most of my benchmarking on an 8-core box.
>> The fix that will be in 6.12.2 is to insert some yields, so that
>> threads will yield rather than spinning indefinitely, and this
>> seems to help a lot.
> Be warned that inserting yield into a spin loop is also non-portable,
> and may make the problem *worse* on some systems.
> The problem is that "yield" calls can be taken by the scheduler to
> mean "See, I'm a nice thread, giving up the core when I don't need
> it. Please give me extra Scheduling Dubloons."
> Now let's say 7 of your 8 threads are doing this. It's likely that
> each one will yield to the next, and the 8th thread (the one you
> actually want on-processor) could take a long time to bubble up and
> get its moment. At one time on Solaris you could even livelock
> (because the scheduler didn't try particularly hard to be fair in the
> case of multiple yielding threads in a single process)---but that was
> admittedly a long time ago.
How depressing, thanks for that :)
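To make the trade-off concrete, here is a minimal C sketch of the "spin for a while, then yield" pattern under discussion. It is not the RTS code or the 6.12.2 patch: spin_lock, spin_acquire and SPIN_LIMIT are hypothetical names, and the limit of 1000 is an arbitrary assumption. It uses only C11 atomics and POSIX sched_yield().

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tuning constant: spins attempted between yields. */
#define SPIN_LIMIT 1000

/* Hypothetical lock type; not taken from GHC's RTS. */
typedef struct {
    atomic_bool held;
} spin_lock;

static void spin_lock_init(spin_lock *l) {
    atomic_init(&l->held, false);
}

/* Try once to take the lock; returns true on success. */
static bool spin_try_acquire(spin_lock *l) {
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Spin up to SPIN_LIMIT times, then give up the core and start over.
   Returns how many times sched_yield() was called before success. */
static unsigned spin_acquire(spin_lock *l) {
    unsigned yields = 0;
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (spin_try_acquire(l)) {
                return yields;
            }
        }
        sched_yield();  /* what the scheduler does here is OS-specific */
        yields++;
    }
}

static void spin_release(spin_lock *l) {
    assert(atomic_load(&l->held));  /* releasing an unheld lock is a bug */
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The sched_yield() call is exactly where the behaviour described above comes in: if most of the waiters are yielding to each other, nothing in this loop guarantees that the thread holding the lock gets scheduled next.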
> The only recourse I know about is to tell the OS you're doing
> synchronization (by using OS-visible locking calls, say the ones in
> pthreads or some of the lightweight calls that Linux has added for
> the purpose). Obviously this has a cost if anyone falls out of the
> spin loop---and it's pretty likely some thread will have to wait a
Yes, so we tried using futexes on Linux (there's an experimental patch);
it was definitely slower than the spinlocks on the benchmarks I tried.
I think the problem is that we're using these spinlocks to synchronise
across all cores, and it's likely that these loops will have to spin for
a while before exiting because one or more of the running cores takes a
while to get to a safe point. But really giving up the core and
blocking is a lot worse, because the wakeup time is so long (you can see
it pretty clearly in ThreadScope).
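For comparison, here is a sketch of the OS-visible alternative, using a pthreads mutex and condition variable rather than raw futexes. safepoint_gate and its functions are hypothetical names, not anything in GHC. It only illustrates waiters sleeping in the kernel until every thread has reached its safe point, which is where the wakeup latency mentioned above is paid.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical gate: opens once `expected` threads have arrived. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  all_arrived;
    int expected;
    int arrived;
} safepoint_gate;

static void gate_init(safepoint_gate *g, int expected) {
    pthread_mutex_init(&g->mu, NULL);
    pthread_cond_init(&g->all_arrived, NULL);
    g->expected = expected;
    g->arrived = 0;
}

/* Called by each thread when it reaches its safe point. */
static void gate_arrive(safepoint_gate *g) {
    pthread_mutex_lock(&g->mu);
    assert(g->arrived < g->expected);
    g->arrived++;
    if (g->arrived == g->expected) {
        /* Last arrival wakes everyone blocked in gate_wait(). */
        pthread_cond_broadcast(&g->all_arrived);
    }
    pthread_mutex_unlock(&g->mu);
}

/* Block in the kernel (no spinning) until all threads have arrived. */
static void gate_wait(safepoint_gate *g) {
    pthread_mutex_lock(&g->mu);
    while (g->arrived < g->expected) {
        pthread_cond_wait(&g->all_arrived, &g->mu);
    }
    pthread_mutex_unlock(&g->mu);
}

static void gate_destroy(safepoint_gate *g) {
    pthread_cond_destroy(&g->all_arrived);
    pthread_mutex_destroy(&g->mu);
}
```

A waiter here burns no CPU, but it cannot continue until the kernel reschedules it after the broadcast, whereas a spinning waiter notices the change as soon as the memory write becomes visible.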
Anyway, I hope all this is just a temporary problem until we get
CPU-independent GC working.