[Haskell-cafe] Re: GHC's parallel garbage collector -- what am I doing wrong?

Sun Mar 7 15:53:28 EST 2010

On 07/03/10 14:41, Jan-Willem Maessen wrote:
>
> On Mar 3, 2010, at 8:44 AM, Simon Marlow wrote:
>
>> On 01/03/2010 21:20, Michael Lesniak wrote:
>>> Hello Bryan,
>>>
>>>> The parallel GC currently doesn't behave well with concurrent
>>>> programs that use multiple capabilities (aka OS threads), and
>>>> the behaviour you see is the known symptom of this. I believe
>>>> that Simon Marlow has some fixes in hand that may go into
>>>> 6.12.2.
>>
>> It's more correct to say the parallel GC has difficulty when one of
>> its threads is descheduled by the OS, because the other threads
>> just spin waiting for it.  Presumably some kernels are more
>> susceptible than others due to differences in scheduling policy; I
>> know they've been fiddling around with this a lot in Linux
>> recently.
>>
>> You typically don't see a problem when there are spare cores. The
>> slowdown manifests when you are trying to use all the cores in your
>> machine, so it affects people on dual-cores quite a lot. This
>> probably explains why I've not been particularly affected by this
>> myself, since I do most of my benchmarking on an 8-core box.
>>
>> The fix that will be in 6.12.2 is to insert some yields, so that
>> threads will yield rather than spinning indefinitely, and this
>> seems to help a lot.
>
> Be warned that inserting yield into a spin loop is also non-portable,
> and may make the problem *worse* on some systems.
>
> The problem is that "yield" calls can be taken by the scheduler to
> mean "See, I'm a nice thread, giving up the core when I don't need
> it.  Please give me extra Scheduling Dubloons."
>
> Now let's say 7 of your 8 threads are doing this.  It's likely that
> each one will yield to the next, and the 8th thread (the one you
> actually want on-processor) could take a long time to bubble up and
> get its moment.  At one time on Solaris you could even livelock
> (because the scheduler didn't try particularly hard to be fair in the
> case of multiple yielding threads in a single process)---but that was
> admittedly a long time ago.

How depressing, thanks for that :)

> The only recourse I know about is to tell the OS you're doing
> synchronization (by using OS-visible locking calls, say the ones in
> pthreads or some of the lightweight calls that Linux has added for
> the purpose).  Obviously this has a cost if anyone falls out of the
> spin loop---and it's pretty likely some thread will have to wait a
> while.

Yes, so we tried using futexes on Linux.  There's an experimental patch
attached to

http://hackage.haskell.org/trac/ghc/ticket/3553

It was definitely slower than the spinlocks on the benchmarks I tried.

I think the problem is that we're using these spinlocks to synchronise
across all cores, and it's likely that these loops will have to spin for
a while before exiting because one or more of the running cores takes a
while to get to a safe point.  But really giving up the core and
blocking is a lot worse, because the wakeup time is so long (you can see
it pretty clearly in ThreadScope).
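
For readers who want to see the shape of the spin-then-yield idea
discussed above, here is a minimal sketch.  It is not the GHC RTS code
and not the patch on the ticket; it assumes C11 atomics and POSIX
sched_yield(), and the names spin_lock, spin_try_acquire, spin_acquire,
spin_release and SPIN_LIMIT are all made up for illustration.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tuning constant: attempts made before each yield. */
#define SPIN_LIMIT 1000

/* Hypothetical lock type: true while some thread holds the lock. */
typedef struct {
    atomic_bool held;
} spin_lock;

/* One attempt to take the lock.  Returns true if we now hold it. */
bool spin_try_acquire(spin_lock *l)
{
    /* exchange returns the previous value; false means it was free. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Spin for up to SPIN_LIMIT attempts, then yield the core and retry.
 * Returns the number of times we had to yield before succeeding. */
unsigned spin_acquire(spin_lock *l)
{
    unsigned yields = 0;
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (spin_try_acquire(l)) {
                return yields;
            }
        }
        /* Let another runnable thread (ideally the lock holder) run.
         * What the scheduler does with this hint is OS-specific. */
        sched_yield();
        yields++;
    }
}

/* Release the lock so that a spinning thread can take it. */
void spin_release(spin_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The waiter never blocks in the kernel here, so there is no wakeup
latency to pay, but how well the yield behaves still depends on the
scheduler, as Jan-Willem points out.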

Anyway, I hope all this is just a temporary problem until we get 
CPU-independent GC working.

Cheers,
	Simon