[Haskell-cafe] GHC multicore Haskell multicore synchronization (was Re: Weird multi-threading runtime behaviour of single-threaded program with GHC-7.0.3)

Fri Jun 24 10:32:45 CEST 2011

On 21/06/2011 10:02, Herbert Valerio Riedel wrote:
> Hello Simon,
>
> On Fri, 2011-06-17 at 09:05 +0100, Simon Marlow wrote:
>>> What's happening there? The actual processing work seems to be done in a
>>> single HEC... but what are the remaining 11 HECs doing exactly? Am I
>>> doing something wrong?
>>
>> The answer is, they're all doing GC.  When you say -N, the parallel GC
>> is turned on, so every GC requires the cooperation of all cores.  When
>> you're running parallel code this is a big win, but for sequential code
>> it could well be a slowdown.
>
> Speaking of cooperation of all cores... how much is the parallel GC
> affected by "multitasking-noise" (is there a better name for it?) in the
> system?
>
>
> There are two cases I'm thinking about:
>
> a) Say I have an 8-core desktop workstation and run my GC-intensive (or
> massively parallel processing) Haskell program with "+RTS -N8", but I
> have a few desktop apps running and using up a bit of CPU time (on
> average just a few %).
>
> Does this already cause significant (i.e. measurable) synchronization
> delays in my Haskell program because the cores are not fully dedicated to it?

Yes, it can, although the problem has been less noticeable since we
started using 'yield' in the spinlock code.  I've been trying to address
this problem with a new GC; for details see this paper:

http://community.haskell.org/~simonmar/papers/local-gc.pdf
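To illustrate the idea behind spinning with 'yield', here is a minimal spin-then-yield lock built on an IORef. This is only a Haskell-level analogy: the RTS spinlocks are C code and yield the OS thread, whereas Control.Concurrent.yield yields to other Haskell threads. A waiter retries a bounded number of times and then yields, so that whoever holds the lock gets a chance to run instead of the waiter burning its whole time slice. The names SpinLock, acquire, release, withSpinLock and parallelCount are hypothetical.

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar, yield)
import Control.Exception (bracket_)
import Data.IORef
  (IORef, atomicModifyIORef', atomicWriteIORef, modifyIORef', newIORef, readIORef)

-- | Hypothetical spin lock: the IORef holds True while the lock is taken.
newtype SpinLock = SpinLock (IORef Bool)

newSpinLock :: IO SpinLock
newSpinLock = SpinLock <$> newIORef False

-- | One atomic attempt: returns True only if we flipped False to True.
tryAcquire :: SpinLock -> IO Bool
tryAcquire (SpinLock ref) =
  atomicModifyIORef' ref (\held -> if held then (True, False) else (True, True))

-- | Retry up to 'spins' times, then yield to other threads and start over.
acquire :: Int -> SpinLock -> IO ()
acquire spins lock = go budget
  where
    budget = max 1 spins
    go 0 = yield >> go budget
    go n = do
      ok <- tryAcquire lock
      if ok then pure () else go (n - 1)

release :: SpinLock -> IO ()
release (SpinLock ref) = atomicWriteIORef ref False

-- | Run an action while holding the lock; the lock is released even on exceptions.
withSpinLock :: Int -> SpinLock -> IO a -> IO a
withSpinLock spins lock = bracket_ (acquire spins lock) (release lock)

-- | Demo: 'threads' workers each bump a shared counter 'iters' times under the lock.
parallelCount :: Int -> Int -> IO Int
parallelCount threads iters = do
  lock <- newSpinLock
  counter <- newIORef (0 :: Int)
  dones <- mapM (const newEmptyMVar) [1 .. threads]
  let worker done = do
        mapM_ (\_ -> withSpinLock 100 lock (modifyIORef' counter (+ 1))) [1 .. iters]
        putMVar done ()
  mapM_ (forkIO . worker) dones
  mapM_ takeMVar dones
  readIORef counter
```

The bounded spin keeps the fast path cheap when the lock is released quickly, while the periodic yield stops a waiter from monopolising a core that the lock holder (or another process) needs.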

There are a couple of workarounds if you're badly affected:

  - use a larger -A setting (the size of the allocation area), so that
    GCs, and hence all-core synchronizations, happen less often.  This
    might also degrade performance due to more cache misses (try it and
    see).  If your processor has lots of cache you might be able to go
    to -A1m or -A2m, which reduces the GC frequency without impacting
    cache behaviour.

  - Don't use all the cores, e.g. use -N7 on an 8-core machine, leaving
    one core free for other processes.
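Both workarounds can also be expressed in code. The sketch below assumes a GHC newer than 7.0.3 (setNumCapabilities only appeared in later releases) and a program built with -threaded; with 7.0.3 you would instead pass something like +RTS -N7 -A2m on the command line. capabilitiesToUse, configureCapabilities and estimatedMinorGCs are hypothetical helpers, and the GC count is only a back-of-envelope estimate assuming one capability does all the allocating and each full nursery triggers one minor GC.

```haskell
import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import GHC.Conc (getNumProcessors)

-- | Hypothetical policy: leave one core free for the rest of the system
-- (the -N7-on-an-8-core advice), but never drop below one capability.
capabilitiesToUse :: Int -> Int
capabilitiesToUse cores = max 1 (cores - 1)

-- | Apply that policy at startup; needs the threaded RTS (-threaded).
-- Returns the number of capabilities actually in use afterwards.
configureCapabilities :: IO Int
configureCapabilities = do
  cores <- getNumProcessors
  setNumCapabilities (capabilitiesToUse cores)
  getNumCapabilities

-- | Back-of-envelope count of minor GCs, assuming a single capability does all
-- the allocating and every full nursery (-A bytes) triggers one minor GC.
-- With the parallel GC, each of these is an all-core synchronization.
estimatedMinorGCs :: Integer -> Integer -> Integer
estimatedMinorGCs allocatedBytes nurseryBytes =
  (allocatedBytes + nurseryBytes - 1) `div` nurseryBytes -- ceiling division
```

For example, allocating 1 GiB with a 512 KiB nursery gives about 2048 minor GCs, while -A2m brings that down to about 512, a quarter as many all-core synchronizations.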

> b) What about virtualized guests (e.g. with VMware, KVM, etc.)? Let's
> assume the host system has 16 cores, and I partition those into 2 guest
> VMs with 8 cores assigned to each. Will there be measurable/significant
> slowdowns due to synchronization delays in my "+RTS -N8" Haskell
> program?

I haven't tried with a VM, it would be an interesting experiment!
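For anyone who wants to try that experiment, one hypothetical starting point is to time an allocation-heavy but sequential workload inside the guest and compare runs with different settings, e.g. +RTS -N8 -s against +RTS -N7 -s (the program needs to be built with -threaded -rtsopts; -s makes the RTS print GC statistics on exit). The workload churn and the harness timeChurn are assumptions for illustration, not a standard benchmark, and GHC.Clock requires a recent base.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import GHC.Clock (getMonotonicTime)

-- | Hypothetical sequential workload: each 'show' builds a fresh String,
-- so the loop keeps allocating and keeps triggering minor GCs.
-- The result is the total number of decimal digits in [1 .. n].
churn :: Int -> Int
churn n = foldl' (+) 0 [ length (show i) | i <- [1 .. n] ]

-- | Run 'churn' once and return its result together with the elapsed
-- wall-clock time in seconds.
timeChurn :: Int -> IO (Int, Double)
timeChurn n = do
  start <- getMonotonicTime
  r <- evaluate (churn n)
  end <- getMonotonicTime
  pure (r, end - start)
```

Calling timeChurn with a large n from main, once per -N setting, and comparing the elapsed times and the GC figures from -s would show whether the guest's cores behave as if fully dedicated.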

Cheers,
	Simon