[Haskell-cafe] Re: GHC threads and SMP

Tue Jul 10 05:04:48 EDT 2007

Donald Bruce Stewart wrote:
> ninegua:
>> replying to my own message... the behavior is only when -O is used
>> during compilation, otherwise they both run on 2 cores but at a much
>> lower (1/100) speed.
> 
> Hmm, any change with -O2? Is the optimiser changing the code such that
> the scheduler doesn't get to switch threads as often? If you change
> the thread scheduler switching rate does that change anything?
> 
> See the GHC user's guide for more details:
> 
>     7.12.1.3. Scheduling policy for concurrent threads
> 
>     Runnable threads are scheduled in round-robin fashion. Context switches are
>     signalled by the generation of new sparks or by the expiry of a virtual timer
>     (the timer interval is configurable with the -C[<num>] RTS option). However, a
>     context switch doesn't really happen until the current heap block is full. You
>     can't get any faster context switching than this.
> 
>     When a context switch occurs, pending sparks which have not already been
>     reduced to weak head normal form are turned into new threads. However, there is
>     a limit to the number of active threads (runnable or blocked) which are allowed
>     at any given time. This limit can be adjusted with the -t <num> RTS option (the
>     default is 32). Once the thread limit is reached, any remaining sparks are
>     deferred until some of the currently active threads are completed.

I think you got that from an old version of the user's guide - it certainly 
isn't in the 6.6.1 or HEAD versions of the docs.

I don't have any specific advice about the program in this thread, but in my 
(limited) experience with debugging parallelism problems in GHC, these are common:

  (a) the child threads aren't doing any work, just accumulating a large
      thunk which gets evaluated by the main thread sequentially.

  (b) you have a sequential dependency somewhere

  (c) tight loops that don't allocate don't give the scheduler a chance
      to run and load-balance.

  (d) GHC's scheduler is too stupid
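
To make (a) concrete, here is a minimal sketch (not taken from the program in 
this thread; work, lazyChild and strictChild are hypothetical names chosen for 
illustration).  In the first version the child thread only hands over an 
unevaluated thunk, so the summation runs in the thread that later demands the 
result.  In the second, evaluate forces the result inside the child before it is 
handed over:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (foldl')

-- Hypothetical workload used only for illustration: sum of 1..n.
work :: Int -> Int
work n = foldl' (+) 0 [1 .. n]

-- The child stores an unevaluated thunk in the MVar (putMVar does not
-- force its argument), so the real work is done by whoever demands it.
lazyChild :: Int -> IO Int
lazyChild n = do
  box <- newEmptyMVar
  _ <- forkIO (putMVar box (work n))
  takeMVar box

-- The child forces the result to weak head normal form first.  For an
-- Int that is the fully evaluated value, so the work happens in the child.
strictChild :: Int -> IO Int
strictChild n = do
  box <- newEmptyMVar
  _ <- forkIO (evaluate (work n) >>= putMVar box)
  takeMVar box

main :: IO ()
main = do
  a <- lazyChild 1000000
  b <- strictChild 1000000
  -- Both versions compute the same value; only the thread doing the work differs.
  print (a == b)
```

For results that are larger structures (lists, trees), weak head normal form is 
not enough, and something that forces the structure more deeply would be needed 
in the child.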

I doubt that (c) is a problem for you: it normally occurs when you try to use 
par/seq and strategies, and are playing with parallel fibonacci.  Here you are 
using forkIO which definitely allocates, so that shouldn't be a problem.

(d) is quite possible.  I once tried to parallelise the simple concurrency 
example from the language shootout, which essentially consists of a long chain 
of threads with data items being passed along the chain.  I could only get any 
kind of speedup when I fixed half the chain on to each CPU, rather than using 
the automatic migration logic in the scheduler.  You can use GHC.Conc.forkOnIO for 
this:

   forkOnIO :: Int -> IO () -> IO ThreadId

Pass it an integer T, and the thread will be stuck to CPU T `mod` N (where N is 
the number of CPUs).  The RTS doesn't really physically fix its execution units 
to CPUs, but usually the OS manages to do a reasonable job of this.
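
A rough sketch of the pinning, under some assumptions: it uses forkOn, 
getNumCapabilities and threadCapability from Control.Concurrent, which is how 
recent versions of base expose this functionality (forkOn has the same type as 
the forkOnIO signature above).  pinnedCapabilities is a hypothetical helper 
written for this example, and the count of 4 workers is arbitrary:

```haskell
import Control.Concurrent
  (forkOn, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Fork one worker per slot, asking for slot t to be pinned to
-- execution unit t `mod` N.  Each worker reports the unit it is
-- actually running on, so the pinning can be inspected.
pinnedCapabilities :: Int -> IO [Int]
pinnedCapabilities slots = do
  boxes <- forM [0 .. slots - 1] $ \t -> do
    box <- newEmptyMVar
    _ <- forkOn t $ do
      (cap, _isPinned) <- threadCapability =<< myThreadId
      putMVar box cap
    return box
  mapM takeMVar boxes

main :: IO ()
main = do
  n <- getNumCapabilities
  caps <- pinnedCapabilities 4
  -- Prints the number of execution units and where each worker ran.
  print (n, caps)
```

Build it with -threaded and run it with +RTS -N2 to get two execution units; 
without that there is only one, and every worker lands on it.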

In GHC 6.8, hopefully we'll have some better tools for debugging parallelism 
performance problems.  Michael Adams (who just finished an internship here at 
MSR) ported some of the GranSim visualisation tools to the current GHC; I have 
the patches sitting in my inbox ready to review.

Cheers,
	Simon