[Haskell-cafe] Haskell Speed Myth

Mon Aug 25 13:57:50 EDT 2008

jed:
> On Sun 2008-08-24 11:03, Thomas M. DuBuisson wrote:
> > Yay, the multicore version pays off when the workload is non-trivial.
> > CPU utilization is still rather low for the -N2 case (70%).  I think the
> > Haskell threads have an affinity for certain OS threads (and thus a
> > CPU).  Perhaps it results in a CPU having both tokens of work and the
> > other having none?  
> 
> This must be obvious to everyone, but the original thread-ring cannot
> possibly be faster with multiple OS threads, since a thread can only be
> running if it has the token; otherwise it is just blocked on the token.
> If there are threads executing simultaneously, the token must at least
> be written to the shared cache if not to main memory.  With the single
> threaded runtime, the token may never leave L1.  The difference between
> -threaded -N1 and the non-threaded runtime may be influenced by the
> effectiveness of prefetching the next thread (since presumably not all
> 503 threads can reside in L1).
> 

Simon Marlow sez:

    The thread-ring benchmark needs careful scheduling to get a speedup
    on multiple CPUs. I was only able to get a speedup by explicitly
    locking half of the ring onto each CPU. You can do this using
    GHC.Conc.forkOnIO in GHC 6.8.x, and you'll also need +RTS -qm -qw.

    Also make sure that you're not using the main thread for any part of
    the main computation, because the main thread is a bound thread and
    runs in its own OS thread, so communication between the main thread
    and any other thread is slow.
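
A rough sketch of what "locking half of the ring onto each CPU" might look like follows. This is not Simon's actual program. The names worker and runRing are made up for illustration, and it uses Control.Concurrent.forkOn, the name newer versions of base give to what was GHC.Conc.forkOnIO in 6.8.x. It assumes two capabilities; forkOn takes the capability number modulo the number available, so with -N1 everything simply lands on one. The main thread only injects the token and waits for the result, so it stays out of the ring itself.

```haskell
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_)

-- One ring member (hypothetical helper): block until the token arrives,
-- then either report our id (token == 0) or pass token - 1 to the neighbour.
worker :: MVar Int -> Int -> MVar Int -> MVar Int -> IO ()
worker done ident inbox outbox = loop
  where
    loop = do
      t <- takeMVar inbox
      if t == 0
        then putMVar done ident
        else putMVar outbox (t - 1) >> loop

-- Build a ring of n threads (n >= 1), hand the value k to thread 1, and
-- return the id of the thread that receives the token when it reaches 0.
runRing :: Int -> Int -> IO Int
runRing n k = do
  boxes <- mapM (const newEmptyMVar) [1 .. n]
  done  <- newEmptyMVar
  let half    = n `div` 2
      -- each thread's outbox is its neighbour's inbox; the last wraps to the first
      outs    = drop 1 boxes ++ take 1 boxes
      members = zip3 [1 ..] boxes outs
  forM_ members $ \(i, inbox, outbox) -> do
    -- assumption: two capabilities; first half pinned to 0, second half to 1
    let cap = if i <= half then 0 else 1
    _ <- forkOn cap (worker done i inbox outbox)
    return ()
  -- the main thread touches the ring only here and at the final takeMVar
  forM_ (take 1 boxes) $ \first -> putMVar first k
  takeMVar done
```

Calling runRing 503 1000 from main and printing the result gives the benchmark's ring size with a short run. Build with -threaded and run with +RTS -N2 to get two capabilities. The -qm and -qw flags mentioned above are from the 6.8.x era, so check the user guide for your GHC before relying on them.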