[Haskell-cafe] forkIO on multicore

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Fri Dec 19 12:52:20 EST 2008

On Fri, 2008-12-19 at 10:42 -0600, Jake McArthur wrote:
> Paul Keir wrote:
> > fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
> This is a CAF (Constant Applicative Form). Since it is actually a
> constant it is never garbage collected, and is always shared, so each
> thread is only calculating it once. You have essentially created a
> lookup table.

Though note that with all our obvious suggestions there is still no
speedup:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

heavytask m n = putMVar m $! (fibs !! 100000)
  where
    fibs = n : (n+1) : zipWith (+) fibs (tail fibs)

-- so now fibs is not globally shared but is used per-heavytask
-- it is also evaluated by heavytask rather than just putting a thunk
-- into the MVar

main = do ms <- sequence $ replicate 8 newEmptyMVar
          sequence_ [ forkIO (heavytask m n)
                    | (m, n) <- zip ms [0..] ]
          ms' <- mapM takeMVar ms
          mapM_ print ms'
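
To make the point about the thunk concrete, here is a minimal sketch of the
difference between ($) and ($!) when writing to an MVar. The names
`expensive` and `putForces` are hypothetical, not part of the program above,
and `expensive` deliberately fails when forced so that the moment of
evaluation becomes observable.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar)
import Control.Exception (ErrorCall, try)

-- Hypothetical stand-in for an expensive computation. It raises an
-- exception when forced, so we can tell who evaluated it.
expensive :: Integer
expensive = error "forced"

-- Returns True if the put itself forced the value, i.e. the writer
-- did the evaluation instead of storing an unevaluated thunk.
putForces :: Bool -> IO Bool
putForces strict = do
  m <- newEmptyMVar
  r <- try (if strict
              then putMVar m $! expensive  -- evaluate first, then store
              else putMVar m $  expensive) -- store the thunk as-is
         :: IO (Either ErrorCall ())
  return (either (const True) (const False) r)

main :: IO ()
main = do
  lazyPut   <- putForces False
  strictPut <- putForces True
  putStrLn ("($)  forces the value in the writer: " ++ show lazyPut)
  putStrLn ("($!) forces the value in the writer: " ++ show strictPut)
```

With ($) the put should succeed without touching the value, so the work
would be done by whichever thread later takes it from the MVar. With ($!)
the writing thread does the work, which is what we want from heavytask.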

Looking at the GC stats (+RTS -t -RTS) we see that the majority of the
time in this program is spent doing GC and that when we run with -N4 the
time spent doing GC is even higher.

-N1:
1.57 MUT (1.60 elapsed), 7.05 GC (7.16 elapsed)
real	0m8.793s

-N2:
2.50 MUT (1.49 elapsed), 8.48 GC (7.33 elapsed)
real	0m8.873s

-N4:
2.83 MUT (1.56 elapsed), 12.16 GC (7.95 elapsed)
real	0m9.572s

The process monitor indicates that in the -N1 case, one core hits 100%
use for the full 8 seconds.

In the -N2 case one core is hitting 90% utilisation with the other three
cores doing a little work, up to about 40% utilisation. On some runs the
core doing the most work swaps over.

In one run at -N2 I got a segmentation fault.

In the -N4 case, 4 cores hit between 30% and 80% utilisation.

So this benchmark is primarily a stress test of the parallel garbage
collector, since it is GC that is taking 75-80% of the time. Note that
the mutator elapsed time goes down slightly with 2 cores compared to 1
(1.60s to 1.49s); however, the GC elapsed time goes up slightly (7.16s
to 7.33s).
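
The GC share can be worked out from the MUT and GC figures directly. The
following is only a rough sketch: the `Run` type and `gcShare` are
hypothetical helpers, the numbers are copied from the three runs above, and
the -N labels assume those runs are listed in the order -N1, -N2, -N4.

```haskell
-- One "+RTS -t" summary: CPU and elapsed seconds for mutator and GC.
data Run = Run
  { label      :: String
  , mutCpu     :: Double
  , mutElapsed :: Double
  , gcCpu      :: Double
  , gcElapsed  :: Double
  }

-- Fraction of (mutator + GC) time that was spent in GC.
gcShare :: Double -> Double -> Double
gcShare mut gc = gc / (mut + gc)

-- Figures from the runs above; the labels are assumed from their order.
runs :: [Run]
runs =
  [ Run "-N1" 1.57 1.60  7.05 7.16
  , Run "-N2" 2.50 1.49  8.48 7.33
  , Run "-N4" 2.83 1.56 12.16 7.95
  ]

main :: IO ()
main = mapM_ report runs
  where
    report r = putStrLn $
      label r ++ ": GC share of CPU time "
              ++ percent (gcShare (mutCpu r) (gcCpu r))
              ++ ", of elapsed time "
              ++ percent (gcShare (mutElapsed r) (gcElapsed r))
    -- Round to a whole percentage for display.
    percent x = show (round (100 * x) :: Int) ++ "%"
```

This ignores initialisation and exit time, so it is only an approximation of
what the RTS itself would report as GC overhead.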


