[Haskell-cafe] Re: GHC threads and SMP

Donald Bruce Stewart dons at cse.unsw.edu.au
Sat Jul 7 23:01:40 EDT 2007


ninegua:
> replying to my own message... the behavior is only when -O is used
> during compilation, otherwise they both run on 2 cores but at a much
> lower (1/100) speed.

Hmm, any change with -O2? Is the optimiser changing the code such that
the scheduler doesn't get to switch threads as often? If you change
the thread scheduler switching rate does that change anything?

See the GHC user's guide for more details:

    7.12.1.3. Scheduling policy for concurrent threads

    Runnable threads are scheduled in round-robin fashion. Context switches are
    signalled by the generation of new sparks or by the expiry of a virtual timer
    (the timer interval is configurable with the -C[<num>] RTS option). However, a
    context switch doesn't really happen until the current heap block is full. You
    can't get any faster context switching than this.

    When a context switch occurs, pending sparks which have not already been
    reduced to weak head normal form are turned into new threads. However, there is
    a limit to the number of active threads (runnable or blocked) which are allowed
    at any given time. This limit can be adjusted with the -t <num> RTS option (the
    default is 32). Once the thread limit is reached, any remaining sparks are
    deferred until some of the currently active threads are completed.

Perhaps SimonM can shed some light here?

> 
> On 7/6/07, Paul L <ninegua at gmail.com> wrote:
> >I have two parallel algorithms that use the lightweight GHC thread and
> >forkIO. I compile them using the -threaded (or -smp) option, and run
> >both with +RTS -N2 -RTS command line option.
> >
> >QSort is able to make use of the dual cores on my laptop -- "top"
> >shows that two threads show up and both CPUs are utilized, and "time"
> >it will give something like this:
> >
> >  real    0m0.502s
> >  user    0m0.872s
> >  sys     0m0.004s
> >
> >But Prime can only make use of one core, as shown by "top". "time" gives
> >
> >  real    0m9.112s
> >  user    0m9.093s
> >  sys     0m0.028s
> >
> >Because forkOS is not used anywhere, the decision of running them on 1
> >or 2 OS threads seem rather arbitary. Why?
> >
> >Regards,
> >Paul L
> >
> >
> >
> >> import Control.Concurrent
> >> import System.Random
> >> import Data.Array.MArray
> >> import Data.Array.IO
> >> import System.IO.Unsafe
> >> import Control.Exception
> >
> >1. Quick Sort
> >
> >> testQSort' n verbose = do
> >>   let b = (0, n - 1)
> >>   arr <- newArray b 0  :: IO (IOUArray Int Int)
> >>   initM' (mkStdGen 0) b arr
> >>   waitForChildren
> >>   qsortM' b arr
> >>   waitForChildren
> >>   if verbose then getElems arr >>= putStrLn . show else return ()
> >
> >Initialize an array with random numbers.
> >
> >> initM' g (i, j) arr | j - i < 10000 = fillArr g i j
> >>   where
> >>     fillArr g i j = if i > j then return () else do
> >>       let (v, g') = next g
> >>       writeArray arr i v >> fillArr g' (i + 1) j
> >> initM' g (i, j) arr = do
> >>   let k = (i + j) `div` 2
> >>       (g1, g2) = split g
> >>   forkChild $ initM' g1 (i, k) arr
> >>   forkChild $ initM' g2 (k + 1, j) arr
> >>   return ()
> >
> >> qsortM' (i, j) arr = qsort' (i, j)
> >>   where
> >>     qsort' (i, j) =
> >>       if j <= i then return () else do
> >>         k <- split i j
> >>         if j - i > 10000 then (forkChild $ qsort' (i, k - 1)) >> return 
> >()
> >>                         else qsort' (i, k - 1)
> >>         qsort' (k + 1, j)
> >>     split left right = do
> >>       v <- readArray arr right
> >>       let split' i j = if j == right then swap i right v >> return i 
> >else do
> >>             b <- readArray arr j
> >>             if b <= v
> >>               then (swap i j b) >> split' (i + 1) (j + 1)
> >>               else split' i (j + 1)
> >>       split' left left
> >>     swap i j b = do
> >>       a <- readArray arr i
> >>       writeArray arr i b
> >>       writeArray arr j a
> >
> >2. Prime
> >
> >> testPrime' n verbose = do
> >>   arr <- newArray (0, n) True :: IO (IOUArray Int Bool)
> >>   primeM' arr n
> >>   waitForChildren
> >>   if verbose
> >>     then getElems arr >>= putStrLn . show . map fst . filter snd . zip 
> >[0..]
> >>     else return ()
> >
> >> primeM' arr n = do
> >>   let p = truncate $ sqrt (fromIntegral n) + 1
> >>       remove i = if i > p then return () else do
> >>         spawnRemover (i + 1)
> >>         remove' (i + i)
> >>        where
> >>         remove' j = if j > n then return () else do
> >>           writeArray arr j False
> >>           remove' (j + i)
> >>         spawnRemover j = if j > n then return () else do
> >>           t <- readArray arr j
> >>           if t then forkChild (remove j) else spawnRemover (j + 1)
> >>   remove 2
> >
> >Manage thread termination
> >
> >> children :: MVar [MVar ()]
> >> children = unsafePerformIO (newMVar [])
> >
> >> waitForChildren :: IO ()
> >> waitForChildren = do
> >>   cs <- takeMVar children
> >>   case cs of
> >>     []   -> putMVar children cs
> >>     m:ms -> do
> >>       putMVar children ms
> >>       takeMVar m
> >>       waitForChildren
> >>
> >> forkChild :: IO () -> IO ()
> >> forkChild io = do
> >>   mvar <- newEmptyMVar
> >>   childs <- takeMVar children
> >>   putMVar children (mvar:childs)
> >>   forkIO (io `finally` putMVar mvar ())
> >>   return ()
> >
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe


More information about the Haskell-Cafe mailing list