[Haskell-beginners] Parallelism?

Thu Dec 1 17:40:17 CET 2011

Simon Marlow investigated, and we got this patch out:

commit 6d18141d880d55958c3392f6a7ae621dc33ee5c1
Author: Simon Marlow <marlowsd at gmail.com>
Date:   Thu Dec 1 10:53:28 2011 +0000

    Fix a scheduling bug in the threaded RTS

    The parallel GC was using setContextSwitches() to stop all the other
    threads, which sets the context_switch flag on every Capability.  That
    had the side effect of causing every Capability to also switch
    threads, and since GCs can be much more frequent than context
    switches, this increased the context switch frequency.  When context
    switches are expensive (because the switch is between two bound
    threads or a bound and unbound thread), the difference is quite
    noticeable.

    The fix is to have a separate flag to indicate that a Capability
    should stop and return to the scheduler, but not switch threads.  I've
    called this the "interrupt" flag.
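
To see why this matters, here is a toy model of the idea. It is not RTS code, and every name in it (Event, stepOld, stepNew, countSwitches) is made up for illustration. It assumes a run queue of two threads and counts how often the running thread gets moved to the back of the queue, first when a GC request also forces a switch, then when a GC request merely interrupts:

```haskell
import Data.List (foldl')

-- Hypothetical toy model; none of these names come from the GHC RTS.
data Event = GcRequest | TimerTick deriving (Eq, Show)

-- Run queue of thread ids; the head is the running thread.
type RunQueue = [Int]

-- Move the running thread to the back of the queue (a context switch).
rotate :: RunQueue -> RunQueue
rotate []     = []
rotate (t:ts) = ts ++ [t]

-- Old behaviour: a GC request sets the context-switch flag,
-- so every event, GC or timer, ends in a thread switch.
stepOld :: (RunQueue, Int) -> Event -> (RunQueue, Int)
stepOld (q, switches) _ = (rotate q, switches + 1)

-- New behaviour: a GC request only interrupts; the same thread
-- resumes afterwards. Only a timer tick switches threads.
stepNew :: (RunQueue, Int) -> Event -> (RunQueue, Int)
stepNew (q, switches) GcRequest = (q, switches)
stepNew (q, switches) TimerTick = (rotate q, switches + 1)

-- Fold a list of events over a two-thread run queue and
-- report the number of context switches.
countSwitches :: ((RunQueue, Int) -> Event -> (RunQueue, Int)) -> [Event] -> Int
countSwitches step = snd . foldl' step ([1, 2], 0)

main :: IO ()
main = do
    -- Assumed workload: ten GCs per timer tick, three ticks.
    let events = concat (replicate 3 (replicate 10 GcRequest ++ [TimerTick]))
    print (countSwitches stepOld events)
    print (countSwitches stepNew events)
```

With this made-up workload the first count is 33 and the second is 3, which is the "GCs can be much more frequent than context switches" effect in miniature.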

Thanks for constructing this lovely test-case!

Cheers,
Edward

P.S. When you run on the single-threaded runtime, the second thread never
actually finishes, since the program exits once the main thread is done.
You need to wait for it explicitly, using an MVar.
Also, note that putStrLn is synchronized by an MVar, to avoid interleaved
output.
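
The explicit wait could look something like the sketch below. It reuses work and fact from the test case; the MVar name (done) and the argument passed in main are arbitrary choices of mine, and fact is given a guard so that it also terminates for arguments below 1:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

fact :: Integer -> Integer
fact n
    | n <= 1    = 1
    | otherwise = n * fact (n - 1)

work :: Integer -> IO ()
work n = do
    -- Empty MVar that the forked thread fills when it is finished.
    done <- newEmptyMVar
    _ <- forkIO $ do
        putStrLn $ seq (fact n) "Done"
        putMVar done ()
    putStrLn $ seq (fact n) "Done"
    -- Block until the forked thread has signalled completion.
    takeMVar done

main :: IO ()
main = work 1000
```

With the takeMVar in place, main cannot return before the forked thread has printed its "Done", so both lines show up on either runtime.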

Excerpts from Michael Craig's message of Thu Dec 01 00:50:15 -0500 2011:
> I was writing some parallel code (asynchronous database writes for an event
> logger, but that's beside the point), and it seemed like the parallelized
> version (i.e. compiled with -threaded -with-rtsopts=-N2) wasn't running
> fast enough. I boiled it down to a dead-simple test:
> 
>     import Control.Concurrent
>     import Data.Time.Clock.POSIX
>     import System.Environment
> 
>     main :: IO ()
>     main = do
>         n <- getArgs >>= return . read . head
>         t1 <- getPOSIXTime
>         work n
>         t2 <- getPOSIXTime
>         putStrLn $ show $ t2 - t1
>         putStrLn $ show $ (fromIntegral n :: Double)
>                         / (fromRational . toRational $ t2 - t1)
> 
>     work :: Integer -> IO ()
>     work n = do
>       forkIO $ putStrLn $ seq (fact n) "Done"
>       putStrLn $ seq (fact n) "Done"
> 
>     fact :: Integer -> Integer
>     fact 1 = 1
>     fact n = n * fact (n - 1)
> 
> (I know this is not the best way to time things but I think it suffices for
> this test.)
> 
> Compiled with ghc --make -O3 test.hs, ./test 500000 runs for 74 seconds.
> Compiling with ghc --make -O3 -threaded -with-rtsopts=-N, ./test 500000
> runs for 82 seconds (and seems to be using 2 cpu cores instead of just 1,
> on a 4-core machine). What gives?
> 
> Mike S Craig