[Haskell-cafe] Slowdown with GHC when using multiple CPUs

Alexander Morozov etesial at gmail.com
Fri Jun 29 23:00:25 CEST 2012


Garbage collector have to stop all threads while it's running. Then
each thread collects garbage in its private zone. If some thread is
paused by OS before it's done, all other threads have to do nothing
and wait while it gets resheduled by OS some timeslices later and
completes its work.

You can try to use single-threaded garbage collector (-gq), it should
work better as only one garbage-collecting thread is running and it's
unlikely to be sheduled out by os since there are still an other free
core.

On my 4-core machine with 2 cores busy by other processes:
Threaded collector:
-N1 : 12.0
-N2 : 6.4
-N3 : 8.3
-N4 : 14.6

Single-threaded (-qg):
-N1 : 11.8
-N2 : 7.1
-N3 : 7.1
-N4 : 7.0

On Fri, Jun 29, 2012 at 3:30 PM, David Powell <david at drp.id.au> wrote:
> I recently ran into a problem writing a program that I hoped I could speed
> up by running over multiple CPUs.  I want non-haskell users to be able to
> run the tool and take advantage of multiple CPUs.  *But* there is a serious
> slowdown when the RTS is run with -N and some of the CPUs are already busy.
>
> This is already mentioned in the GHC docs, but the problem I experienced was
> more serious than I expected causing a slowdown of around 2x compared to
> running with a single CPU.
>
> I reproduced the problem with the following code from the haskell wiki:
>
> {-# LANGUAGE BangPatterns #-}
>     import Data.Digest.Pure.MD5
>     import qualified Data.ByteString.Lazy as L
>     import System.Environment
>     import Control.Concurrent
>     import Control.Monad (replicateM_)
>
>     main = do
>         files <- getArgs
>         str <- newEmptyMVar
>         mapM_ (forkIO . hashAndPrint str) files
>         printNrResults (length files) str
>
>     printNrResults i var = replicateM_ i (takeMVar var >>= putStrLn)
>
>     hashAndPrint str f = do
>         bs <- L.readFile f
>         let !h = show $ md5 bs
>         putMVar str (f ++ ": " ++ h)
>
>
> When run on 4 idle CPU cores, I get the following wall clock times:
>   ./run +RTS -N1   : 20.4 sec
>   ./run +RTS -N2   : 11.0 sec
>   ./run +RTS -N4   : 6.7 sec
>
> When run on the same 4 core machine, but with 2 cores already busy:
>   ./run +RTS -N1   : 23.5 sec
>   ./run +RTS -N2   : 14.1 sec
>   ./run +RTS -N4   : 57.8 sec   <---- Blowout...
>
> This is quite a problem in practice when running on a shared server.  Is
> there anything that can be done to address this?
>
> (I wrote up a few more details
> here: http://thunking.drp.id.au/2012/06/slowdown-with-ghc-when-using-multiple.html)
>
> Thanks,
>
> --
> David Powell
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>



More information about the Haskell-Cafe mailing list