[Haskell-cafe] SMP parallelism increasing GC time dramatically

austin seipp as at hacks.yi.org
Sat Oct 8 02:47:40 CEST 2011

It's GHC, and partly the OS scheduler in some sense. Oversaturating,
i.e. using an -N option > your number of logical cores (including
hyperthreads) will slow down your program typically. This isn't
uncommon, and is well known - GHC's lightweight threads have an M:N
threading model, but for good performance, you typically want the OS
threads and cores to have a 1:1 correspondence. Creating a massive
amount of OS threads will cause much context switching between them,
which is going to slow things down. When GHC needs to synchronize OS
threads in order to do a GC, there are going to be a lot of threads
being context swapped in/out in order to achieve this (because the GC
must halt all mutator threads to do its thing.)

Furthermore, oversaturation isn't the only problem - having the same
number of threads as cores will mean *some* thread is going to get
de-scheduled. Many times this means that the thread in which GHC does
GC will get descheduled by the OS. A corollary of this descheduling
phenomenon is that even using the same # of OS threads as you have
cores could result in -worse- performance than N-1 OS threads. This
was mitigated a bit in 7.2.1 I think. Linux was affected much more
drastically than others (OS X and Solaris have vastly different
schedulers, and as a result the performance wouldn't just tank - it
would actually get better IIRC, it just wouldn't scale as well at that
point.) At the very least, on Linux, using an -N option equivalent to
your number of logical cores should not drastically slow things down
anymore - but it won't make them faster either. This is the "dreaded
last core slowdown" bug that's been known about for a while, and as a
result, you typically only see parallel speedup on Linux up to N-1
threads, where N = the number of cores you have.

As a result, with dual-core only (and no hyperthreading,) on Linux,
you're very unlikely to be able to get parallel speedup in any case.
There's work to fix this in the garbage collector among other things,
but it's not clear if it's going into GHC just yet.

On Fri, Oct 7, 2011 at 2:31 PM, Oliver Batchelor <saulzar at gmail.com> wrote:
> I'm not sure if this is at all related, but if I run a small Repa program
> with more threads than I have cores/CPUs then it gets drastically slower, I
> have a dual core laptop - and -N2 makes my small program take approximately
> 0.6 of the time. Increasing to -N4 and we're running about 2x the time, -N8
> and it's taking 20x or more. I guess this is probably more down to the
> design of Repa rather than GHC itself?
> Oliver
> On Sat, Oct 8, 2011 at 1:21 AM, Tom Thorne <thomas.thorne21 at gmail.com>
> wrote:
>> I have made a dummy program that seems to exhibit the same GC
>> slowdown behavior, minus the segmentation faults. Compiling with -threaded
>> and running with -N12 I get very bad performance (3x slower than -N1),
>> running with -N12 -qg it runs approximately 3 times faster than -N1. I don't
>> know if I should submit this as a bug or not? I'd certainly like to know why
>> this is happening!
>> import Numeric.LinearAlgebra
>> import Numeric.GSL.Special.Gamma
>> import Control.Parallel.Strategies
>> import Control.Monad
>> import Data.IORef
>> import Data.Random
>> import Data.Random.Source.PureMT
>> import Debug.Trace
>> --
>> subsets s n = (subsets_stream s) !! n
>> subsets_stream [] = [[]] : repeat []
>> subsets_stream (x:xs) =
>> let r = subsets_stream xs
>>    s = map (map (x:)) r
>> in [[]] : zipWith (++) s (tail r)
>> testfun :: Matrix Double -> Int -> [Int] -> Double
>> testfun x k cs = lngamma (det (c+u))
>> where
>> (m,c) = meanCov xx
>> m' = fromRows [m]
>> u = (trans m') <> m'
>> xx = fromColumns ( [(toColumns x)!!i] ++ [(toColumns x)!!j] ++ [(toColumns
>> x)!!k] )
>> i = cs !! 0
>> j = cs !! 1
>> test :: Matrix Double -> Int -> Double
>> test x i = sum p
>> where
>> p = parMap (rdeepseq) (testfun x (i+1)) (subsets [0..i] 2)
>> ranMatrix :: Int -> RVar (Matrix Double)
>> ranMatrix n = do
>> xs <- mapM (\_ -> mapM (\_ -> uniform 0 1.0) [1..n]) [1..n]
>> return (fromLists xs)
>> loop :: Int -> Double -> Int -> RVar Double
>> loop n s i = traceShow i $ do
>> x <- ranMatrix n
>> let r = sum $ parMap (rdeepseq) (test x) [2..(n-2)]
>> return (r+s)
>> main = do
>> let n = 100
>> let iter = 5
>> rng <- newPureMT
>> rngr <- newIORef rng
>> p <- runRVar (foldM (loop n) 0.0 [1..iter]) rngr
>> print p
>> I have also found that the segmentation faults in my code disappear if I
>> switch from Control.Parallel to Control.Monad.Par, which is quite strange. I
>> get slightly better performance with Control.Parallel when it completes
>> without a seg. fault, and the frequency with which it does so seems to
>> depend on the number of sparks that are being created.
>> On Thu, Oct 6, 2011 at 1:56 PM, Tom Thorne <thomas.thorne21 at gmail.com>
>> wrote:
>>> I'm trying to narrow it down so that I can submit a meaningful bug
>>> report, and it seems to be something to do with switching off parallel GC
>>> using -qg, whilst also passing -Nx.
>>> Are there any known issues with this that people are aware of? At the
>>> moment I am using the latest haskell platform release on arch.
>>> I'd like to give 7.2 a try in case that fixes it, but I'm using rather a
>>> lot of libraries (hmatrix, fclabels, random fu) and I don't know how to
>>> install them for multiple ghc versions
>>> On Wed, Oct 5, 2011 at 10:43 PM, Johan Tibell <johan.tibell at gmail.com>
>>> wrote:
>>>> On Wed, Oct 5, 2011 at 2:37 PM, Tom Thorne <thomas.thorne21 at gmail.com>
>>>> wrote:
>>>>> The only problem is that now I am getting random occasional
>>>>> segmentation faults that I was not been getting before, and once got a
>>>>> message saying:
>>>>> Main: schedule: re-entered unsafely
>>>>> Perhaps a 'foreign import unsafe' should be 'safe'?
>>>>> I think this may be something to do with creating a lot of sparks
>>>>> though, since this occurs whether I have the parallel GC on or not.
>>>> Unless you (or some library you're using) is doing what the error
>>>> message says then you should file a GHC bug here:
>>>> http://hackage.haskell.org/trac/ghc/
>>>> -- Johan
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe


More information about the Haskell-Cafe mailing list