[Haskell-cafe] Scarry parallel garbage collection behavior (6.12.1,
Mac OSX)
Pavel Perikov
perikov at gmail.com
Fri Feb 19 16:26:42 EST 2010
Hi, list.
This is a long post, sorry. The bottom line: specifying +RTS -Nn where n > number of cores slows the program compiled with 6.12.1 by orders of magnitude under Mac OSX. Turning off parallel garbage collections with -qg
resolves the problem. Independent verification is appreciated.
I have a very simple program:
import Control.Parallel
main=do
let
s = s1 `par` ( s2 `pseq` s1 + s2)
s1 = sum [1..10000000]
s2 = sum [1..10000005]
print s
compiled with
ghc -O -threaded
running it with without any options
time ./clocktime
produces the following times:
real 0m3.779s
user 0m3.644s
sys 0m0.053s
Giving -N option to RTS (or -N2, I have two core macbook) shows decent speedup:
100000060000015
real 0m2.678s
user 0m4.762s
sys 0m0.043s
though increase in processor time by 30% seems suspicious.
Things become weird if I increase the number of threads. I couldn't expect any further speedup for this task but what I get is absolutely unexpected:
time ./clocktime +RTS -N3
100000060000015
real 0m57.071s
user 1m40.910s
sys 0m0.921s
Almost a minute!! Adding more threads does not help either :)
time ./clocktime +RTS -N4
100000060000015
real 1m48.281s
user 3m10.832s
sys 0m1.590s
A bit of playing with Instruments/DTrace shows that runaway garbage collection puts the program on its knees.
I the case of -N1 the top time consumers look like this:
Self % Total % Self Run % Running % # Self # Samples ms Self Run ms Running Parent % Library Source Path Symbol Name
12.8 12.8 408 clocktime __gmpz_add
10.7 10.7 341 clocktime integer_cmm_plusIntegerzh
10 10 321 clocktime allocateLocal
etc. This looks absolutely reasonable. Going to -N2 changes the picture to:
Self % Total % Self Run % Running % # Self # Samples ms Self Run ms Running Parent % Library Source Path Symbol Name
15.6 15.6 989 clocktime gcWorkerThread
11.6 11.6 736 clocktime waitForGcThreads
9.5 9.5 603 clocktime scavenge_until_all_done
7.4 7.4 471 clocktime setContextSwitches
6.2 6.2 396 clocktime __gmpz_add
-N3 Is awful:
Self % Total % Self Run % Running % # Self # Samples ms Self Run ms Running Parent % Library Source Path Symbol Name
51.7 51.7 29514 clocktime gcWorkerThread
25.5 25.5 14591 clocktime scavenge_until_all_done
7.2 7.2 4114 clocktime waitForGcThreads
5.5 5.5 3180 clocktime GarbageCollect
5.2 5.2 2998 clocktime setContextSwitches
0.4 0.4 240 clocktime integer_cmm_plusIntegerzh
Pavel
More information about the Haskell-Cafe
mailing list