[Haskell-cafe] Multicore speedup and spark conversion with ghc HEAD
Eric Rannaud
eric.rannaud at gmail.com
Thu Sep 2 15:19:42 EDT 2010
Follows three runs of a stupid program that should see a speedup (I believe).
1. Compiled with ghc-6.12.3. Run on 1 thread, 4.0 s (real).
2. Compiled with ghc-6.12.3. Run on 2 threads, spark converted, 5.4 s
(real), 7.7 s (user).
3. Compiled with ghc-6.13.20100831. Run on 2 threads, but spark *not*
converted 4.1 s (real).
All runs with disabled parallel GC, which negatively impacts -N2 runs
(whether the spark is converted or not).
Two questions:
A. Why is run (2) slower than (1)? GC is a bit more expensive, as one
would expect, but not enough to explain the difference (only explains
0.3 s).
B. ghc HEAD doesn't convert the spark. Why is that?
Thanks.
----
module Main where
import Control.Parallel
fac :: Integer -> Integer
fac n = aux n 1
where aux 0 _ = 0
aux 1 m = m
aux n m =
let p = n * m in
p `pseq` aux (n - 1) p
main = do
let x = fac 100000
y = fac 100001 in
print (x `par` y `pseq` (x + y))
----
$ ghc-6.12.3 -O2 -threaded -rtsopts Test.hs --make
$ time ./Test +RTS -N1 -qg -sstderr > /dev/null
./Test +RTS -N1 -qg -sstderr
20,460,660,928 bytes allocated in the heap
2,492,368 bytes copied during GC
462,128 bytes maximum residency (3 sample(s))
76,576 bytes maximum slop
4 MB total memory in use (1 MB lost due to fragmentation)
Generation 0: 35093 collections, 0 parallel, 0.14s, 0.17s elapsed
Generation 1: 3 collections, 0 parallel, 0.00s, 0.00s elapsed
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 0.00s) 0.00s ( 0.00s)
Task 1 (worker) : 0.00s ( 3.84s) 0.00s ( 0.00s)
Task 2 (bound) : 3.87s ( 3.84s) 0.14s ( 0.17s)
SPARKS: 1 (0 converted, 1 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 3.83s ( 3.84s elapsed)
GC time 0.14s ( 0.17s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 3.97s ( 4.02s elapsed)
%GC time 3.6% (4.3% elapsed)
Alloc rate 5,345,814,403 bytes per MUT second
Productivity 96.4% of total user, 95.2% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_large_objects: 0
real 0m4.022s
user 0m3.970s
sys 0m0.051s
$ time ./Test +RTS -N2 -qg -sstderr > /dev/null
./Test +RTS -N2 -qg -sstderr
20,600,461,512 bytes allocated in the heap
3,585,432 bytes copied during GC
458,480 bytes maximum residency (14 sample(s))
76,576 bytes maximum slop
5 MB total memory in use (1 MB lost due to fragmentation)
Generation 0: 34803 collections, 0 parallel, 0.43s, 0.48s elapsed
Generation 1: 14 collections, 0 parallel, 0.00s, 0.00s elapsed
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 0.00s) 0.00s ( 0.00s)
Task 1 (worker) : 3.72s ( 4.91s) 0.24s ( 0.25s)
Task 2 (bound) : 3.91s ( 4.91s) 0.19s ( 0.23s)
Task 3 (worker) : 0.00s ( 4.91s) 0.00s ( 0.00s)
SPARKS: 1 (1 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 7.29s ( 4.91s elapsed)
GC time 0.43s ( 0.48s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 7.72s ( 5.39s elapsed)
%GC time 5.6% (8.9% elapsed)
Alloc rate 2,824,733,790 bytes per MUT second
Productivity 94.4% of total user, 135.2% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_large_objects: 0
real 0m5.397s
user 0m7.724s
sys 0m0.443s
$ ghc-6.13.20100831 -O2 -threaded -rtsopts Test.hs
$ time ./Test +RTS -N2 -qg -sstderr > /dev/null
./Test +RTS -N2 -qg -sstderr
20,458,286,584 bytes allocated in the heap
2,705,816 bytes copied during GC
463,520 bytes maximum residency (3 sample(s))
73,032 bytes maximum slop
5 MB total memory in use (1 MB lost due to fragmentation)
Generation 0: 35088 collections, 0 parallel, 0.19s, 0.22s elapsed
Generation 1: 3 collections, 0 parallel, 0.00s, 0.00s elapsed
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 0.00s) 0.00s ( 0.00s)
Task 1 (worker) : 0.00s ( 3.92s) 0.00s ( 0.00s)
Task 2 (bound) : 3.90s ( 3.92s) 0.19s ( 0.22s)
Task 3 (worker) : 0.00s ( 3.92s) 0.00s ( 0.00s)
SPARKS: 1 (0 converted, 1 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 3.84s ( 3.92s elapsed)
GC time 0.19s ( 0.22s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 4.03s ( 4.14s elapsed)
%GC time 4.7% (5.2% elapsed)
Alloc rate 5,329,874,595 bytes per MUT second
Productivity 95.3% of total user, 92.7% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0
real 0m4.144s
user 0m4.027s
sys 0m0.059s
More information about the Haskell-Cafe
mailing list