[Haskell-cafe] Threading and Multicore Computation
Svein Ove Aas
svein.ove at aas.no
Tue Mar 3 13:16:28 EST 2009
On Tue, Mar 3, 2009 at 6:54 PM, <mwinter at brocku.ca> wrote:
> I am using GHC 6.8.3. The -O2 option made both runs faster, but the 2-core run
> is still much slower than the 1-core version. Will switching to 6.10 make the difference?
>
There are a lot of improvements; it's certainly worth a try.
For what it's worth, I tried it myself on 6.10. Details follow, but my
overall impression is that while you lose some time to overhead, the
-N2 run is still close to 50% faster than the -N1 run (about 4.3s
against 6.3s of wall-clock time).
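To make that comparison concrete, here is a small sketch that derives the speedup from the `real` figures reported by `time` in the two runs quoted in this message. The names `speedup` and `timeSaved` are my own helpers for illustration, not anything from the test program.

```haskell
-- Wall-clock ("real") seconds copied from the two runs quoted in this message.
realN1, realN2 :: Double
realN1 = 6.319  -- ./test +RTS -N1 -s
realN2 = 4.345  -- ./test +RTS -N2 -s

-- Speedup factor: how many times faster the second run finished.
-- (Hypothetical helper, named here for illustration only.)
speedup :: Double -> Double -> Double
speedup base new = base / new

-- Fraction of wall-clock time saved relative to the baseline run.
timeSaved :: Double -> Double -> Double
timeSaved base new = 1 - new / base

main :: IO ()
main = do
  putStrLn ("speedup:    " ++ show (speedup realN1 realN2))
  putStrLn ("time saved: " ++ show (timeSaved realN1 realN2))
```

The speedup comes out at roughly 1.45, i.e. about 31% less wall-clock time. The `user` figure goes the other way (4.81s to 5.68s), which is the overhead mentioned above.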
While trying to optimize it, I ran "./test +RTS -N2 -H64m -M64m"; the
program promptly ate all my memory, invoking the OOM killer and
messing up my system something fierce. This has to be a bug.
GC time only accounts for about 10% of the elapsed time, but as I read
these numbers, the parallel GC didn't do any good: generation 0 took
0.43s elapsed with -N2 against 0.13s with -N1.
I'm stumped.
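For readers who want to reproduce this kind of measurement, the following is a minimal sketch of a two-task program of the general shape the output below suggests. The original test program is not shown in this thread, so the workload (`work`), the helper `spawn`, and the input sizes are all assumptions of mine, and the printed sum will not match the numbers in the logs.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.List (foldl')

-- Hypothetical CPU-bound workload; stands in for whatever the real test does.
work :: Int -> Int
work n = foldl' (+) 0 [1 .. n]

-- Run one task in its own Haskell thread. The result is forced inside the
-- forked thread, so the computation happens there and not in the main thread.
spawn :: String -> Int -> IO (MVar Int)
spawn name n = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    let r = work n
    r `seq` print (name ++ " done!")
    putMVar done r
  return done

main :: IO ()
main = do
  m1 <- spawn "Task1" 10000000
  m2 <- spawn "Task2" 20000000
  r1 <- takeMVar m1  -- blocks until Task1 has delivered its result
  r2 <- takeMVar m2
  print (r1 + r2)
```

Build it with `ghc -O2 -threaded test.hs` and run it as `./test +RTS -N1 -s` and `./test +RTS -N2 -s` to get statistics like the ones below. The -N option only has an effect with the threaded runtime, and newer GHC releases also need `-rtsopts` at link time before they accept it.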
==== time ./test +RTS -N1 -s ====
"Task1 done!"
"Task2 done!"
5750000000000
22,712,520 bytes allocated in the heap
2,982,440 bytes copied during GC
1,983,288 bytes maximum residency (2 sample(s))
30,208 bytes maximum slop
636 MB total memory in use (58 MB lost due to fragmentation)
Generation 0: 42 collections, 0 parallel, 0.12s, 0.13s elapsed
Generation 1: 2 collections, 0 parallel, 0.00s, 0.01s elapsed
Task 0 (worker) : MUT time: 2.85s ( 6.09s elapsed)
GC time: 0.07s ( 0.08s elapsed)
Task 1 (worker) : MUT time: 0.00s ( 6.09s elapsed)
GC time: 0.00s ( 0.00s elapsed)
Task 2 (worker) : MUT time: 2.66s ( 6.09s elapsed)
GC time: 0.05s ( 0.06s elapsed)
INIT time 0.00s ( 0.00s elapsed)
MUT time 4.78s ( 6.09s elapsed)
GC time 0.12s ( 0.14s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 4.81s ( 6.23s elapsed)
%GC time 2.5% (2.3% elapsed)
Alloc rate 4,842,754 bytes per MUT second
Productivity 97.5% of total user, 75.3% of total elapsed
recordMutableGen_sync: 0
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_todo: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_todo: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_todo: 0
gen[1].steps[0].sync_large_objects: 0
real 0m6.319s
user 0m4.810s
sys 0m0.920s
==== time ./test +RTS -N2 -s ====
"Task2 done!"
"Task1 done!"
6860000000000
22,734,040 bytes allocated in the heap
2,926,160 bytes copied during GC
1,976,240 bytes maximum residency (2 sample(s))
117,584 bytes maximum slop
1234 MB total memory in use (107 MB lost due to fragmentation)
Generation 0: 32 collections, 13 parallel, 0.47s, 0.43s elapsed
Generation 1: 2 collections, 0 parallel, 0.01s, 0.01s elapsed
Parallel GC work balance: 1.00 (4188 / 4188, ideal 2)
Task 0 (worker) : MUT time: 0.00s ( 0.00s elapsed)
GC time: 0.00s ( 0.00s elapsed)
Task 1 (worker) : MUT time: 0.00s ( 0.00s elapsed)
GC time: 0.00s ( 0.00s elapsed)
Task 2 (worker) : MUT time: 3.10s ( 3.82s elapsed)
GC time: 0.09s ( 0.05s elapsed)
Task 3 (worker) : MUT time: 2.96s ( 3.82s elapsed)
GC time: 0.39s ( 0.39s elapsed)
Task 4 (worker) : MUT time: 0.00s ( 3.82s elapsed)
GC time: 0.00s ( 0.00s elapsed)
INIT time 0.00s ( 0.00s elapsed)
MUT time 5.23s ( 3.82s elapsed)
GC time 0.48s ( 0.44s elapsed)
EXIT time 0.01s ( 0.00s elapsed)
Total time 5.72s ( 4.26s elapsed)
%GC time 8.4% (10.4% elapsed)
Alloc rate 4,338,557 bytes per MUT second
Productivity 91.6% of total user, 123.0% of total elapsed
recordMutableGen_sync: 0
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_todo: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_todo: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_todo: 0
gen[1].steps[0].sync_large_objects: 0
real 0m4.345s
user 0m5.680s
sys 0m1.250s