[Haskell-cafe] Threading and Multicore Computation

Svein Ove Aas svein.ove at aas.no
Tue Mar 3 13:16:28 EST 2009


On Tue, Mar 3, 2009 at 6:54 PM,  <mwinter at brocku.ca> wrote:
> I am using GHC 6.8.3. The -O2 option made both runs faster but the 2 core run
> is still much slower than the 1 core version. Will switching to 6.10 make the difference?
>
There are a lot of improvements; it's certainly worth a try.

For what it's worth, I tried it myself on 6.10. Details follow, but the
overall impression is that while you lose some time to overhead, the
-N2 run is still close to 50% faster than the -N1 run (about 4.3s
versus 6.2s elapsed).

While trying to optimize it, I ran "./test +RTS -N2 -H64m -M64m"; the
program promptly ate all my memory, invoking the OOM killer and
messing up my system something fierce. This has to be a bug.

GC time only accounts for about 10% of the elapsed time in the -N2 run,
but as I read the stats below, the parallel GC didn't do any good
(work balance 1.00, where the ideal is 2).

I'm stumped.
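
For anyone who wants to reproduce this without the original program at hand, here is a minimal sketch of the kind of test I mean. It is a hypothetical stand-in, not the program from this thread: the names `sumTo` and `worker` and the workload sizes are made up, and the only things taken from the output below are the two "TaskN done!" lines and a final sum.

```haskell
-- Hypothetical stand-in for the test program; the original source is not
-- shown in this message. Build with: ghc -O2 -threaded Test.hs
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.List (foldl')

-- Strict left fold, so the summing is done eagerly and builds no thunks.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

-- Run one task, report completion, and hand the result back via an MVar.
worker :: String -> Int -> MVar Int -> IO ()
worker name n result = do
  let s = sumTo n
  s `seq` print (name ++ " done!")  -- force the sum inside this thread first
  putMVar result s

main :: IO ()
main = do
  r1 <- newEmptyMVar
  r2 <- newEmptyMVar
  _ <- forkIO (worker "Task1" 10000000 r1)
  _ <- forkIO (worker "Task2" 10000000 r2)
  a <- takeMVar r1   -- blocks until the first worker has finished
  b <- takeMVar r2   -- blocks until the second worker has finished
  print (a + b)
```

Run it as "./Test +RTS -N1 -s" and "./Test +RTS -N2 -s" to get the same kind of report as below. Newer GHC releases than the ones discussed here also need -rtsopts at link time before they accept these RTS flags.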

==== time ./test +RTS -N1 -s ====
"Task1 done!"
"Task2 done!"
5750000000000
      22,712,520 bytes allocated in the heap
       2,982,440 bytes copied during GC
       1,983,288 bytes maximum residency (2 sample(s))
          30,208 bytes maximum slop
             636 MB total memory in use (58 MB lost due to fragmentation)

  Generation 0:    42 collections,     0 parallel,  0.12s,  0.13s elapsed
  Generation 1:     2 collections,     0 parallel,  0.00s,  0.01s elapsed

  Task  0 (worker) :  MUT time:   2.85s  (  6.09s elapsed)
                      GC  time:   0.07s  (  0.08s elapsed)

  Task  1 (worker) :  MUT time:   0.00s  (  6.09s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  2 (worker) :  MUT time:   2.66s  (  6.09s elapsed)
                      GC  time:   0.05s  (  0.06s elapsed)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    4.78s  (  6.09s elapsed)
  GC    time    0.12s  (  0.14s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    4.81s  (  6.23s elapsed)

  %GC time       2.5%  (2.3% elapsed)

  Alloc rate    4,842,754 bytes per MUT second

  Productivity  97.5% of total user, 75.3% of total elapsed

recordMutableGen_sync: 0
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_todo: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_todo: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_todo: 0
gen[1].steps[0].sync_large_objects: 0

real	0m6.319s
user	0m4.810s
sys	0m0.920s


==== time ./test +RTS -N2 -s ====
"Task2 done!"
"Task1 done!"
6860000000000
      22,734,040 bytes allocated in the heap
       2,926,160 bytes copied during GC
       1,976,240 bytes maximum residency (2 sample(s))
         117,584 bytes maximum slop
            1234 MB total memory in use (107 MB lost due to fragmentation)

  Generation 0:    32 collections,    13 parallel,  0.47s,  0.43s elapsed
  Generation 1:     2 collections,     0 parallel,  0.01s,  0.01s elapsed

  Parallel GC work balance: 1.00 (4188 / 4188, ideal 2)

  Task  0 (worker) :  MUT time:   0.00s  (  0.00s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  1 (worker) :  MUT time:   0.00s  (  0.00s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  2 (worker) :  MUT time:   3.10s  (  3.82s elapsed)
                      GC  time:   0.09s  (  0.05s elapsed)

  Task  3 (worker) :  MUT time:   2.96s  (  3.82s elapsed)
                      GC  time:   0.39s  (  0.39s elapsed)

  Task  4 (worker) :  MUT time:   0.00s  (  3.82s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    5.23s  (  3.82s elapsed)
  GC    time    0.48s  (  0.44s elapsed)
  EXIT  time    0.01s  (  0.00s elapsed)
  Total time    5.72s  (  4.26s elapsed)

  %GC time       8.4%  (10.4% elapsed)

  Alloc rate    4,338,557 bytes per MUT second

  Productivity  91.6% of total user, 123.0% of total elapsed

recordMutableGen_sync: 0
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_todo: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_todo: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_todo: 0
gen[1].steps[0].sync_large_objects: 0

real	0m4.345s
user	0m5.680s
sys	0m1.250s
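
To put a number on the difference between the two runs, the arithmetic can be sketched as follows. The two elapsed figures are copied from the "Total time" lines above; the helper names `speedup` and `efficiency` are just illustrative.

```haskell
-- Wall-clock figures copied from the "Total time ... elapsed" lines above.
elapsedN1, elapsedN2 :: Double
elapsedN1 = 6.23
elapsedN2 = 4.26

-- Speedup: how many times faster the second run finished than the first.
speedup :: Double -> Double -> Double
speedup base new = base / new

-- Parallel efficiency: speedup divided by the number of cores used.
efficiency :: Int -> Double -> Double
efficiency cores s = s / fromIntegral cores

main :: IO ()
main = do
  let s = speedup elapsedN1 elapsedN2
  putStrLn ("speedup:    " ++ show s)
  putStrLn ("efficiency: " ++ show (efficiency 2 s))
```

That works out to a speedup of roughly 1.46 on two cores, or an efficiency of about 0.73, which is where the "close to 50% faster" impression comes from.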

