[Haskell-cafe] Threading and Multicore Computation

Don Stewart dons at galois.com
Tue Mar 3 13:38:39 EST 2009


andrewcoppin:
> Svein Ove Aas wrote:
>> For what it's worth, I tried it myself on 6.10.. details follow, but
>> overall impression is that while you lose some time to overhead, it's
>> still 50% faster than unthreaded.

On a quad core, ghc 6.10 snapshot from today:

Single threaded

    whirlpool$  ghc-6.10.1.20090302 -O2 A.hs --make -fforce-recomp
    [1 of 1] Compiling Main             ( A.hs, A.o )
    Linking A ...
    whirlpool$ time ./A                                           
    "Task2 done!"
    "Task1 done!"
    4000249000000
    ./A  3.99s user 0.01s system 99% cpu 4.001 total


-threaded, with various N

    whirlpool$ ghc-6.10.1.20090302 -O2 A.hs -threaded --make 
    [1 of 1] Compiling Main             ( A.hs, A.o )
    Linking A ...

N=1

    whirlpool$ time ./A +RTS -N1 -sstderr
    ./A +RTS -N1 -sstderr 
    "Task2 done!"
    "Task1 done!"
    5908369000000

       6,468,629,288 bytes allocated in the heap
         128,647,752 bytes copied during GC
           1,996,320 bytes maximum residency (563 sample(s))
             426,512 bytes maximum slop
                   7 MB total memory in use (1 MB lost due to fragmentation)

      %GC time      61.0%  (62.1% elapsed)
                ^^^^^^^^^^^^^^^^^^^^^

      Alloc rate    2,699,611,953 bytes per MUT second

      Productivity  39.0% of total user, 39.8% of total elapsed

    ./A +RTS -N1 -sstderr  6.14s user 0.06s system 102% cpu 6.016 total  

So 61% of the run time is spent in GC.

N=2
    
    whirlpool$ time ./A +RTS -N2 -sstderr
    ./A +RTS -N2 -sstderr 
    "Task2 done!"
    "Task1 done!"
    6360397000000

       6,511,269,512 bytes allocated in the heap
           3,684,592 bytes copied during GC
           1,566,800 bytes maximum residency (3 sample(s))
              34,496 bytes maximum slop
                   5 MB total memory in use (1 MB lost due to fragmentation)

      %GC time      43.1%  (63.5% elapsed)

      Alloc rate    1,384,112,532 bytes per MUT second

      Productivity  56.9% of total user, 82.8% of total elapsed

    ./A +RTS -N2 -sstderr  8.26s user 0.09s system 146% cpu 5.681 total

Getting rid of the space-leaky version of fac, and suggesting a 50 MB heap with -H50M:

    whirlpool$ time ./A +RTS -N2 -H50M -sstderr 
    ./A +RTS -N2 -H50M -sstderr 
    "Task1 done!"
    "Task2 done!"
    5700355000000

       6,512,828,504 bytes allocated in the heap
           1,224,488 bytes copied during GC
               6,656 bytes maximum residency (1 sample(s))
             116,136 bytes maximum slop
                  50 MB total memory in use (1 MB lost due to fragmentation)

      %GC time      60.6%  (76.4% elapsed)

      Alloc rate    2,778,330,289 bytes per MUT second

      Productivity  39.4% of total user, 49.5% of total elapsed
    
    ./A +RTS -N2 -H50M -sstderr  6.30s user 0.42s system 141% cpu 4.737 total

I'm not sure there's anything weird going on here, other than just naive
implementations of factorial making my cores hot.    
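
The source of A.hs is not included in this message, so the following is only
a sketch of what "space leaky" versus "non-leaky" factorial usually means. The
names facLazy and facStrict are hypothetical and are not taken from the
original program. Whether the lazy variant actually leaks depends on the GHC
version and optimisation level, since strictness analysis can sometimes
remove the thunks.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical leaky variant: the lazy left fold can build a long chain
-- of unevaluated (*) thunks before anything is multiplied.
facLazy :: Integer -> Integer
facLazy n = foldl (*) 1 [1 .. n]

-- Hypothetical non-leaky variant: foldl' forces the accumulator at each
-- step, so no thunk chain is retained on the heap.
facStrict :: Integer -> Integer
facStrict n = foldl' (*) 1 [1 .. n]

main :: IO ()
main = do
  -- Both variants compute the same value; they differ only in space use.
  print (facLazy 20 == facStrict 20)
  print (facStrict 5)
```

Running either variant with +RTS -sstderr, as in the transcripts above, is one
way to compare the "maximum residency" and "%GC time" figures.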

