[Haskell-cafe] testing par with simple program

Fri Aug 21 14:17:42 EDT 2009

A better test program

import Control.Parallel

main = a `par` b  `pseq` print (a + b )
    where
        a = ack 3 11
        b = fib 39

ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

running it , these are the results

paolino at paolino-casa:~$ ghc --make prova   -threaded
[1 of 1] Compiling Main             ( prova.hs, prova.o )
Linking prova ...
paolino at paolino-casa:~$ time ./prova +RTS -N1
63262367

real    1m17.485s
user    1m16.473s
sys    0m0.392s
paolino at paolino-casa:~$ time ./prova +RTS -N2
63262367

real    1m20.186s
user    1m31.554s
sys    0m0.600s
paolino at paolino-casa:~$ touch prova.hs
paolino at paolino-casa:~$ ghc --make prova -O2  -threaded
[1 of 1] Compiling Main             ( prova.hs, prova.o )
Linking prova ...
paolino at paolino-casa:~$ time ./prova +RTS -N1
63262367

real    0m17.652s
user    0m15.277s
sys    0m0.108s
paolino at paolino-casa:~$ time ./prova +RTS -N2
63262367

real    0m13.650s
user    0m15.121s
sys    0m0.188s

>From the resource graph and the timings it is clear that the program is not
able to use all the 2 cores powers, considering computing 'a' alone is about
7 seconds and 'b' alone 9.
What is retaining the cpu's to run full power ?

paolino

2009/8/21 Don Stewart <dons at galois.com>

> paolo.veronelli:
> > Hi, reading a previous thread I got interested.
> > I simplified the example pointed by dons in
> >
> > import Control.Parallel
> >
> > main = a `par` b  `pseq` print (a + b )
> >     where
> >         a = ack 3 11
> >         b = ack 3 11
> >
> > ack 0 n = n+1
> > ack m 0 = ack (m-1) 1
> > ack m n = ack (m-1) (ack m (n-1))
> >
> > compiled with
> > ghc --make prova  -O2 -threaded
> >
> > timings
> > paolino at paolino-casa:~$ time ./prova +RTS -N1
> > 32762
> >
> > real    0m7.031s
> > user    0m6.304s
> > sys    0m0.004s
> > paolino at paolino-casa:~$ time ./prova +RTS -N2
> > 32762
> >
> > real    0m6.997s
> > user    0m6.728s
> > sys    0m0.020s
> > paolino at paolino-casa:~$
> >
> > without optimizations it gets worse
> >
> > paolino at paolino-casa:~$ time ./prova +RTS -N1
> > 32762
> >
> > real    1m20.706s
> > user    1m18.197s
> > sys    0m0.104s
> > paolino at paolino-casa:~$ time ./prova +RTS -N2
> > 32762
> >
> > real    1m38.927s
> > user    1m45.039s
> > sys    0m0.536s
> > paolino at paolino-casa:~$
> >
> > staring at the resource usage graph I can see it does use 2 cores when
> told to
> > do it, but with -N1 the used cpu goes 100% and with -N2 they both run
> just over
> > 50%
> >
> > thanks for comments
>
>
> Firstly, a and b are identical, so GHC commons them up. The compiler
> transforms it into:
>
>    a `par` a `seq` print (a + a)
>
> So you essentially fork a spark to evaluate 'a', and then have the main
> thread also evaluate 'a' again. One of them wins, then you add the
> result to itself. The runtime may choose not to convert your first spark
> into a thread.
>
> Running with a 2009 GHC head snapshot, we can see with +RTS -sstderr
>
>      SPARKS: 1 (0 converted, 0 pruned)
>
> That indeed, it doesn't convert your `par` into a real thread.
>
> While, for example, the helloworld on the wiki:
>
>    http://haskell.org/haskellwiki/Haskell_in_5_steps
>
> Converts 2 sparks to 2 theads:
>
>    SPARKS: 2 (2 converted, 0 pruned)
>    ./B +RTS -threaded -N2 -sstderr  2.13s user 0.04s system 137% cpu 1.570
> total
>
> -- Don
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090821/f92a3f18/attachment.html