[Haskell-cafe] Re: Parallel Pi

Simon Marlow marlowsd at gmail.com
Fri Mar 19 04:24:12 EDT 2010


On 18/03/10 22:52, Daniel Fischer wrote:
> On Thursday 18 March 2010 22:44:55, Simon Marlow wrote:
>> On 17/03/10 21:30, Daniel Fischer wrote:
>>> On Wednesday 17 March 2010 19:49:57, Artyom Kazak wrote:
>>>> Hello!
>>>> I tried to implement a parallel Monte Carlo method for computing Pi,
>>>> using two cores:
>>>
>>> <move>
>>>
>>>> But it uses only one core:
>>>
>>> <snip>
>>>
>>>> We see that our one spark is pruned. Why?
>>>
>>> Well, the problem is that your tasks don't do any real work - yet.
>>> piMonte returns a thunk pretty immediately, that thunk is then
>>> evaluated by show, long after your chance for parallelism is gone. You
>>> must force the work to be done _in_ r1 and r2, then you get
>>> parallelism:
>>>
>>>     Generation 0:  2627 collections,  2626 parallel,  0.14s,  0.12s elapsed
>>>     Generation 1:     1 collections,     1 parallel,  0.00s,  0.00s elapsed
>>>
>>>     Parallel GC work balance: 1.79 (429262 / 240225, ideal 2)
>>>
>>>                           MUT time (elapsed)       GC time  (elapsed)
>>>     Task  0 (worker) :    0.00s    (  8.22s)       0.00s    (  0.00s)
>>>     Task  1 (worker) :    8.16s    (  8.22s)       0.01s    (  0.01s)
>>>     Task  2 (worker) :    8.00s    (  8.22s)       0.13s    (  0.11s)
>>>     Task  3 (worker) :    0.00s    (  8.22s)       0.00s    (  0.00s)
>>>
>>>     SPARKS: 1 (1 converted, 0 pruned)
>>>
>>>     INIT  time    0.00s  (  0.00s elapsed)
>>>     MUT   time   16.14s  (  8.22s elapsed)
>>>     GC    time    0.14s  (  0.12s elapsed)
>>>     EXIT  time    0.00s  (  0.00s elapsed)
>>>     Total time   16.29s  (  8.34s elapsed)
>>>
>>>     %GC time       0.9%  (1.4% elapsed)
>>>
>>>     Alloc rate    163,684,377 bytes per MUT second
>>>
>>>     Productivity  99.1% of total user, 193.5% of total elapsed
>>>
>>> But alas, it is slower than the single-threaded calculation :(
>>>
>>>     INIT  time    0.00s  (  0.00s elapsed)
>>>     MUT   time    7.08s  (  7.10s elapsed)
>>>     GC    time    0.08s  (  0.08s elapsed)
>>>     EXIT  time    0.00s  (  0.00s elapsed)
>>>     Total time    7.15s  (  7.18s elapsed)
>>
>> It works for me (GHC 6.12.1):
>>
>>     SPARKS: 1 (1 converted, 0 pruned)
>>
>>     INIT  time    0.00s  (  0.00s elapsed)
>>     MUT   time    9.05s  (  4.54s elapsed)
>>     GC    time    0.12s  (  0.09s elapsed)
>>     EXIT  time    0.00s  (  0.01s elapsed)
>>     Total time    9.12s  (  4.63s elapsed)
>>
>> wall-clock speedup of 1.93 on 2 cores.
>
> Is that Artyom's original code or with the pseq'ed length?

Your fixed version.
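
For readers without the earlier messages: the code itself was snipped from
this thread, so the following is only a minimal sketch of the idea behind the
fix, not Artyom's or Daniel's actual program. All names (nextSeed, toUnit,
countHits, estimatePi) are hypothetical, and a tiny linear congruential
generator stands in for System.Random so that the sketch needs nothing beyond
base. par and pseq are imported from GHC.Conc here; Control.Parallel in the
parallel package exports the same two functions. The point is that each
sparked value is a strictly computed Int, so the work happens inside the
spark rather than later in show.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical stand-in for a real random number generator:
-- a small linear congruential generator with modulus 2^31.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Map a generator state to a Double in [0, 1).
toUnit :: Int -> Double
toUnit s = fromIntegral s / 2147483648

-- Count how many of n pseudo-random points fall inside the unit
-- quarter circle. The accumulator is forced on every step, so the
-- result is an evaluated Int, not a chain of thunks.
countHits :: Int -> Int -> Int
countHits n seed0 = go n seed0 0
  where
    go :: Int -> Int -> Int -> Int
    go 0 _ acc = acc
    go k s acc =
      let s1   = nextSeed s
          s2   = nextSeed s1
          x    = toUnit s1
          y    = toUnit s2
          acc' = if x * x + y * y <= 1 then acc + 1 else acc
      in acc' `seq` go (k - 1) s2 acc'

-- Spark r1, evaluate r2 on the current thread, then combine.
-- Evaluating an Int to WHNF evaluates it completely, so the spark
-- carries the whole counting loop.
estimatePi :: Int -> Double
estimatePi n =
  r1 `par` (r2 `pseq` (4 * fromIntegral (r1 + r2) / fromIntegral (2 * n)))
  where
    r1 = countHits n 1
    r2 = countHits n 2

main :: IO ()
main = print (estimatePi 1000000)
```

With a current GHC, something like "ghc -O2 -threaded -rtsopts" followed by
"+RTS -N2 -s" at run time should show whether the spark was converted or
pruned, as in the statistics quoted above.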

> And, with -N2, I also have a productivity of 193.5%, but the elapsed time
> is larger than the elapsed time for -N1. How long does it take with -N1 for
> you?

The 1.93 speedup was compared to the time for -N1 (8.98s in my case).

>> What hardware are you using there?
>
> 3.06GHz Pentium 4, 2 cores.
> I have mixed results with parallelism, some programmes get a speed-up of
> nearly a factor 2 (wall-clock time), others 1.4, 1.5 or so, yet others take
> about the same wall-clock time as the single threaded programme, some -
> like this - take longer despite using both cores intensively.

I suspect it's something specific to that processor, probably 
cache-related.  Perhaps we've managed to put some data frequently 
accessed by both CPUs on the same cache line.  I'd have to do some 
detailed profiling on that processor to find out though.  If you have 
the time and inclination, install oprofile and look for things like 
"memory ordering stalls".

>> Have you tried changing any GC settings?
>
> I've played around a little with -qg and -qb and -C, but that showed little
> influence. Any tips what else might be worth a try?

-A would be the other thing to try.

Cheers,
	Simon

>>
>> Cheers,
>> 	Simon
>


