[Haskell-cafe] how to read CPU time vs wall time report from GHC?
Wishnu Prasetya
s.w.b.prasetya at uu.nl
Sun Aug 14 22:42:13 CEST 2011
On 14-8-2011 22:17, Daniel Fischer wrote:
> On Sunday 14 August 2011, 21:53:21, Iustin Pop wrote:
>> On Sun, Aug 14, 2011 at 08:32:36PM +0200, Wishnu Prasetya wrote:
>>> On 14-8-2011 20:25, Iustin Pop wrote:
>>>> On Sun, Aug 14, 2011 at 08:11:36PM +0200, Wishnu Prasetya wrote:
>>>>> Hi guys,
>>>>>
>>>>> I'm new in parallel programming with Haskell. I made a simple test
>>>>> program using that par combinator etc, and was a bit unhappy that it
>>>>> turns out to be slower than its sequential version. But firstly, I
>>>>> dont fully understand how to read the runtime report produced by GHC
>>>>>
>>>>> with -s option:
>>>>> SPARKS: 5 (5 converted, 0 pruned)
>>>>>
>>>>> INIT time 0.02s ( 0.01s elapsed)
>>>>> MUT time 3.46s ( 0.89s elapsed)
>>>>> GC time 5.49s ( 1.46s elapsed)
>>>>> EXIT time 0.00s ( 0.00s elapsed)
>>>>> Total time 8.97s ( 2.36s elapsed)
>>>>>
>>>>> As I understand it from the documentation, the left time-column is
>>>>> the CPU time, whereas the right one is elapses wall time. But how
>>>>> come that the wall time is less than the CPU time? Isn't wall time =
>>>>> user's perspective of time; so that is CPU time + IO + etc?
>>>> Yes, but if you have multiple CPUs, then CPU time "accumulates"
>>>> faster than wall-clock time.
>>>>
>>>> Based on the above example, I guess you have or you run the program
>>>> on 4 cores (2.36 * 4 = 9.44, which means you got a very nice ~95%
>>>> efficiency).
>>>>
>>>> regards,
>>>> iustin
>>> That makes sense... But are you sure thats how i should read this?
>> As far as I know, this is correct.
> It is indeed. CPU time is the sum of CPU time for all threads, which is
> typically larger than elapsed time when several threads run in parallel.
>
>>> I dont want to jump happy too early.
>> Well, you algorithm does work in parallel, but if you look at the GC/MUT
>> time, ~60% of the total runtime is spent in GC, so you have a space leak
>> or an otherwise inefficient algorithm.
> Not enough data to make more than guesses concerning the cause, but 60% GC
> definitely indicates a problem with the algorithm (resp. its
> implementation),
>
>> The final speedup is just
>> 3.46s/2.36s, i.e. 1.46x instead of ~4x, so you still have some work to
>> do to make this better.
> We don't know the times for a non-threaded run (or an -N1 run), so it could
> be anything from a slowdown to a> 4× speedup (but it's likely to be a
> speedup by a factor< 4×).
Well, the -N1 is below. The sequential version of the program has almost
the same profile:
SPARKS: 5 (1 converted, 4 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 2.78s ( 2.99s elapsed)
GC time 4.35s ( 4.15s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 7.13s ( 7.14s elapsed)
Am I correct then to say that the speed up with respect to sequential is
equal to: tot-elapse-time-N1 / tot-elapse-N4 ? So I have 7.14 / 2.36 =
3.0 speed up, and not 1.46 as Iustin said?
I'll probably have to do something with that GC :)
--Wish.
More information about the Haskell-Cafe
mailing list