[Haskell-cafe] Parallel parsing & multicore

Wed Sep 9 09:40:56 EDT 2009

akborder:
> 
> The threaded version running on 2 cores is moderately faster than the
> serial one:
> 
> $ ./Parser +RTS -s -N2
>    2,377,165,256 bytes allocated in the heap
>       36,320,944 bytes copied during GC
>        6,020,720 bytes maximum residency (6 sample(s))
>        6,933,928 bytes maximum slop
>               21 MB total memory in use (0 MB lost due to fragmentation)
> 
>   Generation 0:  2410 collections,     0 parallel,  0.33s,  0.34s elapsed
>   Generation 1:     6 collections,     4 parallel,  0.06s,  0.05s elapsed
> 
>   Parallel GC work balance: 1.83 (2314641 / 1265968, ideal 2)
> 
>   Task  0 (worker) :  MUT time:   2.43s  (  1.19s elapsed)
>                       GC  time:   0.02s  (  0.02s elapsed)
> 
>   Task  1 (worker) :  MUT time:   2.15s  (  1.19s elapsed)
>                       GC  time:   0.29s  (  0.30s elapsed)
> 
>   Task  2 (worker) :  MUT time:   2.37s  (  1.19s elapsed)
>                       GC  time:   0.07s  (  0.08s elapsed)
> 
>   Task  3 (worker) :  MUT time:   2.45s  (  1.19s elapsed)
>                       GC  time:   0.00s  (  0.00s elapsed)
> 
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time    2.06s  (  1.19s elapsed)
>   GC    time    0.39s  (  0.39s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time    2.45s  (  1.58s elapsed)
> 
>   %GC time      15.7%  (24.9% elapsed)
> 
>   Alloc rate    1,151,990,234 bytes per MUT second
> 
>   Productivity  84.2% of total user, 130.2% of total elapsed
> 
> 
> The speedup is smaller than what I was expecting given that each unit
> of work (250 input lines) is completely independent from the others.
> Changing the size of each work unit did not help; garbage collection
> times are small enough that increasing the minimum heap size did not
> produce any speedup either.
> 
> Is there anything else I can do to understand why the parallel map
> does not provide a significant speedup?
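For reference, the setup described above (a parallel map over independent work units of 250 input lines) can be sketched with nothing but `par` and `pseq` from `GHC.Conc` in base. This is a minimal sketch, not the poster's program: `parseLine` is a hypothetical stand-in for the real parser (it just sums the digits on a line), and the names `chunksOf`, `parseChunk` and `parMapChunks` are illustrative assumptions.

```haskell
import Data.Char (digitToInt, isDigit)
import Data.List (foldl')
import GHC.Conc (par, pseq)

-- Split the input into work units of n elements each
-- (the post uses 250 lines per unit).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: chunk size must be positive"
  | null xs   = []
  | otherwise = let (unit, rest) = splitAt n xs in unit : chunksOf n rest

-- Hypothetical stand-in for the real per-line parser:
-- sums the decimal digits that appear on the line.
parseLine :: String -> Int
parseLine = foldl' step 0
  where
    step acc c
      | isDigit c = acc + digitToInt c
      | otherwise = acc

-- Parse one work unit and reduce it to a single strict Int, so that
-- evaluating the spark to weak head normal form does all of the work.
parseChunk :: [String] -> Int
parseChunk = foldl' (+) 0 . map parseLine

-- Spark the result of every work unit, then build the result list.
-- This is the classic par/pseq parallel map applied per chunk.
parMapChunks :: ([a] -> b) -> Int -> [a] -> [b]
parMapChunks f n = go . chunksOf n
  where
    go []       = []
    go (c : cs) = y `par` (ys `pseq` (y : ys))
      where
        y  = f c
        ys = go cs
```

Compiled with `-threaded` and run with `+RTS -N2 -s`, an expression such as `sum (parMapChunks parseChunk 250 (lines input))` creates one spark per work unit. Each unit is reduced to a single `Int` on purpose: a spark is only evaluated to weak head normal form, so sparking a value whose outermost constructor is cheap to produce (for example a lazily built list) leaves most of the real work to the main thread, which is one common reason for disappointing speedups.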

Very interesting idea!

I think the main thing would be to measure it with GHC HEAD, so you can
see how effectively the sparks are being converted into threads.

Is there a package and test case somewhere we can try out?

-- Don