[Haskell-cafe] Parallel parsing & multicore
Don Stewart
dons at galois.com
Wed Sep 9 09:40:56 EDT 2009
akborder:
>
> The threaded version running on 2 cores is moderately faster than the
> serial one:
>
> $ ./Parser +RTS -s -N2
> 2,377,165,256 bytes allocated in the heap
> 36,320,944 bytes copied during GC
> 6,020,720 bytes maximum residency (6 sample(s))
> 6,933,928 bytes maximum slop
> 21 MB total memory in use (0 MB lost due to fragmentation)
>
> Generation 0: 2410 collections, 0 parallel, 0.33s, 0.34s elapsed
> Generation 1: 6 collections, 4 parallel, 0.06s, 0.05s elapsed
>
> Parallel GC work balance: 1.83 (2314641 / 1265968, ideal 2)
>
> Task 0 (worker) : MUT time: 2.43s ( 1.19s elapsed)
> GC time: 0.02s ( 0.02s elapsed)
>
> Task 1 (worker) : MUT time: 2.15s ( 1.19s elapsed)
> GC time: 0.29s ( 0.30s elapsed)
>
> Task 2 (worker) : MUT time: 2.37s ( 1.19s elapsed)
> GC time: 0.07s ( 0.08s elapsed)
>
> Task 3 (worker) : MUT time: 2.45s ( 1.19s elapsed)
> GC time: 0.00s ( 0.00s elapsed)
>
> INIT time 0.00s ( 0.00s elapsed)
> MUT time 2.06s ( 1.19s elapsed)
> GC time 0.39s ( 0.39s elapsed)
> EXIT time 0.00s ( 0.00s elapsed)
> Total time 2.45s ( 1.58s elapsed)
>
> %GC time 15.7% (24.9% elapsed)
>
> Alloc rate 1,151,990,234 bytes per MUT second
>
> Productivity 84.2% of total user, 130.2% of total elapsed
>
>
> The speedup is smaller than what I was expecting given that each unit
> of work (250 input lines) is completely independent from the others.
> Changing the size of each work unit did not help; garbage collection
> times are small enough that increasing the minimum heap size did not
> produce any speedup either.
>
> Is there anything else I can do to understand why the parallel map
> does not provide a significant speedup?
Very interesting idea!
I think the big thing would be to measure it with GHC HEAD so you can
see how effectively the sparks are being converted into threads.
Is there a package and test case somewhere we can try out?
-- Don
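
For anyone who wants a small test case to experiment with, here is a minimal sketch of a chunked parallel map in the spirit of the setup described above (independent work units of 250 lines). It is not the original poster's code. parseLine is a hypothetical stand-in for the real parser, and parMapChunked and chunksOf are names made up for this illustration. It uses only par and pseq from GHC.Conc in base, and it forces each result only to weak head normal form; a real parser producing nested structures would probably need deeper forcing.

```haskell
import GHC.Conc (par, pseq)

-- Split a list into chunks of at most n elements (n is clamped to >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs =
  let (h, t) = splitAt (max 1 n) xs
  in  h : chunksOf n t

-- Hypothetical stand-in for the real per-line parser:
-- it just counts the whitespace-separated tokens on a line.
parseLine :: String -> Int
parseLine = length . words

-- Apply f to every element, creating one spark per chunk of n inputs.
-- Each spark forces the results of its chunk to weak head normal form.
parMapChunked :: Int -> (a -> b) -> [a] -> [b]
parMapChunked n f xs = concat (go (map (map f) (chunksOf n xs)))
  where
    go []       = []
    go (c : cs) =
      let rest = go cs
      -- Spark the evaluation of this chunk, then create the sparks for
      -- the remaining chunks before returning the list to the consumer.
      in  foldr seq () c `par` (rest `pseq` (c : rest))

main :: IO ()
main = do
  -- Synthetic input, assumed purely for illustration.
  let input    = [ unwords (replicate (i `mod` 7) "tok") | i <- [1 .. 10000 :: Int] ]
      serial   = map parseLine input
      parallel = parMapChunked 250 parseLine input
  print (parallel == serial)
  print (sum parallel)
```

Built with -threaded -rtsopts and run with +RTS -N2 -s, a program like this can be compared against the serial map. On GHC versions that report spark statistics in the -s summary, that output gives a rough idea of how many sparks were actually picked up by another core.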