[Haskell-cafe] Weird multi-threading runtime behaviour of single-threaded program with GHC-7.0.3
Simon Marlow
marlowsd at gmail.com
Fri Jun 17 10:05:46 CEST 2011
I found this email thanks to the Parallel Haskell Digest, thanks Eric &
Nick!
On 01/04/2011 11:37, Herbert Valerio Riedel wrote:
> I'm experiencing an unexpected behaviour with GHC-7.0.3/x86_64's
> multithreading runtime when running some more demanding single-thread
> computations.
>
> I've isolated the part of the larger application which exposes the
> problem down to the parsing of a ~8MiB JSON file which causes quite a
> bit of memory allocations (source at end of email). The program is
> compiled with
>
> ghc --make -O2 -threaded -eventlog -rtsopts parse_aeson.hs
>
> and I ran it with and w/o the -N12 option on a 12-core machine (the
> problem is visible on a 4-core machine as well):
>
> $ time ./parse_aeson test.json
> 0.000392s after readFile
> 0.001016s parse was a success...
> 6.050836s NFed parse-tree...
> 6.050968s NFed parse-tree...
> no exception
>
> real 0m6.100s
> user 0m5.680s
> sys 0m0.419s
>
>
> $ time ./parse_aeson test.json +RTS -N12
> 0.001249s after readFile
> 0.001777s parse was a success...
> 10.048607s NFed parse-tree...
> 10.048738s NFed parse-tree...
> no exception
>
> real 0m10.125s
> user 0m57.496s
> sys 0m39.927s
>
>
> When observing top(1)'s output, I saw that all HECs consumed a
> substantial amount of cpu cycle, so I loaded up threadscope to see what
> the remaining 11 HECs were doing all the time.
>
>
> I've put the resulting eventlog files and screenshots of threadscope's
> visualization online at
>
> http://www.narf.at/~hvr/Haskell/parse_aeson.N0.eventlog.png
> http://www.narf.at/~hvr/Haskell/parse_aeson.N0.eventlog.xz
>
> http://www.narf.at/~hvr/Haskell/parse_aeson.N12.eventlog.png
> http://www.narf.at/~hvr/Haskell/parse_aeson.N12.eventlog.xz
>
>
> What's happening there? The actual processing work seems to be done in a
> single HEC... but what are the remaining 11 HECs doing exactly? Am I
> doing something wrong?
The answer is, they're all doing GC. When you say -N, the parallel GC
is turned on, so every GC requires the cooperation of all cores. When
you're running parallel code this is a big win, but for sequential code
it could well be a slowdown.
You probably want to turn off parallel GC for the young generation with
"+RTS -qg1". If that doesn't help, try turning it off entirely: "+RTS -qg".
I have experimented with making this happen automatically, but so far
haven't managed to find a heuristic that improves performance consistently.
> What does it mean when the GC-bars in threadscope becomes green as in
> the middle of the screenshot below?
>
> http://www.narf.at/~hvr/Haskell/parse_aeson.N4.eventlog.pdf
The green bit is where the GC is traversing the heap, as opposed to the
other administrative tasks that the GC does (such as waiting for the
other HECs to stop, dealing with weak pointers, freeing memory, etc.).
The green bars let you see when one core is doing most of the work
during parallel GC, which can be an indication that your data structures
are mainly linear (like lists) rather than tree-like.
Cheers,
Simon
More information about the Haskell-Cafe
mailing list