[Haskell-cafe] Weird multi-threading runtime behaviour of single-threaded program with GHC-7.0.3

Simon Marlow marlowsd at gmail.com
Fri Jun 17 10:05:46 CEST 2011

I found this email thanks to the Parallel Haskell Digest, thanks Eric & 

On 01/04/2011 11:37, Herbert Valerio Riedel wrote:
> I'm experiencing an unexpected behaviour with GHC-7.0.3/x86_64's
> multithreading runtime when running some more demanding single-thread
> computations.
> I've isolated the part of the larger application which exposes the
> problem down to the parsing of a ~8MiB JSON file which causes quite a
> bit of memory allocations (source at end of email). The program is
> compiled with
> ghc --make -O2 -threaded -eventlog -rtsopts parse_aeson.hs
> and I ran it with and w/o the -N12 option on a 12-core machine (the
> problem is visible on a 4-core machine as well):
> $ time ./parse_aeson test.json
> 0.000392s after readFile
> 0.001016s parse was a success...
> 6.050836s NFed parse-tree...
> 6.050968s NFed parse-tree...
> no exception
> real    0m6.100s
> user    0m5.680s
> sys     0m0.419s
> $ time ./parse_aeson test.json +RTS -N12
> 0.001249s after readFile
> 0.001777s parse was a success...
> 10.048607s NFed parse-tree...
> 10.048738s NFed parse-tree...
> no exception
> real    0m10.125s
> user    0m57.496s
> sys     0m39.927s
> When observing top(1)'s output, I saw that all HECs consumed a
> substantial amount of cpu cycle, so I loaded up threadscope to see what
> the remaining 11 HECs were doing all the time.
> I've put the resulting eventlog files and screenshots of threadscope's
> visualization online at
> http://www.narf.at/~hvr/Haskell/parse_aeson.N0.eventlog.png
> http://www.narf.at/~hvr/Haskell/parse_aeson.N0.eventlog.xz
> http://www.narf.at/~hvr/Haskell/parse_aeson.N12.eventlog.png
> http://www.narf.at/~hvr/Haskell/parse_aeson.N12.eventlog.xz
> What's happening there? The actual processing work seems to be done in a
> single HEC... but what are the remaining 11 HECs doing exactly? Am I
> doing something wrong?

The answer is, they're all doing GC.  When you say -N, the parallel GC 
is turned on, so every GC requires the cooperation of all cores.  When 
you're running parallel code this is a big win, but for sequential code 
it could well be a slowdown.

You probably want to turn off parallel GC for the young generation with 
"+RTS -qg1".  If that doesn't help, try turning it off entirely: "+RTS -qg".

I have experimented with making this happen automatically, but so far 
haven't managed to find a heuristic that improves performance consistently.

> What does it mean when the GC-bars in threadscope becomes green as in
> the middle of the screenshot below?
> http://www.narf.at/~hvr/Haskell/parse_aeson.N4.eventlog.pdf

The green bit is where the GC is traversing the heap, as opposed to the 
other administrative tasks that the GC does (such as waiting for the 
other HECs to stop, dealing with weak pointers, freeing memory, etc.). 
The green bars let you see when one core is doing most of the work 
during parallel GC, which can be an indication that your data structures 
are mainly linear (like lists) rather than tree-like.


More information about the Haskell-Cafe mailing list