[GHC] #11783: Very large slowdown when using parallel garbage collector
GHC
ghc-devs at haskell.org
Sat Apr 2 20:46:20 UTC 2016
#11783: Very large slowdown when using parallel garbage collector
-------------------------------------+-------------------------------------
Reporter: luispedro | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Runtime System | Version: 7.10.3
Resolution: | Keywords: performance,
| garbage collector
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Description changed by luispedro:
@@ -11,1 +11,15 @@
- simple test case
+ simple test case.
+
+ On some machines it seems worse than others and it seems that the input
+ file (data.txt) needs to be quite large for the problem to really show up
+ (the attached script generates a 16 million input file, this is still
+ smaller than some of my real use cases, but I couldn't trigger it with
+ only 1 million). Similarly, with 4 threads, the slowdown is detectable,
+ but not as large.
+
+ While running, CPU usage is very high (I tested with 16 threads and it
+ uses 16 CPUs continuously, top reports 1600% CPU).
+
+ Using '+RTS -A64m' is another way around the issue, but for the full
+ application it is still not as effective as '+RTS -qg', so there still
+ seems to be a performance issue here.
New description:
While debugging some performance issues in an application I am writing, I
concluded that the problem lies in the parallel GC implemented in the GHC
RTS. I extracted the attached code as a self-contained test case. On my
system it runs in 16s when using a single thread, in 18s when using 6
threads without parallel GC, and in over a minute when using 6 threads
with parallel GC!
The slowdown in the full code is actually worse and matters for the
application (some steps take >1 hour instead of <1 minute!). Parts of the
code do take full advantage of parallel processing; this is just one
simple test case.
The problem seems worse on some machines than on others, and the input
file (data.txt) needs to be quite large for it to really show up. The
attached script generates a 16 million input file; this is still smaller
than some of my real use cases, but I couldn't trigger the problem with
only 1 million. Similarly, with 4 threads, the slowdown is detectable but
not as large.
While the program is running, CPU usage is very high: I tested with 16
threads and it used 16 CPUs continuously (top reported 1600% CPU).
Using '+RTS -A64m' (a larger allocation area) is another way around the
issue, but for the full application it is still not as effective as
disabling the parallel GC with '+RTS -qg', so there still seems to be a
performance issue here.
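
The configurations compared in this report could be run along the
following lines. This is only a sketch under stated assumptions: the
binary name './testcase' is a placeholder (the attachment's real name is
not given here), it is assumed to have been built with '-threaded
-rtsopts', and '-N6' is assumed to be how the 6-thread runs were started.
The '-s' flag makes the RTS print GC and timing statistics on exit.

```shell
#!/bin/sh
# Sketch only. BIN is a placeholder name for the compiled test case, assumed
# to be built with: ghc -O2 -threaded -rtsopts
BIN="${BIN:-./testcase}"

# Print the command line for one RTS configuration.
# -s makes the RTS print GC and elapsed-time statistics on exit.
rts_cmd() {
    echo "$BIN +RTS $1 -s -RTS"
}

# Configurations from the report:
#   -N1         single thread
#   -N6 -qg     6 threads, parallel GC disabled
#   -N6         6 threads, parallel GC enabled (the default when -N > 1)
#   -N6 -A64m   6 threads, parallel GC, 64 MB allocation area
# Nothing is run unless the binary actually exists.
if [ -x "$BIN" ]; then
    for opts in "-N1" "-N6 -qg" "-N6" "-N6 -A64m"; do
        echo "== $(rts_cmd "$opts")"
        # $opts is left unquoted on purpose so it splits into separate flags.
        "$BIN" +RTS $opts -s -RTS
    done
fi
```

Comparing the GC time and Total time lines of the '-s' output across the
runs should show how much of the difference is spent in the collector.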
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/11783#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler