[GHC] #11783: Very large slowdown when using parallel garbage collector

GHC ghc-devs at haskell.org
Sat Apr 2 20:46:20 UTC 2016


#11783: Very large slowdown when using parallel garbage collector
-------------------------------------+-------------------------------------
        Reporter:  luispedro         |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Runtime System    |              Version:  7.10.3
      Resolution:                    |             Keywords:  performance,
                                     |  garbage collector
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by luispedro:

@@ -11,1 +11,15 @@
- simple test case
+ simple test case.
+
+ On some machines it seems worse than others and it seems that the input
+ file (data.txt) needs to be quite large for the problem to really show up
+ (the attached script generates a 16 million input file, this is still
+ smaller than some of my real use cases, but I couldn't trigger it with
+ only 1 million). Similarly, with 4 threads, the slowdown is detectable,
+ but not as large.
+
+ While running, CPU usage is very high (I tested with 16 threads and it
+ uses 16 CPUs continuously, top reports 1600% CPU).
+
+ Using '+RTS -A64m' is another way around the issue, but for the full
+ application it is still not as effective as '+RTS -qg', so there still
+ seems to be a performance issue here.

New description:

 As part of debugging some performance issues in an application I am
 writing, I concluded that the problem lies in the parallel GC implemented
 in the GHC RTS. I extracted the attached code into a self-contained test
 case. On my system, it runs in 16s with a single thread, in 18s with 6
 threads but without the parallel GC, and in over a minute with 6 threads
 and the parallel GC enabled!
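 To make the three configurations above concrete, here is a minimal, hypothetical timing harness in Haskell. It is not the attached test case, whose contents are not reproduced here. The workload (rendering numbers with 'show' and summing the string lengths), the 16000000 problem size, and the names 'work' and 'parallelWork' are all illustrative assumptions, and it is not established that this particular workload reproduces the slowdown. It splits the work across however many capabilities the RTS was started with and reports the elapsed wall-clock time. It uses GHC.Clock, which needs base 4.11 (GHC 8.4) or later, so it will not build on 7.10.3 as written.

```haskell
-- Hypothetical harness, NOT the reporter's attached test case.
module Main (main) where

import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM)
import Data.List (foldl')
import GHC.Clock (getMonotonicTime)

-- | Illustrative allocation-heavy job (an assumption): each 'show'
-- allocates a short String, so the nursery fills up quickly and
-- minor GCs happen often.
work :: Int -> Int -> Int
work lo hi = foldl' (\acc n -> acc + length (show n)) 0 [lo .. hi]

-- | Split [1 .. total] into one chunk per capability, run each chunk
-- in its own Haskell thread, and add up the partial results.
parallelWork :: Int -> Int -> IO Int
parallelWork caps total = do
  let chunk  = total `div` caps
      -- The last chunk absorbs any remainder of the division.
      ranges = [ (i * chunk + 1, if i == caps - 1 then total else (i + 1) * chunk)
               | i <- [0 .. caps - 1] ]
  vars <- forM ranges $ \(lo, hi) -> do
    v <- newEmptyMVar
    -- 'evaluate' forces the result inside the worker thread.
    _ <- forkIO (evaluate (work lo hi) >>= putMVar v)
    return v
  sum <$> mapM takeMVar vars

main :: IO ()
main = do
  caps   <- getNumCapabilities   -- set by '+RTS -N<n>' (needs -threaded)
  start  <- getMonotonicTime
  result <- parallelWork caps 16000000
  end    <- getMonotonicTime
  putStrLn ("capabilities: " ++ show caps)
  putStrLn ("result: " ++ show result)
  putStrLn ("elapsed (s): " ++ show (end - start))
```

 Compiled with 'ghc -O2 -threaded -rtsopts', it can be run as '+RTS -N1' (single thread), '+RTS -N6 -qg' (6 threads, no parallel GC) and '+RTS -N6' (6 threads, parallel GC). Adding '-s' to the RTS options prints GC statistics, including the time spent in GC, which helps separate GC cost from mutator cost.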

 The slowdown in the full application is even worse and matters in
 practice (some steps take over an hour instead of under a minute!). Parts
 of the code do take full advantage of parallel processing; this is just
 one simple test case.

 The problem seems worse on some machines than on others, and the input
 file (data.txt) needs to be quite large for it to really show up: the
 attached script generates a 16 million input file, which is still smaller
 than some of my real use cases, but I couldn't trigger the problem with
 only 1 million. Similarly, with 4 threads the slowdown is detectable, but
 not as large.

 CPU usage is very high while it runs: I tested with 16 threads, and it
 uses all 16 CPUs continuously (top reports 1600% CPU).

 Using '+RTS -A64m' (a larger allocation area) is another way around the
 issue, but for the full application it is still not as effective as
 '+RTS -qg' (which disables the parallel GC), so there still seems to be a
 performance problem here.
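 One way to confirm which of these settings a given run actually picked up is to read the RTS flags from inside the program. The sketch below uses 'getGCFlags' and 'getParFlags' from GHC.RTS.Flags (in base since 4.8, i.e. GHC 7.10). It assumes the default 4096-byte RTS block size when converting 'minAllocAreaSize', which the RTS stores as a number of blocks; the helper name 'allocAreaMiB' is hypothetical and used only for this illustration.

```haskell
-- Sketch: print the GC-related RTS settings in effect for this run.
module Main (main) where

import GHC.RTS.Flags (GCFlags (..), ParFlags (..), getGCFlags, getParFlags)

-- | Assumption: the RTS uses its default block size of 4096 bytes.
blockSizeBytes :: Integer
blockSizeBytes = 4096

-- | Hypothetical helper: convert an allocation-area size given in
-- RTS blocks (as stored in 'minAllocAreaSize') to MiB.
allocAreaMiB :: Integral a => a -> Double
allocAreaMiB blocks =
  fromIntegral (toInteger blocks * blockSizeBytes) / (1024 * 1024)

main :: IO ()
main = do
  gc  <- getGCFlags
  par <- getParFlags
  -- '+RTS -qg' sets this to False; only meaningful with -threaded.
  putStrLn ("parallel GC enabled: " ++ show (parGcEnabled par))
  -- '+RTS -A<size>' sets the allocation area (nursery) size.
  putStrLn ("allocation area (MiB): " ++ show (allocAreaMiB (minAllocAreaSize gc)))
```

 Compiled with '-threaded -rtsopts' and run with '+RTS -qg -A64m', it should report that the parallel GC is disabled and a 64 MiB allocation area.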

--

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/11783#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list