[GHC] #15999: Stabilise nofib runtime measurements

Thu Dec 13 10:18:02 UTC 2018

#15999: Stabilise nofib runtime measurements
-------------------------------------+-------------------------------------
        Reporter:  sgraf             |                Owner:  (none)
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:  ⊥
       Component:  NoFib benchmark   |              Version:  8.6.2
  suite                              |
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5793 #9476       |  Differential Rev(s):  Phab:D5438
  #15333 #15357                      |
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 Replying to [comment:6 simonmar]:
 > So as I understand it, the GC "wibbles" you're talking about are caused
 by the number of GCs we run? Making a small change to the nursery size can
 make the difference between N and N+1 GC runs, which could be a large
 difference in runtime.

 Yes, that's one source of wibble (in hindsight, that may have been a bad
 term to use here). But it's not exactly the reason why I'm doing this:
 Have a look at the numbers in
 https://ghc.haskell.org/trac/ghc/ticket/9476#comment:55. The `./default`
 had significantly fewer Gen 0 collections and the same number of Gen 1
 collections as `./allow-cg` (which produces more garbage but is faster in
 total). Gen 1 collections were cheaper for `./allow-cg` for some reason.
 Also note how this correlates with the productivity rate: 10% for
 `./default` vs. 15% for `./allow-cg`. The findings in the thread led me to
 plot the above curves.

 >
 > You're only looking at `-G1`, right? Generational GC often has weird
 effects based on the timing of when a GC runs. I think there will still be
 issues when there's an old-gen collection right around the end of the
 program run - making a small change may mean the difference between
 running or not running the expensive GC.

 This is not `-G1` and I agree that a single old-gen collection might make
 the difference. But when we modify the program in a way that there are
 ''more'' Gen 1 collections, at more uniformly distributed points in the
 program, I argue we will have a much better experience comparing nofib
 numbers. There are multiple ways to achieve this, but I think the simplest
 one is what I outline above, which also corresponds more closely to the
 workload of real applications (e.g. long running times, growing and
 shrinking working sets).
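
 As a rough illustration of this argument, here is a toy model (not a
 measurement): if a benchmark runs for more iterations, so that Gen 1
 collections are spread over the whole run, a single extra old-gen
 collection accounts for a smaller share of the total runtime. The function
 name `relative_wibble` and all numbers below are made up for the sketch,
 and it assumes every Gen 1 collection costs the same, which the numbers in
 #9476 suggest is not generally true.

```python
def relative_wibble(mutator_s, major_gcs, major_cost_s):
    """Relative runtime change caused by ONE extra major (Gen 1) collection.

    Toy model: runtime = mutator time + number of major GCs * cost per major GC.
    All parameters are hypothetical; real collections vary in cost.
    """
    baseline = mutator_s + major_gcs * major_cost_s
    return major_cost_s / baseline


if __name__ == "__main__":
    # Hypothetical benchmark: 1.0s of mutator time and 2 major GCs per
    # iteration, each major GC costing 0.05s.
    for iterations in (1, 10, 100):
        w = relative_wibble(1.0 * iterations, 2 * iterations, 0.05)
        print(f"{iterations:>3} iterations: one extra Gen 1 GC shifts runtime by {w:.2%}")
```

 In this model the shift caused by one collection happening or not
 happening shrinks in proportion to the number of iterations, which is the
 effect I would hope for when comparing nofib runs.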

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15999#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler