[GHC] #15999: Stabilise nofib runtime measurements

Fri Dec 21 13:42:40 UTC 2018

#15999: Stabilise nofib runtime measurements
-------------------------------------+-------------------------------------
        Reporter:  sgraf             |                Owner:  (none)
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:  ⊥
       Component:  NoFib benchmark   |              Version:  8.6.2
  suite                              |
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5793 #9476       |  Differential Rev(s):  Phab:D5438
  #15333 #15357                      |
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by sgraf:

Old description:

> With Phab:D4989 (cf. #15357) having hit `nofib` master, there are still
> many benchmarks that are unstable. I identified three causes for
> instability in https://ghc.haskell.org/trac/ghc/ticket/5793#comment:38.
> With system overhead mostly out of the equation, there are still two
> related tasks left:
>
> 1. Identify benchmarks with GC wibbles. Plan: Look at counted
> instructions while varying heap size with just one generation. A wibbling
> benchmark should have quite diverse sampled maximum residency (as opposed
> to a microbenchmark, which should have quite stable instruction count).
>
>    Then fix these by iterating `main` 'often enough'. Maybe look at total
> bytes allocated for that; we want this to be monotonically declining as
> the initial heap size grows.
> 2. Now, all benchmarks should have stable instruction count. If not,
> maybe there's another class of benchmarks I didn't identify yet in #5793.
> Of these benchmarks, there are a few, like `real/eff/CS`, that still have
> highly unstable runtimes. Fix these 'microbenchmarks' by hiding them
> behind a flag.

New description:

 With Phab:D4989 (cf. #15357) having hit `nofib` master, there are still
 many benchmarks that are unstable in one way or another. I identified
 three causes for instability in
 https://ghc.haskell.org/trac/ghc/ticket/5793#comment:38. With system
 overhead mostly out of the equation, there are still two related tasks
 left:

 1. Identify benchmarks with GC wibbles. Plan: measure how the productivity
 rate changes as the gen 0 heap size (the nursery) increases. A GC-sensitive
 benchmark should show a non-monotonic or discontinuous curve of
 productivity rate over nursery size. Then fix these by iterating `main`
 often enough for the curve to become smooth and monotone.
 2. Once that is done, all benchmarks should have a monotonically
 decreasing instruction count for increasing nursery sizes. If not, maybe
 there's another class of benchmarks I didn't identify yet in #5793. Of
 these benchmarks, there are a few, like `real/eff/CS`, whose runtimes are
 still highly sensitive to code layout. Fix these 'microbenchmarks' by
 hiding them behind a flag.
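
 The two checks above could be sketched roughly as follows. This is only an
 illustration, not part of nofib: the function names `trend_violations` and
 `large_jumps`, the `tolerance` and `max_step` parameters, and the sample
 numbers are all hypothetical. It assumes that each benchmark has already
 been run at several nursery sizes (for example with `+RTS -A<size> -s`) and
 that one value per run (productivity, or counted instructions) has been
 collected by hand or by a script. It also assumes that productivity is
 expected to rise with nursery size, which the ticket does not state.

```python
from typing import List, Tuple

# (nursery size in bytes, measured value such as productivity or instructions)
Sample = Tuple[int, float]


def trend_violations(
    samples: List[Sample], increasing: bool, tolerance: float = 0.0
) -> List[Tuple[Sample, Sample]]:
    """Return adjacent sample pairs that go against the expected trend.

    Samples are sorted by nursery size first. A pair counts as a violation
    when the value moves in the wrong direction by more than `tolerance`.
    """
    ordered = sorted(samples)
    sign = 1.0 if increasing else -1.0
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if sign * (b[1] - a[1]) < -tolerance
    ]


def large_jumps(samples: List[Sample], max_step: float) -> List[Tuple[Sample, Sample]]:
    """Return adjacent sample pairs whose values differ by more than `max_step`.

    This is a crude stand-in for 'discontinuous': on sampled data a jump can
    only be judged relative to a threshold chosen by the person measuring.
    """
    ordered = sorted(samples)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if abs(b[1] - a[1]) > max_step
    ]


if __name__ == "__main__":
    # Invented numbers, purely to show the call shape.
    productivity = [(1 << 20, 0.62), (2 << 20, 0.70), (4 << 20, 0.66), (8 << 20, 0.81)]
    instructions = [(1 << 20, 9.1e9), (2 << 20, 8.7e9), (4 << 20, 8.8e9)]

    # Task 1: productivity should not dip or jump as the nursery grows.
    print(trend_violations(productivity, increasing=True))
    print(large_jumps(productivity, max_step=0.1))

    # Task 2: instruction count should not rise as the nursery grows.
    print(trend_violations(instructions, increasing=False))
```

 A benchmark with a non-empty result from either function would be a
 candidate for iterating `main` more often; a sensible `tolerance` would
 have to be chosen from the measurement noise actually observed.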

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15999#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler