[GHC] #8096: Add fudge-factor for performance tests run on non-validate builds

Sat Jul 27 07:47:44 CEST 2013

#8096: Add fudge-factor for performance tests run on non-validate builds
------------------------------------+-------------------------------------
       Reporter:  ezyang            |             Owner:
           Type:  task              |            Status:  new
       Priority:  normal            |         Milestone:
      Component:  Build System      |           Version:  7.7
       Keywords:                    |  Operating System:  Unknown/Multiple
   Architecture:  Unknown/Multiple  |   Type of failure:  None/Unknown
     Difficulty:  Unknown           |         Test Case:
     Blocked By:                    |          Blocking:
Related Tickets:                    |
------------------------------------+-------------------------------------
 Since I'm not going to get around to this immediately, Trac'ifying for
 posterity:

 These tests have been doing better than expected in the nightlies
 for some time.

 {{{
  Unexpected failures:
     perf/compiler  T3064 [stat too good] (normal)
     perf/compiler  T3294 [stat too good] (normal)
     perf/compiler  T5642 [stat too good] (normal)
     perf/haddock   haddock.Cabal [stat too good] (normal)
     perf/haddock   haddock.base [stat too good] (normal)
 }}}

 Unfortunately, fixing them is not a simple matter of shifting
 the ranges up: the tests only exceed expectations on
 a /perf/ build, while on a normal build such as 'quick' they
 all pass.

 I could bump up the upper bounds so that the builder stops bleating
 about them; or perhaps we could do something more complicated, where the
 expected performance depends on what level of optimization GHC was built
 with (but I don't know how to implement this).
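
 For illustration only, a minimal sketch of what "expected performance
 depends on the build" might look like. This is not the testsuite
 driver's code: EXPECTED, expected_range and classify are hypothetical
 names, the numbers are made up, and the "stat not good enough" label
 for the upper bound is assumed by analogy with "stat too good".

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (lower, upper) bound on the measured statistic

# Made-up numbers, keyed by test name and then by build flavour.
EXPECTED: Dict[str, Dict[str, Range]] = {
    "T3064": {"validate": (900, 1100), "perf": (700, 900)},
}


def expected_range(test: str, flavour: str) -> Range:
    """Pick the range for this flavour, falling back to 'validate'."""
    per_flavour = EXPECTED[test]
    return per_flavour.get(flavour, per_flavour["validate"])


def classify(test: str, flavour: str, measured: int) -> str:
    """Compare a measurement against the range chosen for the flavour."""
    lower, upper = expected_range(test, flavour)
    if measured < lower:
        return "stat too good"
    if measured > upper:
        return "stat not good enough"
    return "pass"
```

 With these made-up ranges, a measurement of 800 is "stat too good"
 under the 'validate' range but passes under the 'perf' range. The cost
 is that every test now carries one range per flavour.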

 ----

 The problem with just widening the bounds to cover two different types of
 build is that it increases the chance that performance changes won't
 actually be noticed by the person responsible.

 Having different bounds for different build configurations is a pain,
 because (a) the testsuite has to work out which set of bounds to use,
 and (b) you now have even more wobbly values to keep up-to-date.

 I think perhaps the best thing would be to add some sort of (per-test?)
 fudge factor for non-validate builds. That way validate will still find
 performance regressions, like it does today, but other builds are less
 likely to give false positives. (Igloo)
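
 A rough sketch of the fudge-factor idea, under stated assumptions: the
 names widen and check_stat, the is_validate flag and the 10% default
 are all hypothetical, not taken from the testsuite driver. The bounds
 are used as-is for validate and widened on both sides otherwise.

```python
from typing import Tuple


def widen(lower: float, upper: float, fudge: float) -> Tuple[float, float]:
    """Widen [lower, upper] by a relative fudge factor on each side."""
    return lower * (1 - fudge), upper * (1 + fudge)


def check_stat(measured: float, lower: float, upper: float,
               is_validate: bool, fudge: float = 0.10) -> str:
    """Strict bounds on validate builds, widened bounds elsewhere."""
    if not is_validate:
        lower, upper = widen(lower, upper, fudge)
    if measured < lower:
        return "stat too good"
    if measured > upper:
        return "stat not good enough"
    return "pass"
```

 The fudge argument could be set per test, so that noisier tests get
 more slack while only one set of bounds has to be kept up to date.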

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8096>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler