[GHC] #14208: Performance with O0 is much better than the default or with -O2, runghc performs the best

Wed Sep 20 10:00:13 UTC 2017

#14208: Performance with O0 is much better than the default or with -O2, runghc
performs the best
-------------------------------------+-------------------------------------
        Reporter:  harendra          |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by harendra):

 It turns out that the cabal file in my repo builds the library without any
 optimization flag, and across the benchmark runs the flag was changed only
 for the benchmark module, not for the library. So the earlier results mixed
 optimization flags. Here is the new matrix, which takes that into account:

 {{{
 Main.hs            List.hs            INLINE   Default
 ------------------------------------------------------
               Identical Flags
 ------------------------------------------------------

 -O1                -O1                4.6 ms   14.2 ms
 -O0                -O0                14.2 ms  14.2 ms
 -fno-pre-inlining  -fno-pre-inlining  4.6 ms   9.9 ms

 ------------------------------------------------------
               Mixed Flags
 ------------------------------------------------------

 -fno-pre-inlining  -O1                4.6 ms   8.8 ms
 -O0                -O1                8.8 ms   8.8 ms

 ------------------------------------------------------
               runghc
 ------------------------------------------------------

 runghc             -O0                5.2 ms   5.2 ms
 runghc             -O1                4.7 ms   4.7 ms
 }}}

 Observations:

 1. When `toList` is INLINEd, the results are more or less as expected.
 Simon, what you are seeing is the INLINE column with identical flags.
 2. In the default case (no pragmas), `-fno-pre-inlining` does better than
 `-O1`, and runghc does well irrespective of the flag used to build the
 library (i.e. List.hs). Does this mean that `-O1` could also do better in
 this case?
 3. Mixing optimization flags brings one more variable into the picture,
 and I would like to ignore those cases. What does GHC recommend? If mixing
 is not recommended, is there a way to warn the user when the flags are
 mixed? If not, would it be possible to implement something like that?
 4. In my original package I still see `-O0` as well as `runghc` doing much
 better than `-O2`, even with INLINE pragmas and identical optimization
 options for all code. I guess I need to work on a new simplified example
 with these points in mind.
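
 For readers following along, here is a minimal sketch of what the INLINE
 column refers to: a `toList` function carrying an INLINE pragma. This is
 not the code from my repo. The `List` type, its Church-style encoding and
 `fromList` are hypothetical stand-ins, and everything is kept in one
 module so that it runs on its own, whereas in the benchmark `toList` lives
 in List.hs and is called from Main.hs.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

-- Hypothetical stand-in for the library type defined in List.hs.
-- The real type in the ticket's repo may look quite different.
newtype List a = List (forall r. (a -> r -> r) -> r -> r)

-- Build a List from an ordinary Haskell list (illustrative helper).
fromList :: [a] -> List a
fromList xs = List (\cons nil -> foldr cons nil xs)

-- The "INLINE" column measures the case where this pragma is present;
-- the "Default" column is the same function without the pragma.
{-# INLINE toList #-}
toList :: List a -> [a]
toList (List k) = k (:) []

-- Tiny driver standing in for the benchmark in Main.hs.
main :: IO ()
main = print (sum (toList (fromList [1 .. 1000 :: Int])))
```

 Removing the pragma line gives the Default variant. Whether the pragma
 actually takes effect at a call site in another module depends on the
 optimization flags used for both modules, which is the variable the
 matrix above is trying to separate out.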

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14208#comment:24>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler