[GHC] #14208: Performance with O0 is much better than the default or with -O2, runghc performs the best

Tue Sep 12 16:07:28 UTC 2017

#14208: Performance with O0 is much better than the default or with -O2, runghc
performs the best
-------------------------------------+-------------------------------------
        Reporter:  harendra          |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by MikolajKonarski):

 Great job simplifying the example! Every bit of code eliminated helps GHC
 hackers immensely to tackle this (which may take a while anyway,
 especially given the busy time of year), even if the contrived example
 looks nonsensical and silly. There is less noise, there are fewer
 suspects, and the Core is less overwhelming.

 The combination of `-fexpose-all-unfoldings` and
 `-fspecialise-aggressively` is close to (or even equivalent to) putting
 everything in a single module. See
 https://ghc.haskell.org/trac/ghc/ticket/12463#comment:19 and other
 comments in that thread. The only drawback is that GHC takes much more
 memory. A hack around that (at least before 8.2.1) is to restart GHC
 during compilation when it hogs too much memory (or travis_retry after
 out-of-memory segfault in travis scripts).
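
 As a rough sketch of how the two flags can be tried on a single module
 (they are more commonly set project-wide, e.g. via `ghc-options`), the
 pragma below enables both. The module contents, including the name
 `sumAll`, are hypothetical and are not code from the ticket; they are
 only there so that something overloaded is defined. As far as I know the
 flags only matter when optimisation (`-O` or higher) is on.

```haskell
{-# OPTIONS_GHC -fexpose-all-unfoldings -fspecialise-aggressively #-}

import Data.Monoid (Sum (..))

-- Hypothetical overloaded helper (an assumption, not from the ticket).
-- With -fexpose-all-unfoldings its unfolding is put in the interface
-- file regardless of size, and -fspecialise-aggressively lets importing
-- modules specialise it to the concrete types at their call sites.
sumAll :: (Foldable t, Num a) => t a -> a
sumAll = getSum . foldMap Sum
```

 Loading this in GHCi and evaluating `sumAll [1, 2, 3 :: Int]` should
 give 6; the interesting part is comparing the Core and the timings of a
 client module with and without the pragma.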

 If you see worse fusion behaviour with `-O1` than with `-O0`, I guess
 there is hope it can be fine-tuned in that particular case, or even that
 it's a bug. I
 wonder who is hacking on the fusion machinery these days...

 But in general, inlining (especially of only a subset of functions) that
 makes performance worse is a fact of life, though GHC strives hard not to
 _automatically_ inline in such suspect cases. I wonder, if you marked
 `toList` NOINLINE and the Monoid methods INLINE, but put everything in the
 same module, would you still have the bad behaviour? That would hint that
 the (partial) inlining inhibits fusion and would show which combination of
 inline decisions is responsible (so that GHC may be improved for that
 combination or may be prevented from automatically generating such partial
 inlining).
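
 A minimal sketch of the experiment suggested above, all in one module.
 The `Builder` type, `singleton`, and the bodies of the instances are
 hypothetical stand-ins, not the reporter's code; only the placement of
 the pragmas (NOINLINE on `toList`, INLINE on the Monoid methods) follows
 the suggestion. It assumes GHC 8.4 or later, where `Semigroup` is a
 superclass of `Monoid` and `(<>)` is exported from the Prelude.

```haskell
-- Hypothetical stand-in for the ticket's data type: a difference list.
newtype Builder a = Builder ([a] -> [a])

instance Semigroup (Builder a) where
  -- Ask GHC to inline the append at every call site.
  {-# INLINE (<>) #-}
  Builder f <> Builder g = Builder (f . g)

instance Monoid (Builder a) where
  {-# INLINE mempty #-}
  mempty = Builder id

-- Hypothetical constructor for a one-element Builder.
singleton :: a -> Builder a
singleton x = Builder (x :)

-- Keep the consumer out of line, so that only the Monoid methods are
-- inlined and the effect of that partial inlining can be observed.
{-# NOINLINE toList #-}
toList :: Builder a -> [a]
toList (Builder f) = f []
```

 Swapping the pragmas around (or removing them one at a time) and
 comparing the resulting Core at `-O0`, `-O1` and `-O2` would then show
 which combination of inlining decisions coincides with the slowdown.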

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14208#comment:16>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler