[GHC] #14208: Performance with O0 is much better than the default or with -O2, runghc performs the best

GHC ghc-devs at haskell.org
Tue Mar 27 14:06:33 UTC 2018


#14208: Performance with O0 is much better than the default or with -O2, runghc
performs the best
-------------------------------------+-------------------------------------
        Reporter:  harendra          |                Owner:  osa1
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 > Do you know why the single-module case gets better? I suspect it may be
 that toList is specialised.

 Yes, as also said in my previous comment, the reason is `toList` being
 specialised only when it's in the same module.

 Multi-module

 {{{
 Main.len1
   :: GHC.Prim.State# GHC.Prim.RealWorld
      -> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Types.Int #)
 [GblId, Arity=1, Str=<S,U>, Unf=OtherCon []] =
     [] \r [s_s5yc]
         case Main.len_go 1# of sat_s5yd {
           __DEFAULT ->
               case List.toList GHC.Base.$fMonadIO sat_s5yd s_s5yc of {
                 (#,#) ipv_s5yf [Occ=Once] ipv1_s5yg [Occ=Once] ->
                     let {
                       sat_s5yi [Occ=Once] :: GHC.Types.Int
                       [LclId] =
                           [ipv1_s5yg] \u []
                               case GHC.List.$wlenAcc ipv1_s5yg 0# of
 ww2_s5yh {
                                 __DEFAULT -> GHC.Types.I# [ww2_s5yh];
                               };
                     } in  (#,#) [ipv_s5yf sat_s5yi];
               };
         };

 -- toList in another module
 List.toList [Occ=LoopBreaker]
   :: forall (m :: * -> *) a. GHC.Base.Monad m => List.List a -> m [a]
 [GblId,
  Arity=2,
  Caf=NoCafRefs,
  Str=<S,U(A,C(C1(U)),A,C(U),A)><S,1*U>,
  Unf=OtherCon []] =
     [] \r [$dMonad_s36Z m1_s370]
         case m1_s370 of {
           List.Stop -> GHC.Base.return $dMonad_s36Z GHC.Types.[];
           List.Yield a1_s372 [Occ=OnceL] r_s373 [Occ=Once] ->
               let {
                 sat_s377 [Occ=Once] :: [a_a1fh] -> m_a1fg [a_a1fh]
                 [LclId] =
                     [$dMonad_s36Z a1_s372] \r [x1_s375]
                         let {
                           sat_s376 [Occ=Once] :: [a_a1fh]
                           [LclId] =
                               CCCS :! [a1_s372 x1_s375];
                         } in  GHC.Base.return $dMonad_s36Z sat_s376; } in
               let {
                 sat_s374 [Occ=Once] :: m_a1fg [a_a1fh]
                 [LclId] =
                     [$dMonad_s36Z r_s373] \u [] List.toList $dMonad_s36Z
 r_s373;
               } in  GHC.Base.>>= $dMonad_s36Z sat_s374 sat_s377;
         };
 }}}

 Single module:

 {{{
 Main.len1
   :: GHC.Prim.State# GHC.Prim.RealWorld
      -> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Types.Int #)
 [GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>, Unf=OtherCon []] =
     [] \r [s_s5uU]
         case Main.len_go 1# of sat_s5uV {
           __DEFAULT ->
               case Main.len2 sat_s5uV s_s5uU of {
                 (#,#) ipv_s5uX [Occ=Once] ipv1_s5uY [Occ=Once] ->
                     let {
                       sat_s5v0 [Occ=Once] :: GHC.Types.Int
                       [LclId] =
                           [ipv1_s5uY] \u []
                               case GHC.List.$wlenAcc ipv1_s5uY 0# of
 ww2_s5uZ {
                                 __DEFAULT -> GHC.Types.I# [ww2_s5uZ];
                               };
                     } in  (#,#) [ipv_s5uX sat_s5v0];
               };
         };

 Main.len2 [Occ=LoopBreaker]
   :: forall a.
      Main.List a
      -> GHC.Prim.State# GHC.Prim.RealWorld
      -> (# GHC.Prim.State# GHC.Prim.RealWorld, [a] #)
 [GblId,
  Arity=2,
  Caf=NoCafRefs,
  Str=<S,1*U><S,U>,
  Unf=OtherCon []] =
     [] \r [m_s5uw eta_s5ux]
         case m_s5uw of {
           Main.Stop -> (#,#) [eta_s5ux GHC.Types.[]];
           Main.Yield a1_s5uz [Occ=Once] r_s5uA [Occ=Once] ->
               case Main.len2 r_s5uA eta_s5ux of {
                 (#,#) ipv_s5uC [Occ=Once] ipv1_s5uD [Occ=Once] ->
                     let {
                       sat_s5uE [Occ=Once] :: [a_X1Bw]
                       [LclId] =
                           CCCS :! [a1_s5uz ipv1_s5uD];
                     } in  (#,#) [ipv_s5uC sat_s5uE];
               };
         };
 }}}

 > If you add {-# INLINABLE toList #-} does the difference go away?

 With `INLINE toList` or `INLINABLE toList` multi-module and single-module
 become identical with -O1 and -O2. (there are some small changes in the
 STG outputs but nothing that changes runtime or allocations)

 > Can you reproduce this difference? It is indeed puzzling!

 I can, by running the criterion benchmark in GHCi:

 {{{
 $ ghc-stage2 --interactive Main.hs
 GHCi, version 8.5.20180322: http://www.haskell.org/ghc/  :? for help
 Loaded GHCi configuration from /home/omer/rcbackup/.ghci
 [1 of 2] Compiling List             ( List.hs, interpreted )
 [2 of 2] Compiling Main             ( Main.hs, interpreted )
 Ok, two modules loaded.
 λ:1> main
 benchmarking len
 time                 10.79 ms   (10.68 ms .. 10.98 ms)
                      0.993 R²   (0.979 R² .. 1.000 R²)
 mean                 10.66 ms   (10.54 ms .. 10.98 ms)
 std dev              480.8 μs   (147.7 μs .. 952.1 μs)
 variance introduced by outliers: 20% (moderately inflated)

 λ:2>
 Leaving GHCi.

 $ ghc-stage2 --interactive Main.hs -DSINGLE_MODULE
 GHCi, version 8.5.20180322: http://www.haskell.org/ghc/  :? for help
 Loaded GHCi configuration from /home/omer/rcbackup/.ghci
 [1 of 1] Compiling Main             ( Main.hs, interpreted )
 Ok, one module loaded.
 λ:1> main
 benchmarking len
 time                 11.30 ms   (11.20 ms .. 11.42 ms)
                      0.999 R²   (0.998 R² .. 0.999 R²)
 mean                 10.77 ms   (10.64 ms .. 10.90 ms)
 std dev              346.5 μs   (317.7 μs .. 398.6 μs)
 }}}

 `-O1` and `-O2` with `-DSINGLE_MODULE` is faster than GHCi, but otherwise
 GHCi is faster than the other 4 configurations.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14208#comment:29>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list