[GHC] #14208: Performance with O0 is much better than the default or with -O2, runghc performs the best
GHC
ghc-devs at haskell.org
Tue Mar 27 14:06:33 UTC 2018
#14208: Performance with O0 is much better than the default or with -O2, runghc
performs the best
-------------------------------------+-------------------------------------
Reporter: harendra | Owner: osa1
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by osa1):
> Do you know why the single-module case gets better? I suspect it may be
that toList is specialised.
Yes, as also said in my previous comment, the reason is `toList` being
specialised only when it's in the same module.
Multi-module
{{{
Main.len1
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Types.Int #)
[GblId, Arity=1, Str=<S,U>, Unf=OtherCon []] =
[] \r [s_s5yc]
case Main.len_go 1# of sat_s5yd {
__DEFAULT ->
case List.toList GHC.Base.$fMonadIO sat_s5yd s_s5yc of {
(#,#) ipv_s5yf [Occ=Once] ipv1_s5yg [Occ=Once] ->
let {
sat_s5yi [Occ=Once] :: GHC.Types.Int
[LclId] =
[ipv1_s5yg] \u []
case GHC.List.$wlenAcc ipv1_s5yg 0# of
ww2_s5yh {
__DEFAULT -> GHC.Types.I# [ww2_s5yh];
};
} in (#,#) [ipv_s5yf sat_s5yi];
};
};
-- toList in another module
List.toList [Occ=LoopBreaker]
:: forall (m :: * -> *) a. GHC.Base.Monad m => List.List a -> m [a]
[GblId,
Arity=2,
Caf=NoCafRefs,
Str=<S,U(A,C(C1(U)),A,C(U),A)><S,1*U>,
Unf=OtherCon []] =
[] \r [$dMonad_s36Z m1_s370]
case m1_s370 of {
List.Stop -> GHC.Base.return $dMonad_s36Z GHC.Types.[];
List.Yield a1_s372 [Occ=OnceL] r_s373 [Occ=Once] ->
let {
sat_s377 [Occ=Once] :: [a_a1fh] -> m_a1fg [a_a1fh]
[LclId] =
[$dMonad_s36Z a1_s372] \r [x1_s375]
let {
sat_s376 [Occ=Once] :: [a_a1fh]
[LclId] =
CCCS :! [a1_s372 x1_s375];
} in GHC.Base.return $dMonad_s36Z sat_s376; } in
let {
sat_s374 [Occ=Once] :: m_a1fg [a_a1fh]
[LclId] =
[$dMonad_s36Z r_s373] \u [] List.toList $dMonad_s36Z
r_s373;
} in GHC.Base.>>= $dMonad_s36Z sat_s374 sat_s377;
};
}}}
Single module:
{{{
Main.len1
:: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Types.Int #)
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>, Unf=OtherCon []] =
[] \r [s_s5uU]
case Main.len_go 1# of sat_s5uV {
__DEFAULT ->
case Main.len2 sat_s5uV s_s5uU of {
(#,#) ipv_s5uX [Occ=Once] ipv1_s5uY [Occ=Once] ->
let {
sat_s5v0 [Occ=Once] :: GHC.Types.Int
[LclId] =
[ipv1_s5uY] \u []
case GHC.List.$wlenAcc ipv1_s5uY 0# of
ww2_s5uZ {
__DEFAULT -> GHC.Types.I# [ww2_s5uZ];
};
} in (#,#) [ipv_s5uX sat_s5v0];
};
};
Main.len2 [Occ=LoopBreaker]
:: forall a.
Main.List a
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, [a] #)
[GblId,
Arity=2,
Caf=NoCafRefs,
Str=<S,1*U><S,U>,
Unf=OtherCon []] =
[] \r [m_s5uw eta_s5ux]
case m_s5uw of {
Main.Stop -> (#,#) [eta_s5ux GHC.Types.[]];
Main.Yield a1_s5uz [Occ=Once] r_s5uA [Occ=Once] ->
case Main.len2 r_s5uA eta_s5ux of {
(#,#) ipv_s5uC [Occ=Once] ipv1_s5uD [Occ=Once] ->
let {
sat_s5uE [Occ=Once] :: [a_X1Bw]
[LclId] =
CCCS :! [a1_s5uz ipv1_s5uD];
} in (#,#) [ipv_s5uC sat_s5uE];
};
};
}}}
> If you add {-# INLINABLE toList #-} does the difference go away?
With `INLINE toList` or `INLINABLE toList` multi-module and single-module
become identical with -O1 and -O2. (there are some small changes in the
STG outputs but nothing that changes runtime or allocations)
> Can you reproduce this difference? It is indeed puzzling!
I can, by running the criterion benchmark in GHCi:
{{{
$ ghc-stage2 --interactive Main.hs
GHCi, version 8.5.20180322: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/omer/rcbackup/.ghci
[1 of 2] Compiling List ( List.hs, interpreted )
[2 of 2] Compiling Main ( Main.hs, interpreted )
Ok, two modules loaded.
λ:1> main
benchmarking len
time 10.79 ms (10.68 ms .. 10.98 ms)
0.993 R² (0.979 R² .. 1.000 R²)
mean 10.66 ms (10.54 ms .. 10.98 ms)
std dev 480.8 μs (147.7 μs .. 952.1 μs)
variance introduced by outliers: 20% (moderately inflated)
λ:2>
Leaving GHCi.
$ ghc-stage2 --interactive Main.hs -DSINGLE_MODULE
GHCi, version 8.5.20180322: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/omer/rcbackup/.ghci
[1 of 1] Compiling Main ( Main.hs, interpreted )
Ok, one module loaded.
λ:1> main
benchmarking len
time 11.30 ms (11.20 ms .. 11.42 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 10.77 ms (10.64 ms .. 10.90 ms)
std dev 346.5 μs (317.7 μs .. 398.6 μs)
}}}
`-O1` and `-O2` with `-DSINGLE_MODULE` is faster than GHCi, but otherwise
GHCi is faster than the other 4 configurations.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14208#comment:29>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list