[GHC] #14208: Performance with O0 is much better than the default or with -O2, runghc performs the best
GHC
ghc-devs at haskell.org
Tue Mar 27 21:57:12 UTC 2018
#14208: Performance with O0 is much better than the default or with -O2, runghc
performs the best
-------------------------------------+-------------------------------------
Reporter: harendra | Owner: osa1
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by harendra):
In case you need another data point, my original streaming library that
made me file this issue still exhibits the same behavior. GHCi is 6x
faster than my regular compiled code. I tried even compiling everything
including all dependencies with exactly the same optimization flags to
make sure there is no funny business due to mixing of opt flags. You can
see the behavior in the tree available on github here:
https://github.com/composewell/streamly/tree/199e20dd4b62ac2dafea0a40dc2ce3d97c307542
You can clone the repo and run the experiment like this:
{{{
$ stack bench
benchmarked streaming ops
time 34.39 ms (32.99 ms .. 35.67 ms)
0.995 R² (0.991 R² .. 0.998 R²)
mean 33.97 ms (33.24 ms .. 35.43 ms)
$ stack runghc benchmark/Main.hs
benchmarked streaming ops
time 6.215 ms (5.684 ms .. 6.860 ms)
0.945 R² (0.896 R² .. 0.978 R²)
mean 6.610 ms (6.333 ms .. 6.991 ms)
}}}
If I change the optimization flags to -O0 for benchmark stanza in cabal
file I can get close to ghci performance. The code that I am benchmarking
is like this:
{{{
{-# INLINE streamlyOp #-}
streamlyOp :: IO Int
streamlyOp = do
xs <- S.toList $ S.serially $
S.each [1..100000 :: Int]
& fmap (+1)
& fmap (+1)
& fmap (+1)
& fmap (+1)
return (Prelude.length xs)
}}}
It seems the problem is with the `fmap` operation (I may be wrong), it is
6 times slower in case of GHC, and every other fmap I add, the benchmark
timings increase but the ratio remains the same. I tried using an INLINE
on fmap, I also tried to SPECIALIZE it to IO and INT type but no change.
The `fmap` op is defined in `src/Streamly/Streams.hs` file like this:
{{{
instance Monad m => Functor (StreamT m) where
fmap f (StreamT (Stream m)) = StreamT $ Stream $ \_ stp yld ->
let yield a Nothing = yld (f a) Nothing
yield a (Just r) = yld (f a)
(Just (getStreamT (fmap f (StreamT r))))
in m Nothing stp yield
}}}
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14208#comment:31>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list