[GHC] #14013: Bad monads performance
GHC
ghc-devs at haskell.org
Sat Jul 22 22:02:03 UTC 2017
#14013: Bad monads performance
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 8.2.1-rc3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Description changed by danilo2:
Old description:
> Hi! We've been struggling with a very strange GHC behavior on IRC today.
> Let's consider the following code (needs mtl and criterion to be
> compiled):
>
> {{{
> module Main where
>
> import Prelude
> import Criterion.Main
> import qualified Control.Monad.State.Strict as Strict
> import qualified Control.Monad.State.Class as State
> import Control.DeepSeq (NFData, rnf, force)
> import GHC.IO (evaluate)
> import Data.Monoid
>
> -----------------------------
> -- === Criterion utils === --
> -----------------------------
>
> eval :: NFData a => a -> IO a
> eval = evaluate . force ; {-# INLINE eval #-}
>
> liftExp :: (Int -> a) -> (Int -> a)
> liftExp f = f . (10^) ; {-# INLINE liftExp #-}
>
> expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a)
> expCodeGen f i = do
> putStrLn $ "generating input code (10e" <> show i <> " chars)"
> out <- eval $ liftExp f i
> putStrLn "code generated sucessfully"
> return out
> {-# INLINE expCodeGen #-}
>
> expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int
> -> Benchmark
> expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) .
> nf p ; {-# INLINE expCodeGenBench #-}
>
> -------------------------------
> -- === (a*) list parsing === --
> -------------------------------
>
> genList_a :: Int -> [Char]
> genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
>
> pureListParser_a :: [Char] -> Bool
> pureListParser_a = \case
> 'a':s -> pureListParser_a s
> [] -> True
> _ -> False
> {-# INLINE pureListParser_a #-}
>
> mtlStateListParser_a :: State.MonadState [Char] m => m Bool
> mtlStateListParser_a = State.get >>= \case
> 'a':s -> State.put s >> mtlStateListParser_a
> [] -> return True
> _ -> return False
> {-# INLINE mtlStateListParser_a #-}
>
> mtlStateListParser_a_typed :: Strict.State [Char] Bool
> mtlStateListParser_a_typed = State.get >>= \case
> 'a':s -> State.put s >> mtlStateListParser_a_typed
> [] -> return True
> _ -> return False
> {-# INLINE mtlStateListParser_a_typed #-}
>
> mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
> mtlStateListParser_a_let = go where
> go = Strict.get >>= \case
> 'a':s -> Strict.put s >> go
> [] -> return True
> _ -> return False
> {-# INLINE mtlStateListParser_a_let #-}
>
> {-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}
> {-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
>
> main = do
> defaultMain
> [ bgroup "a*" $
> [ bgroup "pure" $ expCodeGenBench
> genList_a pureListParser_a <$> [6..6]
> , bgroup "mtl.State.Strict" $ expCodeGenBench
> genList_a (Strict.evalState mtlStateListParser_a) <$> [6..6]
> , bgroup "mtl.State.Strict typed" $ expCodeGenBench
> genList_a (Strict.evalState mtlStateListParser_a_typed) <$> [6..6]
> , bgroup "mtl.State.Strict let" $ expCodeGenBench
> genList_a (Strict.evalState mtlStateListParser_a_let) <$> [6..6]
> ]
> ]
>
> }}}
>
> The code was compiled with following options (and many other variations):
> `-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100
> -funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick-
> factor=1000 -flate-dmd-anal`
>
> Everything in this code has `INLINE` pragma. The important part we should
> focus on are these two functions:
>
> {{{
>
> pureListParser_a :: [Char] -> Bool
> pureListParser_a = \case
> 'a':s -> pureListParser_a s
> [] -> True
> _ -> False
> {-# INLINE pureListParser_a #-}
>
> mtlStateListParser_a :: State.MonadState [Char] m => m Bool
> mtlStateListParser_a = State.get >>= \case
> 'a':s -> State.put s >> mtlStateListParser_a
> [] -> return True
> _ -> return False
> {-# INLINE mtlStateListParser_a #-}
> }}}
>
> Which are just "parsers" accepting strings containing only 'a'
> characters. The former is pure one, while the later uses `State` to keep
> the remaining input. The following list contains performance related
> observations:
>
> 0. For the rest of the points, let's call the performance of
> `pureListParser_a` a "good" one and everything worse a "bad" one.
>
> 1. The performance of `mtlStateListParser_a` is bad, it runs 10 times
> slower than `pureListParser_a`. Inspecting CORE we can observe that GHC
> jumps between `(# a,b #)` and `(a,b)` representations all the time.
>
> 2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a ::
> Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a`
> is good (exactly the same as `pureListParser_a`).
>
> 3. If we do NOT use specialize pragma, but we use explicite, non-
> polymorphic type signature `mtlStateListParser_a_typed :: Strict.State
> [Char] Bool`, the performance is bad (!), identical to the polymorphic
> version without specialization.
>
> 4. If we use SPECIALIZE pragma together with explicite, non-polymorphic
> type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char]
> Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State
> [Char] Bool #-}` we get the good performance.
>
> 5. If we transform `pureListParser_a` to
>
> {{{
> mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
> mtlStateListParser_a_let = go where
> go = Strict.get >>= \case
> 'a':s -> Strict.put s >> go
> [] -> return True
> _ -> return False
> {-# INLINE mtlStateListParser_a_let #-}
> }}}
>
> we again get the good performance without the need to use `SPECIALIZE`
> pragmas.
>
> 6. The performance of all the functions that are not optimized as good as
> `pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
>
> 7. The not-yet documented flag `-fspecialise-aggressively` does NOT
> affect the results.
>
> The above points raise the following questions:
>
> 1. Why GHC does not optimize `mtlStateListParser_a` the same way as
> `pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)`
> comes from?
>
> 2. Is there any way to tell GHC to automatically insert `SPECIALIZE`
> pragmas, especially in performance critical code?
>
> 3. Why providing very-explicite type signature
> `mtlStateListParser_a_typed :: Strict.State [Char] Bool` does not solve
> the problem unless we use `SPECIALIZE` pragma that tells the same as the
> signature?
>
> 4. Why the trick to alias the body of recursive function to a local
> variable `go` affects the performance in any way, especially when it does
> NOT bring any variable to the local let scope?
>
> We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several
> people reported exactly the same observations.
New description:
Hi! We've been struggling with a very strange GHC behavior on IRC today.
Let's consider the following code (needs mtl and criterion to be
compiled):
{{{
module Main where
import Prelude
import Criterion.Main
import qualified Control.Monad.State.Strict as Strict
import qualified Control.Monad.State.Class as State
import Control.DeepSeq (NFData, rnf, force)
import GHC.IO (evaluate)
import Data.Monoid
-----------------------------
-- === Criterion utils === --
-----------------------------
eval :: NFData a => a -> IO a
eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a)
liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a)
expCodeGen f i = do
putStrLn $ "generating input code (10e" <> show i <> " chars)"
out <- eval $ liftExp f i
putStrLn "code generated sucessfully"
return out
{-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int
-> Benchmark
expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) .
nf p ; {-# INLINE expCodeGenBench #-}
-------------------------------
-- === (a*) list parsing === --
-------------------------------
genList_a :: Int -> [Char]
genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool
pureListParser_a = \case
'a':s -> pureListParser_a s
[] -> True
_ -> False
{-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool
mtlStateListParser_a = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool
mtlStateListParser_a_typed = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a_typed
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
mtlStateListParser_a_let = go where
go = Strict.get >>= \case
'a':s -> Strict.put s >> go
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}
{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do
defaultMain
[ bgroup "a*" $
[ bgroup "pure" $ expCodeGenBench genList_a
pureListParser_a <$> [6..6]
, bgroup "mtl.State.Strict" $ expCodeGenBench genList_a
(Strict.evalState mtlStateListParser_a) <$> [6..6]
, bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a
(Strict.evalState mtlStateListParser_a_typed) <$> [6..6]
, bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a
(Strict.evalState mtlStateListParser_a_let) <$> [6..6]
]
]
}}}
The code was compiled with following options (and many other variations):
`-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100
-funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick-
factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should
focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool
pureListParser_a = \case
'a':s -> pureListParser_a s
[] -> True
_ -> False
{-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool
mtlStateListParser_a = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a #-}
}}}
Which are just "parsers" accepting strings containing only 'a' characters.
The former is pure one, while the later uses `State` to keep the remaining
input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of
`pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times
slower than `pureListParser_a`. Inspecting CORE we can observe that GHC
jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a ::
Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a`
is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non-
polymorphic type signature `mtlStateListParser_a_typed :: Strict.State
[Char] Bool`, the performance is bad (!), identical to the polymorphic
version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic
type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char]
Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State
[Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
mtlStateListParser_a_let = go where
go = Strict.get >>= \case
'a':s -> Strict.put s >> go
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_let #-}
}}}
we again get the good performance without the need to use `SPECIALIZE`
pragmas.
6. The performance of all the functions that are not optimized as good as
`pureListParser_a` is a lot worse in GHC 8.2.1-rc3 than in 8.0.2.
7. The not-yet documented flag `-fspecialise-aggressively` does NOT affect
the results.
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as
`pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)`
comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE`
pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed
:: Strict.State [Char] Bool` does not solve the problem unless we use
`SPECIALIZE` pragma that tells the same as the signature? (GHC even warns:
`SPECIALISE pragma for non-overloaded function
‘mtlStateListParser_a_typed’` but it affects performance.)
4. Why the trick to alias the body of recursive function to a local
variable `go` affects the performance in any way, especially when it does
NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several
people reported exactly the same observations.
--
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14013#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list