[GHC] #14013: Bad monads performance
GHC
ghc-devs at haskell.org
Sat Jul 22 21:13:31 UTC 2017
#14013: Bad monads performance
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 8.2.1-rc3
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple |
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
Hi! We've been struggling with a very strange GHC behavior on IRC today.
Let's consider the following code (needs mtl and criterion to be compiled,
not all extensions are needed):
{{{
{-# LANGUAGE AllowAmbiguousTypes, ApplicativeDo, Arrows, BangPatterns,
BinaryLiterals,
ConstraintKinds, DataKinds, DefaultSignatures, DeriveDataTypeable,
DeriveFoldable,
DeriveFunctor, DeriveGeneric, DeriveTraversable, DoAndIfThenElse,
DuplicateRecordFields,
EmptyDataDecls, FlexibleContexts, FlexibleInstances,
FunctionalDependencies, GeneralizedNewtypeDeriving,
InstanceSigs, LambdaCase, MonadComprehensions, MultiWayIf,
NamedWildCards, NegativeLiterals,
NoImplicitPrelude, NumDecimals, OverloadedLabels, PackageImports,
QuasiQuotes, RankNTypes,
RecursiveDo, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell,
TupleSections,
TypeApplications, TypeFamilies, TypeFamilyDependencies, TypeOperators,
ViewPatterns,
LiberalTypeSynonyms, RelaxedPolyRec #-}
module Main where
import Prelude
import Criterion.Main
import qualified Control.Monad.State.Strict as Strict
import qualified Control.Monad.State.Class as State
import Control.DeepSeq (NFData, rnf, force)
import GHC.IO (evaluate)
import Data.Monoid
-----------------------------
-- === Criterion utils === --
-----------------------------
eval :: NFData a => a -> IO a
eval = evaluate . force ; {-# INLINE eval #-}
liftExp :: (Int -> a) -> (Int -> a)
liftExp f = f . (10^) ; {-# INLINE liftExp #-}
expCodeGen :: NFData a => (Int -> a) -> (Int -> IO a)
expCodeGen f i = do
putStrLn $ "generating input code (10e" <> show i <> " chars)"
out <- eval $ liftExp f i
putStrLn "code generated sucessfully"
return out
{-# INLINE expCodeGen #-}
expCodeGenBench :: (NFData a, NFData b) => (Int -> a) -> (a -> b) -> Int
-> Benchmark
expCodeGenBench f p i = env (expCodeGen f i) $ bench ("10e" <> show i) .
nf p ; {-# INLINE expCodeGenBench #-}
-------------------------------
-- === (a*) list parsing === --
-------------------------------
genList_a :: Int -> [Char]
genList_a i = replicate i 'a' ; {-# INLINE genList_a #-}
pureListParser_a :: [Char] -> Bool
pureListParser_a = \case
'a':s -> pureListParser_a s
[] -> True
_ -> False
{-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool
mtlStateListParser_a = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a #-}
mtlStateListParser_a_typed :: Strict.State [Char] Bool
mtlStateListParser_a_typed = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a_typed
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_typed #-}
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
mtlStateListParser_a_let = go where
go = Strict.get >>= \case
'a':s -> Strict.put s >> go
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_let #-}
{-# SPECIALIZE mtlStateListParser_a :: Strict.State [Char] Bool #-}
{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State [Char] Bool #-}
main = do
defaultMain
[ bgroup "a*" $
[ bgroup "pure" $ expCodeGenBench genList_a
pureListParser_a <$> [6..6]
, bgroup "mtl.State.Strict" $ expCodeGenBench genList_a
(Strict.evalState mtlStateListParser_a) <$> [6..6]
, bgroup "mtl.State.Strict typed" $ expCodeGenBench genList_a
(Strict.evalState mtlStateListParser_a_typed) <$> [6..6]
, bgroup "mtl.State.Strict let" $ expCodeGenBench genList_a
(Strict.evalState mtlStateListParser_a_let) <$> [6..6]
]
]
}}}
The code was compiled with following options (and many other variations):
`-threaded -funbox-strict-fields -O2 -fconstraint-solver-iterations=100
-funfolding-use-threshold=10000 -fexpose-all-unfoldings -fsimpl-tick-
factor=1000 -flate-dmd-anal`
Everything in this code has `INLINE` pragma. The important part we should
focus on are these two functions:
{{{
pureListParser_a :: [Char] -> Bool
pureListParser_a = \case
'a':s -> pureListParser_a s
[] -> True
_ -> False
{-# INLINE pureListParser_a #-}
mtlStateListParser_a :: State.MonadState [Char] m => m Bool
mtlStateListParser_a = State.get >>= \case
'a':s -> State.put s >> mtlStateListParser_a
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a #-}
}}}
Which are just "parsers" accepting strings containing only 'a' characters.
The former is pure one, while the later uses `State` to keep the remaining
input. The following list contains performance related observations:
0. For the rest of the points, let's call the performance of
`pureListParser_a` a "good" one and everything worse a "bad" one.
1. The performance of `mtlStateListParser_a` is bad, it runs 10 times
slower than `pureListParser_a`. Inspecting CORE we can observe that GHC
jumps between `(# a,b #)` and `(a,b)` representations all the time.
2. If we add a specialize pragma `{-# SPECIALIZE mtlStateListParser_a ::
Strict.State [Char] Bool #-}`, the performance of `mtlStateListParser_a`
is good (exactly the same as `pureListParser_a`).
3. If we do NOT use specialize pragma, but we use explicite, non-
polymorphic type signature `mtlStateListParser_a_typed :: Strict.State
[Char] Bool`, the performance is bad (!), identical to the polymorphic
version without specialization.
4. If we use SPECIALIZE pragma together with explicite, non-polymorphic
type, so we use BOTH `mtlStateListParser_a_typed :: Strict.State [Char]
Bool` AND `{-# SPECIALIZE mtlStateListParser_a_typed :: Strict.State
[Char] Bool #-}` we get the good performance.
5. If we transform `pureListParser_a` to
{{{
mtlStateListParser_a_let :: Strict.MonadState [Char] m => m Bool
mtlStateListParser_a_let = go where
go = Strict.get >>= \case
'a':s -> Strict.put s >> go
[] -> return True
_ -> return False
{-# INLINE mtlStateListParser_a_let #-}
}}}
we again get the good performance without the need to use `SPECIALIZE`
pragmas.
The above points raise the following questions:
1. Why GHC does not optimize `mtlStateListParser_a` the same way as
`pureListParser_a` and where the jumping between `(# a,b #)` and `(a,b)`
comes from?
2. Is there any way to tell GHC to automatically insert `SPECIALIZE`
pragmas, especially in performance critical code?
3. Why providing very-explicite type signature `mtlStateListParser_a_typed
:: Strict.State [Char] Bool` does not solve the problem unless we use
`SPECIALIZE` pragma that tells the same as the signature?
4. Why the trick to alias the body of recursive function to a local
variable `go` affects the performance in any way, especially when it does
NOT bring any variable to the local let scope?
We've been testing this behavior in GHC 8.0.2 and 8.2.1-rc3 and several
people reported exactly the same observations.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14013>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list