[GHC] #14062: Pure syntax transformation affects performance.

Sun Jul 30 15:28:06 UTC 2017

#14062: Pure syntax transformation affects performance.
-------------------------------------+-------------------------------------
           Reporter:  danilo2        |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  high           |         Milestone:
          Component:  Compiler       |           Version:  8.0.1
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 Hi! Let's consider the following code (compiled with `-O2`, `mtl` and
 `criterion` needed):

 {{{#!hs
 module Main where

 import Prelude as
 import Criterion.Main
 import Control.Monad.State.Strict
 import Control.Monad.Identity

 repeatM :: Monad m => m a -> Int -> m ()
 repeatM f = go where
     go 0 = pure ()
     go i = f >> go (i - 1)
 {-# INLINE repeatM #-}

 incState :: MonadState Int m => m ()
 incState = modify' (1+) ; {-# INLINE incState #-}

 test1, test2 :: Int -> Int
 test1 = \n -> (runIdentity . flip evalStateT 0 . (\a -> repeatM incState a
 >> S.get)) n ; {-# INLINE test1 #-}
 test2 = \n -> runIdentity . flip evalStateT 0 $ repeatM incState n >> get
 ; {-# INLINE test2 #-}

 main :: IO ()
 main = do
     defaultMain
         [ bgroup "monad transformers overhead"
             [ bench "test1"     $ nf test1 100000000
             , bench "test2"     $ nf test2 100000000
             ]
         ]
 }}}

 Functions `test1` and `test2` differ only syntactically and this
 difference should not affect GHC's inliner, because their implementations
 use fully saturated calls. The generated core for `test1` and `test2` is
 practically identical (there is an additional alias created for `test1`:
 `test1 = lvl1_rhor 'cast' ...`).

 The problem is that `test1` runs **3 times faster** than `test2`.

 As a side note - if we add more state transformers to `test1`, it
 optimizes them all away, while `test2` runs slower with each new
 transformer applied.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14062>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler