[GHC] #9320: Inlining regression/strangeness in 7.8

GHC ghc-devs at haskell.org
Wed Jul 16 07:17:25 UTC 2014


#9320: Inlining regression/strangeness in 7.8
-------------------------------------+-------------------------------------
        Reporter:  dolio             |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Compiler          |                 Version:  7.8.2
        Keywords:                    |  Differential Revisions:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |              Difficulty:  Unknown
       Test Case:                    |              Blocked By:
        Blocking:                    |         Related Tickets:
-------------------------------------+-------------------------------------
 A couple days ago, it was reported to me that vector-algorithms had a
 significant performance regression (~20x) on GHC 7.8.2. The problem stems
 from a lack of inlining and specialization of some of the functions that
 were previously handled in 7.6 and earlier. The following is a reduced
 test case (the vector and primitive packages are required):

 {{{
 module A (test) where

 import Control.Monad.ST
 import Control.Monad
 import Control.Monad.Primitive
 import Data.Vector.Generic.Mutable as U

 test :: (PrimMonad m, MVector v a, Num a) => Int -> v (PrimState m) a -> m
 a
 -- test :: (MVector v a, Num a) => Int -> v s a -> ST s a
 test 0 v = liftM (+1) $ unsafeRead v 0
 test n v = do
   long1 v
   test (n-1) v
 {-# INLINABLE test #-}

 long1, long2, long3, long4 :: (PrimMonad m, MVector v a) => v (PrimState
 m) a -> m ()

 long1 v = long2 v >> long2 v >> long2 v >> long2 v
 long2 v = long3 v >> long3 v >> long3 v >> long3 v
 long3 v = long4 v >> long4 v >> long4 v >> long4 v
 long4 v = unsafeRead v 0 >>= unsafeWrite v 0

 {-# INLINE long1 #-}
 {-# INLINE long2 #-}
 {-# INLINE long3 #-}
 {-# INLINE long4 #-}
 }}}

 {{{
 module Main (main) where

 import Control.Monad.ST
 import Data.Vector.Unboxed.Mutable as U hiding (read)
 import System.Environment
 import Unsafe.Coerce
 import GHC.Prim

 import A

 test0 :: Int -> MVector s Int -> ST s Int
 test0 n v = test n v
 {-# NOINLINE test0 #-}

 test1' :: Int -> MVector Any Int -> ST Any Int
 test1' n v = test n v
 {-# NOINLINE test1 #-}

 test1 :: Int -> MVector a Int -> ST a Int
 test1 = unsafeCoerce test1'

 main = getArgs >>= \(n:b:_) ->
   print $ runST $ do
     v <- new 1
     write v 0 0
     (if read b then test0 else test1) (read n) v
 }}}

 Module `A` exports a single function, `test`. This function is engineered
 to be quite large, by inlining several other functions into it, and it is
 itself marked INLINABLE. Then the `Main` module uses this function in two
 different ways:

 * `test0` uses `test` at a type that is compatible with `runST`
 * `test1'` uses `test` at a completely monomorphic type, which is then
 coerced to a `runST` compatible type in `test1`

 On 7.6 I believe (though have not checked) that there will be little or no
 performance difference between `test0` and `test1`. However, on 7.8.2
 (and, I have been assured, 7.8.3), there is a massive speed pentalty for
 `test0`; about 70x on my machine. This seems to be due to no inining or
 specialization for its use of `test`, which can be seen from `-ddump-
 simpl`.

 However, if one changes the type of `test` in `A` to be specific to `ST s`
 rather than using `PrimMonad`, there is no performance difference, even on
 7.8.2. So, the choice to inline and specialize seems to hinge on the
 instantiation of all the class constraints to monomorphic types containing
 no variables, rather than just types that resolve all overloading. I
 myself did not notice this problem, because my benchmark suite uses `IO`,
 which is a concrete instantiation of the type, and doesn't exhibit this
 problem.

 I have temporarily 'fixed' vector-algorithms by moving back to `INLINE`
 pragmas, but `INLINABLE` is actually preferable in that case, because it
 generates faster code than `INLINE` when the optimizations actually fire.
 My test case here does not illustrate that well, though.

 Is it safe to assume that this was not an intentional change? It's a
 rather weird rule (to me) if it was.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9320>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list