[GHC] #10371: GHC fails to inline and specialize a function

GHC ghc-devs at haskell.org
Fri May 1 23:40:11 UTC 2015


#10371: GHC fails to inline and specialize a function
-------------------------------------+-------------------------------------
              Reporter:              |             Owner:
  MikeIzbicki                        |            Status:  new
                  Type:  bug         |         Milestone:
              Priority:  normal      |           Version:  7.10.1
             Component:  Compiler    |  Operating System:  Unknown/Multiple
              Keywords:              |   Type of failure:  None/Unknown
          Architecture:              |        Blocked By:
  Unknown/Multiple                   |   Related Tickets:
             Test Case:              |
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------
 I have an alternative Prelude library called
 [subhask](https://github.com/mikeizbicki/subhask) that redefines the
 numeric type class hierarchy.  I'm trying to update it to work with GHC
 7.10, but there is a major inlining bug that is killing performance.

 The code below demonstrates the issue.  It first defines a distance
 function over 2 vectors, then measures the performance using criterion.
 (It requires the subhask to compile.)

 {{{
 {-# LANGUAGE BangPatterns #-}

 import Control.DeepSeq
 import Criterion.Main
 import qualified Data.Vector.Unboxed as VU
 import qualified Data.Vector.Generic as VG

 import qualified Prelude
 import SubHask

 -- distance_standalone :: VU.Vector Float -> VU.Vector Float -> Float
 distance_standalone v1 v2 = sqrt $ go 0 0
     where
         go !tot !i =  if i>VG.length v1-4
             then goEach tot i
             else go tot' (i+4)
             where
                 tot' = tot
                     +(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i)
                     *(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i)
                     +(v1 `VG.unsafeIndex` (i+1)-v2 `VG.unsafeIndex` (i+1))
                     *(v1 `VG.unsafeIndex` (i+1)-v2 `VG.unsafeIndex` (i+1))
                     +(v1 `VG.unsafeIndex` (i+2)-v2 `VG.unsafeIndex` (i+2))
                     *(v1 `VG.unsafeIndex` (i+2)-v2 `VG.unsafeIndex` (i+2))
                     +(v1 `VG.unsafeIndex` (i+3)-v2 `VG.unsafeIndex` (i+3))
                     *(v1 `VG.unsafeIndex` (i+3)-v2 `VG.unsafeIndex` (i+3))

         goEach !tot !i = if i>= VG.length v1
             then tot
             else goEach tot' (i+1)
             where
                 tot' = tot+(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i)
                           *(v1 `VG.unsafeIndex` i-v2 `VG.unsafeIndex` i)


 main = do
     let v1 = VU.fromList [1..200] :: VU.Vector Float
         v2 = VU.fromList [1..200] :: VU.Vector Float

     deepseq v1 $ deepseq v2 $ return ()

     defaultMain
         [ bench "distance_standalone" $ nf (distance_standalone v1) v2
         ]

 }}}

 Here are the results of compiling and running using GHC 7.10 and 7.8:

 {{{
 $ ghc-7.10.1 Main.hs -O2 -fforce-recomp -ddump-to-file -ddump-simpl &&
 ./Main
 [1 of 1] Compiling Main             ( Main.hs, Main.o )
 Linking Main ...
 benchmarking distance_standalone
 time                 8.135 μs   (8.121 μs .. 8.154 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 8.188 μs   (8.158 μs .. 8.250 μs)
 std dev              139.3 ns   (66.05 ns .. 250.4 ns)
 variance introduced by outliers: 15% (moderately inflated)
 }}}

 {{{
 $ ghc-7.8.2 Main.hs -O2 -fforce-recomp -ddump-to-file -ddump-simpl &&
 ./Main
 [1 of 1] Compiling Main             ( Main.hs, Main.o )
 Linking Main ...
 benchmarking distance_standalone
 time                 733.2 ns   (732.9 ns .. 733.6 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 734.1 ns   (733.7 ns .. 734.5 ns)
 std dev              1.458 ns   (1.262 ns .. 1.754 ns)
 }}}

 As you can see, GHC 7.10 is 10x slower.  Looking through the core output
 shows that the cause of this is that GHC 7.8 is properly specializing the
 code whereas GHC 7.10 is not.  If you uncomment the type signature before
 the `distance_standalone` function then both compilers perform at the
 faster speed.

 I believe the cause of this may be related to the complicated class
 numeric class hierarchy in SubHask.  If you comment out the lines:

 {{{
 import qualified Prelude
 import SubHask
 }}}

 then GHC uses the Prelude hierarchy instead of SubHask's hierarchy, and
 both compilers generate the faster program.

 There's one last wrinkle.  If you define the `distance_standalone`
 function in a different file.  Then in GHC 7.10, the `INLINE` and
 `INLINABLE` pragmas do absolutely nothing.  Not only does the resulting
 code not get inlined, but if I add the specialization:
 {{{
 {-# SPECIALIZE distance_standalone :: VU.Vector Float -> VU.Vector Float
 -> Float #-}
 }}}
 to the Main file, I get an error message saying something like:
 {{{
 bench/Vector.hs:18:1: Warning:
     You cannot SPECIALISE ‘distance_standalone{v ru1}’
       because its definition has no INLINE/INLINABLE pragma
       (or its defining module
 ‘subhask-0.1.0.0 at subha_LNZiQvSbo8Z0VdLTwuvkrN:SubHask.Algebra’
        was compiled without -O)
 }}}
 I get this message despite the fact that the defining module was compiled
 with `-O2` and the function had an `INLINABLE` pragma.  If I add the
 specialization pragma to the defining module, then I don't get the
 warning, but the code still doesn't specialize properly.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10371>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list