[GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)
GHC
ghc-devs at haskell.org
Tue Sep 18 18:02:47 UTC 2018
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
Reporter: nh2 | Owner: davide
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by davide):
== Regarding simple example
`f :: forall a. (a ~ Int) => a -> a`, the difference in performance is
somewhat expected. This may be a different issue than the example given in
the ticket description. In short, `a ~ Int` is a proof that type `a` is
equal to type `Int`. In core, `a ~ Int` is a regular ''boxed'' GADT
meaning that it could be bottom i.e. an invalid prove (this is the main
mechanism behind
[https://downloads.haskell.org/~ghc/8.0.2/docs/html/users_guide/glasgow_exts.html?highlight=defered%20type%20error
#deferring-type-errors-to-runtime -fdefer-type-errors]). Unboxing `a ~ b`
at corresponds to checking the proof which is required to coerce the input
binding from `a` to `Int`. Normally the `(a ~ Int)` would be optimized
away (as described [http://dreixel.net/research/pdf/epdtecp.pdf here] in
section 7.3), but that requires a worker wrapper transformation that never
happens. Removing `NOINLINE` allows `f` to be optimized across modules,
which closes the performance gap.
== Regarding original example
Unlike my simple example, all the code is in one module, so I expect the
equality proof `VG.Mutable v ~ vm` to be optimized away (again see
[http://dreixel.net/research/pdf/epdtecp.pdf here] section 7.3). With ghc
3.2.2, when compiling the slow version, I see `selectVectorDestructive2`
is specialized to
`$sselectVectorDestructive2 :: Int -> Vector Int -> MVector (PrimState IO)
Int -> Int -> Int -> IO ()` (pass 2). This is good, but for some reason
myread and partitionLoop2 are not specialized even though they are used by
`$sselectVectorDestructive2`:
{{{#!haskell
$sselectVectorDestructive2 =
...
let
$dMVector =
Data.Vector.Generic.Base.$p1Vector
@Vector
@Int
Data.Vector.Unboxed.Base.$fVectorVectorInt
in
...
(Main.myread
@IO
@MVector
@Int
Control.Monad.Primitive.$fPrimMonadIO
$dMVector
GHC.Classes.$fOrdInt
GHC.Show.$fShowInt
v
begin)
...
(Main.partitionLoop2
@IO
@MVector
@Int
Control.Monad.Primitive.$fPrimMonadIO
$dMVector
GHC.Classes.$fOrdInt
GHC.Show.$fShowInt
v
begin
pivot
(GHC.Types.I# ...)
}}}
In the fast version, myread and partitionLoop2 are specialized in this
pass. I noticed 2 other differences:
* fast version floats `$dMVector` to a top level binding.
* fast version specializes to `Mutable Vector (PrimState IO) Int` instead
of `MVector (PrimState IO) Int`. Note `Mutable` is a type family and
`Mutable Vector = MVector`
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list