[GHC] #15176: Superclass `Monad m =>` makes program run 100 times slower
GHC
ghc-devs at haskell.org
Tue May 22 09:40:37 UTC 2018
#15176: Superclass `Monad m =>` makes program run 100 times slower
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.6.1
Component: Compiler | Version: 8.4.2
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: Compile-time
Unknown/Multiple | performance bug
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
Hi! I've just encountered a very bizarre error.
=== General description ===
The code:
{{{
class LayersFoldableBuilder__ t (layers :: [Type]) m where
buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
instance Monad m => LayersFoldableBuilder__ t '[] m where
buildLayersFold__ = \_ a -> a
{-# INLINE buildLayersFold__ #-}
instance ( MonadIO m
, Storable.Storable (Layer.Cons l ())
, Layer.StorableLayer l m
, LayerFoldableBuilder__ (EnabledLayer t l) t m l
, LayersFoldableBuilder__ t ls m )
=> LayersFoldableBuilder__ t (l ': ls) m where
buildLayersFold__ = \ptr mr -> do
let fs = buildLayersFold__ @t @ls ptr'
ptr' = Ptr.plusPtr ptr $ Layer.byteSize @l
layerBuild__ @(EnabledLayer t l) @t @m @l ptr $! fs mr
{-# INLINE buildLayersFold__ #-}
}}}
This is a typeclass `LayersFoldableBuilder__` and ALL of its instances.
Please note, that every instance has a `Monad m` or `MonadIO m`
constraint. The program which uses this code heavily runs in 40ms. If we
only add constraint `Monad m =>` to the class definition:
{{{
class Monad m => LayersFoldableBuilder__ t (layers :: [Type]) m where
buildLayersFold__ :: SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
}}}
The program runs in 3.5s , which is almost 100 times slower.
Unfortunatelly I do not have minimal example, but it is reproducible. It
is a part of the Luna Language codebase: https://github.com/luna/luna-
core/blob/60bf6130691c23e52b97b067b52becb8fdb0c72e/core/src/Data/Graph/Traversal/Scoped.hs#L102
it was introduced in the commit 60bf6130691c23e52b97b067b52becb8fdb0c72e
on branch static-layers . However, building this is simple: stack bench
luna-core . After invoking it we see the described results.
=== Why its important and what should we do to fix it ===
1. I am writing this because I care about Haskell community. I want GHC
and Haskell to be widely used. Right now, the only thing I hear from all
companies around our company is that impredicative performance, even when
following rules "how to write efficient code" is the biggest pain people
have. Haskell is gathering attention - pure functional programming,
immutability etc - are great. But it will not become a popular choice
unless we care about predictive performance.
2. Such performance changes are unacceptable when thinking about Haskell
and GHC as production ready systems. We need a clear way how to write high
performance Haskell without the need to benchmark every part of our
programs even when refactoring things. GHC has enough information to
discover that we want high performance here and there (even by looking at
INLINE pragmas) and should warn us about lack of optimization. We should
also have a way to force GHC to apply optimizations in particular places -
for example by explicit marking code to be always specialized during
compilation, so GHC would never fall back to dict-passing in such places.
Such possibility would solve MANY related problems and user fears.
3. The point 2 also applies to strictness. In my opinion, having more
clear strictness resolution rules / tools is important. Right now the only
way to know if strictness analysis did a good job and we are not
constantly boxing / unboxing things is to read core, which is tedious and
99% of Haskell users do not even know how to do it (We've got 10 really,
really good Haskellers here, 2 of them are capable of reading core, but
not very fluently). I would love to chat more about these topics, because
they are crucial for growing Haskell community and making Haskell more
popular choice, which is waht we want, right? We don't want Haskell to be
just a research project with "all its users being its authors" at the same
time, am I
=== What happens in core ===
I inspected core and have found that indeed, after adding the constraint,
GHC does not apply all optimizations to the definitions. To be honest, I
completely don't understand it, because the code uses everywhere explicit
`INLINE` pragma to be sure everything is optimized away in the compilation
stage:
{{{
--------------------------------------------------------------------------------
SLOW, without Monad m =>
--------------------------------------------------------------------------------
-- RHS size: {terms: 5, types: 12, coercions: 4, joins: 0/0}
buildLayersFold__ [InlPrag=INLINE]
:: forall t (layers :: [*]) (m :: * -> *).
LayersFoldableBuilder__ t layers m =>
SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
[GblId[ClassOp],
Arity=1,
Caf=NoCafRefs,
Str=<S,U>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=False,boring_ok=True)
Tmpl= \ (@ t_ao0O)
(@ (layers_ao0P :: [*]))
(@ (m_ao0Q :: * -> *))
(v_B1 [Occ=Once]
:: LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q)
->
v_B1
`cast`
(Data.Graph.Traversal.Scoped.N:LayersFoldableBuilder__[0]
<t_ao0O>_N <layers_ao0P>_N <m_ao0Q>_N
:: (LayersFoldableBuilder__
t_ao0O layers_ao0P m_ao0Q :: Constraint)
~R# (SomePtr
-> m_ao0Q (Fold.Result t_ao0O)
-> m_ao0Q (Fold.Result t_ao0O) :: *))}]
buildLayersFold__
= \ (@ t_ao0O)
(@ (layers_ao0P :: [*]))
(@ (m_ao0Q :: * -> *))
(v_B1 :: LayersFoldableBuilder__ t_ao0O layers_ao0P m_ao0Q) ->
v_B1
`cast` (Data.Graph.Traversal.Scoped.N:LayersFoldableBuilder__[0]
<t_ao0O>_N <layers_ao0P>_N <m_ao0Q>_N
:: (LayersFoldableBuilder__
t_ao0O layers_ao0P m_ao0Q :: Constraint)
~R# (SomePtr
-> m_ao0Q (Fold.Result t_ao0O)
-> m_ao0Q (Fold.Result t_ao0O) :: *))
--------------------------------------------------------------------------------
FAST, without Monad m =>
--------------------------------------------------------------------------------
-- RHS size: {terms: 8, types: 25, coercions: 0, joins: 0/0}
Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__
:: forall t (layers :: [*]) (m :: * -> *).
LayersFoldableBuilder__ t layers m =>
Monad m
[GblId[ClassOp],
Arity=1,
Caf=NoCafRefs,
Str=<S(SL),U(U,A)>,
RULES: Built in rule for
Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__: "Class op
$p1LayersFoldableBuilder__"]
Data.Graph.Traversal.Scoped.$p1LayersFoldableBuilder__
= \ (@ t_ao0P)
(@ (layers_ao0Q :: [*]))
(@ (m_ao0R :: * -> *))
(v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) ->
case v_B1 of v_B1
{ Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2
v_B3 ->
v_B2
}
-- RHS size: {terms: 8, types: 25, coercions: 0, joins: 0/0}
buildLayersFold__
:: forall t (layers :: [*]) (m :: * -> *).
LayersFoldableBuilder__ t layers m =>
SomePtr -> m (Fold.Result t) -> m (Fold.Result t)
[GblId[ClassOp],
Arity=1,
Caf=NoCafRefs,
Str=<S(LS),U(A,U)>,
RULES: Built in rule for buildLayersFold__: "Class op buildLayersFold__"]
buildLayersFold__
= \ (@ t_ao0P)
(@ (layers_ao0Q :: [*]))
(@ (m_ao0R :: * -> *))
(v_B1 :: LayersFoldableBuilder__ t_ao0P layers_ao0Q m_ao0R) ->
case v_B1 of v_B1
{ Data.Graph.Traversal.Scoped.C:LayersFoldableBuilder__ v_B2
v_B3 ->
v_B3
}
}}}
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15176>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list