[Git][ghc/ghc][wip/romes/no-simple-joinpoints] Inline join points for rhs without free vars
Rodrigo Mesquita (@alt-romes)
gitlab at gitlab.haskell.org
Thu Feb 13 16:22:02 UTC 2025
Rodrigo Mesquita pushed to branch wip/romes/no-simple-joinpoints at Glasgow Haskell Compiler / GHC
Commits:
f0ab91b2 by Rodrigo Mesquita at 2025-02-13T16:21:48+00:00
Inline join points for rhs without free vars
While investigating #25170, we ran into a program (T16473) that allocated 67%
more because of a join point that failed to inline.
Note [Duplicating join points] explains why we want to be conservative
when inlining join points, using as an example a join point that
captures a free variable `f` that becomes available in the continuation
`blah` for further optimisations, as opposed to being lambda-abstracted.
However, when the RHS of the join point has no free variables and is
trivial, the same argument does not apply, and there's nothing to gain
from preserving it.
On the contrary, not inlining these trivial join points such as
$j f x = K f x |> co
can be actively harmful as they prevent useful optimisations from firing
on the known constructor application. #25723 is such an example.
Therefore, we've extended `uncondInlineJoin` to allow duplicating such closed
trivial join points. See the updated Note [Duplicating join points] for
further details.
Additionally, merge the guards in uncondInlineJoin for point DJ3(b) anad DJ3(c) of
Note [Duplicating join points] to avoid an unnecessary traversal in the
call to `collectArgs`; it's also more uniform.
Co-authored-by: Simon Peyton Jones <simon.peytonjones at gmail.com>
Fixes #25723
- - - - -
7 changed files:
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Unfold.hs
- + testsuite/tests/perf/compiler/T16473b.hs
- + testsuite/tests/perf/compiler/T16473b.stdout
- testsuite/tests/perf/compiler/all.T
- testsuite/tests/simplCore/should_compile/T24229a.stderr
- testsuite/tests/simplCore/should_compile/T24229b.stderr
Changes:
=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -4172,17 +4172,40 @@ application (K f x). So we might inline it to get
K g y -> blah[g,y]
But now we have to make `blah` into a join point, /abstracted/
-over `g` and `y`. In contrast, if we /don't/ inline $j we
-don't need a join point for `blah` and we'll get
- join $j x = let g=f, y=x in blah[g,y]
+over `g` and `y`. We get
+ join $j2 g y = blah
in case v of
- p1 -> $j x1
- p2 -> $j x2
- p3 -> $j x3
+ p1 -> $j2 f x1
+ p2 -> $j2 f x2
+ p3 -> $j2 f x3
+So now we can't see that `g` is always `f` in `blah`.
+
+In contrast, if we /don't/ inline $j we
+don't need a new join point for `blah` and we'll get
+ join $j' x = let g=f, y=x in blah[g,y]
+ in case v of
+ p1 -> $j' x1
+ p2 -> $j' x2
+ p3 -> $j' x3
This can make a /massive/ difference, because `blah` can see
what `f` is, instead of lambda-abstracting over it.
+If instead the RHS of the join point is a simple application that has no free
+variables, as in
+
+ case (join $j x f = K f x )
+ (in case v of )
+ ( p1 -> $j x1 f1 ) of
+ ( p2 -> $j x2 f2 )
+ ( p3 -> $j x3 f3 )
+ K g y -> blah[g,y]
+
+then no information can be gained by preserving the join point (c.f. `f` being
+free in the join point above and being useful to `blah`). In this case, it's
+more beneficial to inline the join point (see (DJ3)(c)) to allow further
+optimisations to fire. An example where failing to do this went wrong is #25723.
+
Beyond this, not-inlining join points reduces duplication. In the above
example, if `blah` was small enough we'd inline it, but that duplicates code,
for no gain. Best just to keep not-inline the join point in the first place.
@@ -4207,12 +4230,27 @@ unconditional-inlining for join points.
case-of-case friendly.
(DJ3) When should `uncondInlineJoin` return True?
- * (exprIsTrivial rhs); this includes uses of unsafeEqualityProof etc; see
+ (a) (exprIsTrivial rhs); this includes uses of unsafeEqualityProof etc; see
the defn of exprIsTrivial. Also nullary constructors.
- * The RHS is a call ($j x y z), where the arguments are all trivial and $j
+ (b) The RHS is a call ($j x y z), where the arguments are all trivial and $j
is a join point: there is no point in creating an indirection.
+ (c) The RHS is a data constructor application (K x y z) where
+
+ - all the args x,y,z are trivial
+ - the free LocalIds of `f x y z` are a subset of the join point binders
+
+ Examples that return True
+ $j x y = K y (x |> co)
+ $j x y = x (y @Int)
+ Examples that return False
+ $j x = K y x -- y is free
+ $j y = f y -- f is free
+
+ Not duplicating these join points has no benefits and blocks other important
+ optimisations from firing (see #25723)
+
(DJ4) By the same token we want to use Plan B in Note [Duplicating StrictArg] when
the RHS of the new join point is a data constructor application. See the
call to isDataConId in the StrictArg case of mkDupableContWithDmds.
=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -434,18 +434,56 @@ uncondInline is_join rhs bndrs arity body size
uncondInlineJoin :: [Var] -> CoreExpr -> Bool
-- See Note [Duplicating join points] point (DJ3) in GHC.Core.Opt.Simplify.Iteration
-uncondInlineJoin _bndrs body
+uncondInlineJoin bndrs body
+
+ -- (DJ3)(a)
| exprIsTrivial body
= True -- Nullary constructors, literals
- | (Var v, args) <- collectArgs body
- , all exprIsTrivial args
- , isJoinId v -- Indirection to another join point; always inline
+ -- (DJ3)(b) and (DJ3)(c) combined
+ | indirectionOrAppWithoutFVs
= True
| otherwise
= False
+ where
+ -- (DJ3)(b):
+ -- - $j1 x = $j2 y x |> co -- YES, inline indirection regardless of free vars
+ -- (DJ3)(c):
+ -- - $j1 x y = K y x |> co -- YES, inline!
+ -- - $j2 x = K f x -- No, don't! (because f is free)
+ indirectionOrAppWithoutFVs = go False body
+
+ go !seen_fv (App f a)
+ | Just has_fv <- go_arg a
+ = go (seen_fv || has_fv) f
+ | otherwise = False -- Not trivial
+ go seen_fv (Var v)
+ | isJoinId v = True -- Indirection to another join point; always inline
+ | isDataConId v = not seen_fv -- e.g. $j a b = K a b
+ | v `elem` bndrs = not seen_fv -- e.g. $j a b = b a
+ go seen_fv (Cast e _) = go seen_fv e
+ go seen_fv (Tick _ e) = go seen_fv e
+ go _ _ = False
+
+ -- go_arg returns:
+ -- - `Nothing` if arg is not trivial
+ -- - `Just True` if arg is trivial but contains free var, literal, or constructor
+ -- - `Just False` if arg is trivial without free vars
+ go_arg (Type {}) = Just False
+ go_arg (Coercion {}) = Just False
+ go_arg (Lit l)
+ | litIsTrivial l = Just True -- e.g. $j x = $j2 x 7 YES, but $j x = K x 7 NO
+ | otherwise = Nothing
+ go_arg (App f a)
+ | isTyCoArg a = go_arg f -- e.g. $j f = K (f @a)
+ | otherwise = Nothing
+ go_arg (Cast e _) = go_arg e
+ go_arg (Tick _ e) = go_arg e
+ go_arg (Var f) = Just $! f `notElem` bndrs
+ go_arg _ = Nothing
+
sizeExpr :: UnfoldingOpts
-> Int -- Bomb out if it gets bigger than this
=====================================
testsuite/tests/perf/compiler/T16473b.hs
=====================================
@@ -0,0 +1,102 @@
+{-# LANGUAGE BangPatterns #-}
+{-# LANGUAGE DataKinds #-}
+{-# LANGUAGE DeriveFunctor #-}
+{-# LANGUAGE GADTs #-}
+{-# LANGUAGE KindSignatures #-}
+{-# LANGUAGE LambdaCase #-}
+{-# LANGUAGE RankNTypes #-}
+{-# LANGUAGE ScopedTypeVariables #-}
+{-# LANGUAGE TypeOperators #-}
+
+{-# OPTIONS_GHC -flate-specialise -O2 #-}
+
+module Main (main) where
+
+import qualified Control.Monad.State.Strict as S
+import Data.Foldable
+import Data.Functor.Identity
+import Data.Kind
+import Data.Monoid
+import Data.Tuple
+
+main :: IO ()
+main = print $ badCore 100
+
+badCore :: Int -> Int
+badCore n = getSum $ fst $ run $ runState mempty $ for_ [0..n] $ \i -> modify (<> Sum i)
+
+data Union (r :: [Type -> Type]) a where
+ Union :: e a -> Union '[e] a
+
+decomp :: Union (e ': r) a -> e a
+decomp (Union a) = a
+{-# INLINE decomp #-}
+
+absurdU :: Union '[] a -> b
+absurdU = absurdU
+
+newtype Semantic r a = Semantic
+ { runSemantic
+ :: forall m
+ . Monad m
+ => (forall x. Union r x -> m x)
+ -> m a
+ }
+
+instance Functor (Semantic f) where
+ fmap f (Semantic m) = Semantic $ \k -> fmap f $ m k
+ {-# INLINE fmap #-}
+
+instance Applicative (Semantic f) where
+ pure a = Semantic (\x -> const (pure a) x)
+ {-# INLINE pure #-}
+ Semantic f <*> Semantic a = Semantic $ \k -> f k <*> a k
+ {-# INLINE (<*>) #-}
+
+instance Monad (Semantic f) where
+ return = pure
+ {-# INLINE return #-}
+ Semantic ma >>= f = Semantic $ \k -> do
+ z <- ma k
+ runSemantic (f z) k
+ {-# INLINE (>>=) #-}
+
+data State s a
+ = Get (s -> a)
+ | Put s a
+ deriving Functor
+
+get :: Semantic '[State s] s
+get = Semantic $ \k -> k $ Union $ Get id
+{-# INLINE get #-}
+
+put :: s -> Semantic '[State s] ()
+put !s = Semantic $ \k -> k $ Union $! Put s ()
+{-# INLINE put #-}
+
+modify :: (s -> s) -> Semantic '[State s] ()
+modify f = do
+ !s <- get
+ put $! f s
+{-# INLINE modify #-}
+
+runState :: s -> Semantic (State s ': r) a -> Semantic r (s, a)
+runState = interpretInStateT $ \case
+ Get k -> fmap k S.get
+ Put s k -> S.put s >> pure k
+{-# INLINE[3] runState #-}
+
+run :: Semantic '[] a -> a
+run (Semantic m) = runIdentity $ m absurdU
+{-# INLINE run #-}
+
+interpretInStateT
+ :: (forall x. e x -> S.StateT s (Semantic r) x)
+ -> s
+ -> Semantic (e ': r) a
+ -> Semantic r (s, a)
+interpretInStateT f s (Semantic m) = Semantic $ \k ->
+ fmap swap $ flip S.runStateT s $ m $ \u ->
+ S.mapStateT (\z -> runSemantic z k) $ f $ decomp u
+{-# INLINE interpretInStateT #-}
+
=====================================
testsuite/tests/perf/compiler/T16473b.stdout
=====================================
@@ -0,0 +1 @@
+5050
=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -612,6 +612,17 @@ test ('T16473',
compile_and_run,
['-O2 -flate-specialise'])
+# Run this program. If inlining the join point fails, it'll start to allocate much more.
+# The join point is only created when -fno-local-float-out
+test ('T16473b',
+ [ collect_stats('bytes allocated',5)
+ , only_ways(['normal'])
+ , js_broken(22261)
+ ],
+ compile_and_run,
+ ['-O2 -fno-local-float-out'])
+
+
test('T17516',
[ collect_compiler_runtime(5),
],
=====================================
testsuite/tests/simplCore/should_compile/T24229a.stderr
=====================================
@@ -1,6 +1,6 @@
==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 83, types: 113, coercions: 8, joins: 1/1}
+Result size of Tidy Core = {terms: 79, types: 106, coercions: 8, joins: 0/0}
Rec {
foo_$s$wfoo
@@ -14,10 +14,9 @@ end Rec }
foo
= \ @a ds ds1 ->
case ds of { I# ww ->
- join { $j ww1 = Just ww1 } in
case ww of ds2 {
- __DEFAULT -> case ds1 `cast` <Co:4> :: ... of { (x, y) -> case foo_$s$wfoo y x (-# ds2 1#) of { (# ww1 #) -> jump $j ww1 } };
- 0# -> jump $j (ds1 `cast` <Co:4> :: ...)
+ __DEFAULT -> case ds1 `cast` <Co:4> :: ... of { (x, y) -> case foo_$s$wfoo y x (-# ds2 1#) of { (# ww1 #) -> Just ww1 } };
+ 0# -> Just (ds1 `cast` <Co:4> :: ...)
}
}
=====================================
testsuite/tests/simplCore/should_compile/T24229b.stderr
=====================================
@@ -1,6 +1,6 @@
==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 64, types: 90, coercions: 8, joins: 1/1}
+Result size of Tidy Core = {terms: 60, types: 83, coercions: 8, joins: 0/0}
Rec {
foo_$s$wfoo
@@ -14,10 +14,9 @@ end Rec }
foo
= \ @a ds ds1 ->
case ds of { I# ww ->
- join { $j ww1 = Just ww1 } in
case ww of ds2 {
- __DEFAULT -> case ds1 `cast` <Co:4> :: ... of { (x, y) -> case foo_$s$wfoo y x (-# ds2 1#) of { (# ww1 #) -> jump $j ww1 } };
- 0# -> jump $j (ds1 `cast` <Co:4> :: ...)
+ __DEFAULT -> case ds1 `cast` <Co:4> :: ... of { (x, y) -> case foo_$s$wfoo y x (-# ds2 1#) of { (# ww1 #) -> Just ww1 } };
+ 0# -> Just (ds1 `cast` <Co:4> :: ...)
}
}
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/f0ab91b25657f8390f8a908b2e4ef0328acd9904
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/f0ab91b25657f8390f8a908b2e4ef0328acd9904
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20250213/6de6bd9e/attachment-0001.html>
More information about the ghc-commits
mailing list