[Git][ghc/ghc][wip/T17917] 3 commits: Avoid useless w/w split
Simon Peyton Jones
gitlab at gitlab.haskell.org
Fri Mar 20 10:47:10 UTC 2020
Simon Peyton Jones pushed to branch wip/T17917 at Glasgow Haskell Compiler / GHC
Commits:
6eab61f6 by Simon Peyton Jones at 2020-03-19T14:28:42Z
Avoid useless w/w split
This patch is just a tidy-up for the post-strictness-analysis
worker wrapper split. Consider
f x = x
Strictnesss analysis does not lead to a w/w split, so the
obvious thing is to leave it 100% alone. But actually, because
the RHS is small, we ended up adding a StableUnfolding for it.
There is some reason to do this if we choose /not/ do to w/w
on the grounds that the function is small. See
Note [Don't w/w inline small non-loop-breaker things]
But there is no reason if we would not have done w/w anyway.
This patch just moves the conditional to later. Easy.
This does move soem -ddump-simpl printouts around a bit.
I also discovered that the previous code was overwritten an
InlineCompulsory with InlineStable, which is utterly wrong. That in
turn meant that some default methods (marked InlineCompulsory)
were getting their InlineCompulsory squashed. This patch fixes
that bug --- but of course that does mean a bit more inlining!
- - - - -
cf1c1167 by Simon Peyton Jones at 2020-03-20T10:43:19Z
Wibble to the worker/wrapper patch
Don't attempt to w/w record selectors. Rather ad-hoc but very simple
and effective.
See WorkWrap Note [No worker-wrapper for record selectors].
- - - - -
761e9570 by Simon Peyton Jones at 2020-03-20T10:45:06Z
DO NOT MERGE THIS PATCH
This is a temporary fix on the T17917 branch, to account for
This branch should be rebased once the patch for #17932 lands
(namely !2929), and this particular commit discarded
- - - - -
10 changed files:
- compiler/GHC/Core/Op/DmdAnal.hs
- compiler/GHC/Core/Op/WorkWrap.hs
- compiler/GHC/Core/Unfold.hs
- testsuite/tests/dependent/should_compile/dynamic-paper.stderr
- testsuite/tests/driver/inline-check.stderr
- testsuite/tests/numeric/should_compile/T14465.stdout
- testsuite/tests/perf/compiler/T16473.stdout
- testsuite/tests/simplCore/should_compile/T4201.stdout
- testsuite/tests/simplCore/should_compile/T5658b.stdout
- testsuite/tests/warnings/should_compile/T16282/T16282.stderr
Changes:
=====================================
compiler/GHC/Core/Op/DmdAnal.hs
=====================================
@@ -623,10 +623,11 @@ dmdAnalRhsLetDown rec_flag env let_dmd id rhs
-- For example, a call of the form @mkRhsDmd _ 2 (\x y -> (x, y))@ returns a
-- clean usage demand of @C1(C1(U(U,U)))@.
mkRhsDmd :: AnalEnv -> Arity -> CoreExpr -> CleanDemand
-mkRhsDmd env rhs_arity rhs =
- case peelTsFuns rhs_arity (findTypeShape (ae_fam_envs env) (exprType rhs)) of
- Just (TsProd tss) -> mkCallDmds rhs_arity (cleanEvalProdDmd (length tss))
- _ -> mkCallDmds rhs_arity cleanEvalDmd
+mkRhsDmd _env rhs_arity _rhs =
+ mkCallDmds rhs_arity cleanEvalDmd
+-- case peelTsFuns rhs_arity (findTypeShape (ae_fam_envs env) (exprType rhs)) of
+-- Just (TsProd tss) -> mkCallDmds rhs_arity (cleanEvalProdDmd (length tss))
+-- _ -> mkCallDmds rhs_arity cleanEvalDmd
-- | If given the let-bound 'Id', 'useLetUp' determines whether we should
-- process the binding up (body before rhs) or down (rhs before body).
=====================================
compiler/GHC/Core/Op/WorkWrap.hs
=====================================
@@ -24,6 +24,7 @@ import GHC.Driver.Session
import Demand
import Cpr
import GHC.Core.Op.WorkWrap.Lib
+import Unique
import Util
import Outputable
import GHC.Core.FamInstEnv
@@ -203,6 +204,23 @@ unfolding to the *worker*. So we will get something like this:
How do we "transfer the unfolding"? Easy: by using the old one, wrapped
in work_fn! See GHC.Core.Unfold.mkWorkerUnfolding.
+Note [No worker-wrapper for record selectors]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We sometimes generate a lot of record selectors, and generally the
+don't benefit from worker/wrapper. Yes, mkWwBodies would find a w/w split,
+but it is then suppressed by the certainlyWillInline test in splitFun.
+
+The wasted effort in mkWwBodies makes a measurable difference in
+compile time (see MR !2873), so although it's a terribly ad-hoc test,
+we just check here for record selectors, and do a no-op in that case.
+
+I did look for a generalisation, so that it's not just record
+selectors that benefit. But you'd need a cheap test for "this
+function will definitely get a w/w split" and that's hard to predict
+in advance...the logic in mkWwBodies is complex. So I've left the
+super-simple test, with this Note to explain.
+
+
Note [Worker-wrapper for NOINLINE functions]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We used to disable worker/wrapper for NOINLINE things, but it turns out
@@ -316,8 +334,8 @@ Note [Don't w/w inline small non-loop-breaker things]
In general, we refrain from w/w-ing *small* functions, which are not
loop breakers, because they'll inline anyway. But we must take care:
it may look small now, but get to be big later after other inlining
-has happened. So we take the precaution of adding an INLINE pragma to
-any such functions.
+has happened. So we take the precaution of adding a StableUnfolding
+for any such functions.
I made this change when I observed a big function at the end of
compilation with a useful strictness signature but no w-w. (It was
@@ -457,11 +475,6 @@ tryWW :: DynFlags
tryWW dflags fam_envs is_rec fn_id rhs
-- See Note [Worker-wrapper for NOINLINE functions]
- | Just stable_unf <- certainlyWillInline dflags fn_info
- = return [ (fn_id `setIdUnfolding` stable_unf, rhs) ]
- -- See Note [Don't w/w INLINE things]
- -- See Note [Don't w/w inline small non-loop-breaker things]
-
| is_fun && is_eta_exp
= splitFun dflags fam_envs new_fn_id fn_info wrap_dmds div cpr rhs
@@ -567,105 +580,125 @@ See https://gitlab.haskell.org/ghc/ghc/merge_requests/312#note_192064.
splitFun :: DynFlags -> FamInstEnvs -> Id -> IdInfo -> [Demand] -> Divergence -> CprResult -> CoreExpr
-> UniqSM [(Id, CoreExpr)]
splitFun dflags fam_envs fn_id fn_info wrap_dmds div cpr rhs
- = WARN( not (wrap_dmds `lengthIs` arity), ppr fn_id <+> (ppr arity $$ ppr wrap_dmds $$ ppr cpr) ) do
- -- The arity should match the signature
- stuff <- mkWwBodies dflags fam_envs rhs_fvs fn_id wrap_dmds use_cpr_info
- case stuff of
- Just (work_demands, join_arity, wrap_fn, work_fn) -> do
- work_uniq <- getUniqueM
- let work_rhs = work_fn rhs
- work_act = case fn_inline_spec of -- See Note [Worker activation]
- NoInline -> fn_act
- _ -> wrap_act
-
- work_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
- , inl_inline = fn_inline_spec
- , inl_sat = Nothing
- , inl_act = work_act
- , inl_rule = FunLike }
- -- inl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions]
- -- inl_act: see Note [Worker activation]
- -- inl_rule: it does not make sense for workers to be constructorlike.
-
- work_join_arity | isJoinId fn_id = Just join_arity
- | otherwise = Nothing
- -- worker is join point iff wrapper is join point
- -- (see Note [Don't w/w join points for CPR])
-
- work_id = mkWorkerId work_uniq fn_id (exprType work_rhs)
- `setIdOccInfo` occInfo fn_info
- -- Copy over occurrence info from parent
- -- Notably whether it's a loop breaker
- -- Doesn't matter much, since we will simplify next, but
- -- seems right-er to do so
-
- `setInlinePragma` work_prag
-
- `setIdUnfolding` mkWorkerUnfolding dflags work_fn fn_unfolding
- -- See Note [Worker-wrapper for INLINABLE functions]
-
- `setIdStrictness` mkClosedStrictSig work_demands div
- -- Even though we may not be at top level,
- -- it's ok to give it an empty DmdEnv
-
- `setIdCprInfo` mkCprSig work_arity work_cpr_info
-
- `setIdDemandInfo` worker_demand
-
- `setIdArity` work_arity
- -- Set the arity so that the Core Lint check that the
- -- arity is consistent with the demand type goes
- -- through
- `asJoinId_maybe` work_join_arity
-
- work_arity = length work_demands
-
- -- See Note [Demand on the Worker]
- single_call = saturatedByOneShots arity (demandInfo fn_info)
- worker_demand | single_call = mkWorkerDemand work_arity
- | otherwise = topDmd
-
- wrap_rhs = wrap_fn work_id
- wrap_act = case fn_act of -- See Note [Wrapper activation]
- ActiveAfter {} -> fn_act
- NeverActive -> activeDuringFinal
- _ -> activeAfterInitial
- wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
- , inl_inline = NoUserInline
- , inl_sat = Nothing
- , inl_act = wrap_act
- , inl_rule = rule_match_info }
- -- inl_act: see Note [Wrapper activation]
- -- inl_inline: see Note [Wrapper NoUserInline]
- -- inl_rule: RuleMatchInfo is (and must be) unaffected
-
- wrap_id = fn_id `setIdUnfolding` mkWwInlineRule dflags wrap_rhs arity
- `setInlinePragma` wrap_prag
- `setIdOccInfo` noOccInfo
- -- Zap any loop-breaker-ness, to avoid bleating from Lint
- -- about a loop breaker with an INLINE rule
-
-
-
- return $ [(work_id, work_rhs), (wrap_id, wrap_rhs)]
- -- Worker first, because wrapper mentions it
-
- Nothing -> return [(fn_id, rhs)]
+ | isRecordSelector fn_id -- See Note [No worker/wrapper for record selectors]
+ = return [ (fn_id, rhs ) ]
+
+ | otherwise
+ = WARN( not (wrap_dmds `lengthIs` arity), ppr fn_id <+> (ppr arity $$ ppr wrap_dmds $$ ppr cpr) )
+ -- The arity should match the signature
+ do { mb_stuff <- mkWwBodies dflags fam_envs rhs_fvs fn_id wrap_dmds use_cpr_info
+ ; case mb_stuff of
+ Nothing -> return [(fn_id, rhs)]
+
+ Just stuff
+ | Just stable_unf <- certainlyWillInline dflags fn_info
+ -> return [ (fn_id `setIdUnfolding` stable_unf, rhs) ]
+ -- See Note [Don't w/w INLINE things]
+ -- See Note [Don't w/w inline small non-loop-breaker things]
+
+ | otherwise
+ -> do { work_uniq <- getUniqueM
+ ; return (mkWWBindPair dflags fn_id fn_info arity rhs
+ work_uniq div cpr stuff) } }
where
- rhs_fvs = exprFreeVars rhs
+ rhs_fvs = exprFreeVars rhs
+ arity = arityInfo fn_info
+ -- The arity is set by the simplifier using exprEtaExpandArity
+ -- So it may be more than the number of top-level-visible lambdas
+
+ -- use_cpr_info is the CPR we w/w for. Note that we kill it for join points,
+ -- see Note [Don't w/w join points for CPR].
+ use_cpr_info | isJoinId fn_id = topCpr
+ | otherwise = cpr
+
+
+mkWWBindPair :: DynFlags -> Id -> IdInfo -> Arity
+ -> CoreExpr -> Unique -> Divergence -> CprResult
+ -> ([Demand], JoinArity, Id -> CoreExpr, Expr CoreBndr -> CoreExpr)
+ -> [(Id, CoreExpr)]
+mkWWBindPair dflags fn_id fn_info arity rhs work_uniq div cpr
+ (work_demands, join_arity, wrap_fn, work_fn)
+ = [(work_id, work_rhs), (wrap_id, wrap_rhs)]
+ -- Worker first, because wrapper mentions it
+ where
+ work_rhs = work_fn rhs
+ work_act = case fn_inline_spec of -- See Note [Worker activation]
+ NoInline -> fn_act
+ _ -> wrap_act
+
+ work_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
+ , inl_inline = fn_inline_spec
+ , inl_sat = Nothing
+ , inl_act = work_act
+ , inl_rule = FunLike }
+ -- inl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions]
+ -- inl_act: see Note [Worker activation]
+ -- inl_rule: it does not make sense for workers to be constructorlike.
+
+ work_join_arity | isJoinId fn_id = Just join_arity
+ | otherwise = Nothing
+ -- worker is join point iff wrapper is join point
+ -- (see Note [Don't w/w join points for CPR])
+
+ work_id = mkWorkerId work_uniq fn_id (exprType work_rhs)
+ `setIdOccInfo` occInfo fn_info
+ -- Copy over occurrence info from parent
+ -- Notably whether it's a loop breaker
+ -- Doesn't matter much, since we will simplify next, but
+ -- seems right-er to do so
+
+ `setInlinePragma` work_prag
+
+ `setIdUnfolding` mkWorkerUnfolding dflags work_fn fn_unfolding
+ -- See Note [Worker-wrapper for INLINABLE functions]
+
+ `setIdStrictness` mkClosedStrictSig work_demands div
+ -- Even though we may not be at top level,
+ -- it's ok to give it an empty DmdEnv
+
+ `setIdCprInfo` mkCprSig work_arity work_cpr_info
+
+ `setIdDemandInfo` worker_demand
+
+ `setIdArity` work_arity
+ -- Set the arity so that the Core Lint check that the
+ -- arity is consistent with the demand type goes
+ -- through
+ `asJoinId_maybe` work_join_arity
+
+ work_arity = length work_demands
+
+ -- See Note [Demand on the Worker]
+ single_call = saturatedByOneShots arity (demandInfo fn_info)
+ worker_demand | single_call = mkWorkerDemand work_arity
+ | otherwise = topDmd
+
+ wrap_rhs = wrap_fn work_id
+ wrap_act = case fn_act of -- See Note [Wrapper activation]
+ ActiveAfter {} -> fn_act
+ NeverActive -> activeDuringFinal
+ _ -> activeAfterInitial
+ wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
+ , inl_inline = NoUserInline
+ , inl_sat = Nothing
+ , inl_act = wrap_act
+ , inl_rule = rule_match_info }
+ -- inl_act: see Note [Wrapper activation]
+ -- inl_inline: see Note [Wrapper NoUserInline]
+ -- inl_rule: RuleMatchInfo is (and must be) unaffected
+
+ wrap_id = fn_id `setIdUnfolding` mkWwInlineRule dflags wrap_rhs arity
+ `setInlinePragma` wrap_prag
+ `setIdOccInfo` noOccInfo
+ -- Zap any loop-breaker-ness, to avoid bleating from Lint
+ -- about a loop breaker with an INLINE rule
+
fn_inl_prag = inlinePragInfo fn_info
fn_inline_spec = inl_inline fn_inl_prag
fn_act = inl_act fn_inl_prag
rule_match_info = inlinePragmaRuleMatchInfo fn_inl_prag
fn_unfolding = unfoldingInfo fn_info
- arity = arityInfo fn_info
- -- The arity is set by the simplifier using exprEtaExpandArity
- -- So it may be more than the number of top-level-visible lambdas
- -- use_cpr_info is the CPR we w/w for. Note that we kill it for join points,
- -- see Note [Don't w/w join points for CPR].
- use_cpr_info | isJoinId fn_id = topCpr
- | otherwise = cpr
-- Even if we don't w/w join points for CPR, we might still do so for
-- strictness. In which case a join point worker keeps its original CPR
-- property; see Note [Don't w/w join points for CPR]. Otherwise, the worker
=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -1147,11 +1147,21 @@ certainlyWillInline :: DynFlags -> IdInfo -> Maybe Unfolding
-- ^ Sees if the unfolding is pretty certain to inline.
-- If so, return a *stable* unfolding for it, that will always inline.
certainlyWillInline dflags fn_info
- = case unfoldingInfo fn_info of
- CoreUnfolding { uf_tmpl = e, uf_guidance = g }
- | loop_breaker -> Nothing -- Won't inline, so try w/w
- | noinline -> Nothing -- See Note [Worker-wrapper for NOINLINE functions]
- | otherwise -> do_cunf e g -- Depends on size, so look at that
+ = case fn_unf of
+ CoreUnfolding { uf_tmpl = expr, uf_guidance = guidance, uf_src = src }
+ | loop_breaker -> Nothing -- Won't inline, so try w/w
+ | noinline -> Nothing -- See Note [Worker-wrapper for NOINLINE functions]
+ | otherwise
+ -> case guidance of
+ UnfNever -> Nothing
+ UnfWhen {} -> Just (fn_unf { uf_src = src' })
+ -- INLINE functions have UnfWhen
+ UnfIfGoodArgs { ug_size = size, ug_args = args }
+ -> do_cunf expr size args src'
+ where
+ src' = case src of
+ InlineRhs -> InlineStable
+ _ -> src -- Do not change InlineCompulsory!
DFunUnfolding {} -> Just fn_unf -- Don't w/w DFuns; it never makes sense
-- to do so, and even if it is currently a
@@ -1164,17 +1174,12 @@ certainlyWillInline dflags fn_info
noinline = inlinePragmaSpec (inlinePragInfo fn_info) == NoInline
fn_unf = unfoldingInfo fn_info
- do_cunf :: CoreExpr -> UnfoldingGuidance -> Maybe Unfolding
- do_cunf _ UnfNever = Nothing
- do_cunf _ (UnfWhen {}) = Just (fn_unf { uf_src = InlineStable })
- -- INLINE functions have UnfWhen
-
-- The UnfIfGoodArgs case seems important. If we w/w small functions
-- binary sizes go up by 10%! (This is with SplitObjs.)
-- I'm not totally sure why.
-- INLINABLE functions come via this path
-- See Note [certainlyWillInline: INLINABLE]
- do_cunf expr (UnfIfGoodArgs { ug_size = size, ug_args = args })
+ do_cunf expr size args src'
| arityInfo fn_info > 0 -- See Note [certainlyWillInline: be careful of thunks]
, not (isBottomingSig (strictnessInfo fn_info))
-- Do not unconditionally inline a bottoming functions even if
@@ -1182,7 +1187,7 @@ certainlyWillInline dflags fn_info
-- so we don't want to re-inline it.
, let unf_arity = length args
, size - (10 * (unf_arity + 1)) <= ufUseThreshold dflags
- = Just (fn_unf { uf_src = InlineStable
+ = Just (fn_unf { uf_src = src'
, uf_guidance = UnfWhen { ug_arity = unf_arity
, ug_unsat_ok = unSaturatedOk
, ug_boring_ok = inlineBoringOk expr } })
=====================================
testsuite/tests/dependent/should_compile/dynamic-paper.stderr
=====================================
@@ -12,4 +12,4 @@ Simplifier ticks exhausted
simplifier non-termination has been judged acceptable.
To see detailed counts use -ddump-simpl-stats
- Total ticks: 140086
+ Total ticks: 159723
=====================================
testsuite/tests/driver/inline-check.stderr
=====================================
@@ -21,6 +21,7 @@ Considering inlining: foo
some_benefit False
is exp: True
is work-free: True
- guidance ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
+ guidance IF_ARGS [0] 30 0
+ discounted size = 20
ANSWER = NO
Inactive unfolding: foo1
=====================================
testsuite/tests/numeric/should_compile/T14465.stdout
=====================================
@@ -82,10 +82,8 @@ plusOne :: Natural -> Natural
[GblId,
Arity=1,
Str=<S,U>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True,
- Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
- Tmpl= \ (n [Occ=Once] :: Natural) -> plusNatural n M.minusOne1}]
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 30 0}]
plusOne = \ (n :: Natural) -> plusNatural n M.minusOne1
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
=====================================
testsuite/tests/perf/compiler/T16473.stdout
=====================================
@@ -2,10 +2,10 @@ Rule fired: Class op fmap (BUILTIN)
Rule fired: Class op liftA2 (BUILTIN)
Rule fired: Class op $p1Applicative (BUILTIN)
Rule fired: Class op <*> (BUILTIN)
-Rule fired: Class op <$ (BUILTIN)
+Rule fired: Class op fmap (BUILTIN)
Rule fired: Class op $p1Applicative (BUILTIN)
Rule fired: Class op <*> (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
+Rule fired: Class op <$ (BUILTIN)
Rule fired: Class op pure (BUILTIN)
Rule fired: Class op pure (BUILTIN)
Rule fired: Class op >>= (BUILTIN)
@@ -104,7 +104,6 @@ Rule fired: Class op fmap (BUILTIN)
Rule fired: Class op fmap (BUILTIN)
Rule fired: Class op >>= (BUILTIN)
Rule fired: Class op >>= (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
Rule fired: SPEC/Main $fFunctorStateT @Identity _ (Main)
Rule fired: SPEC/Main $fApplicativeStateT_$cpure @Identity _ (Main)
Rule fired: SPEC/Main $fApplicativeStateT_$c<*> @Identity _ (Main)
=====================================
testsuite/tests/simplCore/should_compile/T4201.stdout
=====================================
@@ -1,3 +1,3 @@
+ lift :: Foo -> T
[HasNoCafRefs, Arity: 1, Strictness: <S,1*H>, CPR: m1,
- Unfolding: InlineRule (0, True, True)
- bof `cast` (Sym (N:Foo[0]) ->_R <T>_R)]
+ Unfolding: (bof `cast` (Sym (N:Foo[0]) ->_R <T>_R))]
=====================================
testsuite/tests/simplCore/should_compile/T5658b.stdout
=====================================
@@ -1 +1 @@
-4
+2
=====================================
testsuite/tests/warnings/should_compile/T16282/T16282.stderr
=====================================
@@ -5,6 +5,5 @@ T16282.hs: warning: [-Wall-missed-specialisations]
Probable fix: add INLINABLE pragma on ‘Data.Foldable.$wmapM_’
T16282.hs: warning: [-Wall-missed-specialisations]
- Could not specialise imported function ‘Data.Map.Internal.$w$cshowsPrec’
- when specialising ‘Data.Map.Internal.$fShowMap_$cshowsPrec’
- Probable fix: add INLINABLE pragma on ‘Data.Map.Internal.$w$cshowsPrec’
+ Could not specialise imported function ‘Data.Map.Internal.$fShowMap_$cshow’
+ Probable fix: add INLINABLE pragma on ‘Data.Map.Internal.$fShowMap_$cshow’
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/compare/88d5964b0c1261c6dfbcd2bfd42c3dfc7351e333...761e957095ebb15cb67ebffca0390c16643511c9
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/compare/88d5964b0c1261c6dfbcd2bfd42c3dfc7351e333...761e957095ebb15cb67ebffca0390c16643511c9
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200320/330a90ea/attachment-0001.html>
More information about the ghc-commits
mailing list