[Git][ghc/ghc][wip/T18282] 2 commits: Reduce result discount in conSize
Ben Gamari
gitlab at gitlab.haskell.org
Fri Jun 5 18:54:38 UTC 2020
Ben Gamari pushed to branch wip/T18282 at Glasgow Haskell Compiler / GHC
Commits:
42bdb3a8 by Simon Peyton Jones at 2020-06-05T14:54:17-04:00
Reduce result discount in conSize
Ticket #18282 showed that the result discount given by conSize
was massively too large. This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.
Note [Constructor size and result discount] elaborates, as
does the ticket #18282.
Reducing result discount reduces inlining, which affects perf. I
found that I could increase the unfoldingUseThrehold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
boyer -0.3% +5.4% +0.7% +1.0% 0.0%
cichelli -0.3% +5.9% -9.9% -9.5% 0.0%
compress2 -0.4% +9.6% +7.2% +6.4% 0.0%
constraints -0.3% +0.2% -3.0% -3.4% 0.0%
cryptarithm2 -0.3% -3.9% -2.2% -2.4% 0.0%
gamteb -0.4% +2.5% +2.8% +2.8% 0.0%
life -0.3% -2.2% -4.7% -4.9% 0.0%
lift -0.3% -0.3% -0.8% -0.5% 0.0%
linear -0.3% -0.1% -4.1% -4.5% 0.0%
mate -0.2% +1.4% -2.2% -1.9% -14.3%
parser -0.3% -2.1% -5.4% -4.6% 0.0%
puzzle -0.3% +2.1% -6.6% -6.3% 0.0%
simple -0.4% +2.8% -3.4% -3.3% -2.2%
veritas -0.1% +0.7% -0.6% -1.1% 0.0%
wheel-sieve2 -0.3% -19.2% -24.9% -24.5% -42.9%
--------------------------------------------------------------------------------
Min -0.4% -19.2% -24.9% -24.5% -42.9%
Max +0.1% +9.6% +7.2% +6.4% +33.3%
Geometric Mean -0.3% -0.0% -3.0% -2.9% -0.3%
I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.
I investigated compress2. The difference is entirely caused by this
function no longer inlining
WriteRoutines.$woutputCodes
= \ (w :: [CodeEvent]) ->
let result_s1Sr
= case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
(# ww1, ww2 #) -> (ww1, ww2)
in (# case result_s1Sr of (x, _) ->
map @Int @Char WriteRoutines.outputCodes1 x
, case result_s1Sr of { (_, y) -> y } #)
It was right on the cusp before, driven by the excessive result
discount. Too bad!
Metric Decrease:
T12227
T12545
T15263
T1969
T5030
T9872a
T9872c
Metric Increase:
T13701
T9872d
- - - - -
7d7c2dc7 by Simon Peyton Jones at 2020-06-05T14:54:30-04:00
Perf wibbles
Document before committing
- - - - -
6 changed files:
- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Core/Unify.hs
- compiler/GHC/Driver/Session.hs
- + testsuite/tests/perf/compiler/T18282.hs
- testsuite/tests/perf/compiler/all.T
Changes:
=====================================
compiler/GHC/Core/Coercion.hs
=====================================
@@ -2834,7 +2834,7 @@ simplifyArgsWorker orig_ki_binders orig_inner_ki orig_fvs
-> [Role] -- Roles at which to flatten these ...
-> [(Type, Coercion)] -- flattened arguments, with their flattening coercions
-> ([Type], [Coercion], CoercionN)
- go acc_xis acc_cos lc binders inner_ki _ []
+ go acc_xis acc_cos !lc binders inner_ki _ []
= (reverse acc_xis, reverse acc_cos, kind_co)
where
final_kind = mkPiTys binders inner_ki
=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -871,16 +871,13 @@ conSize dc n_val_args
| n_val_args == 0 = SizeIs 0 emptyBag 10 -- Like variables
-- See Note [Unboxed tuple size and result discount]
- | isUnboxedTupleCon dc = SizeIs 0 emptyBag (10 * (1 + n_val_args))
+ | isUnboxedTupleCon dc = SizeIs 0 emptyBag 10
-- See Note [Constructor size and result discount]
- | otherwise = SizeIs 10 emptyBag (10 * (1 + n_val_args))
+ | otherwise = SizeIs 10 emptyBag 10
--- XXX still looks to large to me
-
-{-
-Note [Constructor size and result discount]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+{- Note [Constructor size and result discount]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Treat a constructors application as size 10, regardless of how many
arguments it has; we are keen to expose them (and we charge separately
for their args). We can't treat them as size zero, else we find that
@@ -891,14 +888,32 @@ The "result discount" is applied if the result of the call is
scrutinised (say by a case). For a constructor application that will
mean the constructor application will disappear, so we don't need to
charge it to the function. So the discount should at least match the
-cost of the constructor application, namely 10. But to give a bit
-of extra incentive we give a discount of 10*(1 + n_val_args).
-
-Simon M tried a MUCH bigger discount: (10 * (10 + n_val_args)),
-and said it was an "unambiguous win", but its terribly dangerous
-because a function with many many case branches, each finishing with
-a constructor, can have an arbitrarily large discount. This led to
-terrible code bloat: see #6099.
+cost of the constructor application, namely 10.
+
+Historical note 1: Until Jun 2020 we gave it a "bit of extra
+incentive" via a discount of 10*(1 + n_val_args), but that was FAR too
+much (#18282). In particular, consider a huge case tree like
+
+ let r = case y1 of
+ Nothing -> B1 a b c
+ Just v1 -> case y2 of
+ Nothing -> B1 c b a
+ Just v2 -> ...
+
+If conSize gives a cost of 10 (regardless of n_val_args) and a
+discount of 10, that'll make each alternative RHS cost zero. We
+charge 10 for each case alternative (see size_up_alt). If we give a
+bigger discount (say 20) in conSize, we'll make the case expression
+cost *nothing*, and that can make a huge case tree cost nothing. This
+leads to massive, sometimes exponenial inlinings (#18282). In short,
+don't give a discount that give a negative size to a sub-expression!
+
+Historical note 2: Much longer ago, Simon M tried a MUCH bigger
+discount: (10 * (10 + n_val_args)), and said it was an "unambiguous
+win", but its terribly dangerous because a function with many many
+case branches, each finishing with a constructor, can have an
+arbitrarily large discount. This led to terrible code bloat: see
+#6099.
Note [Unboxed tuple size and result discount]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -908,7 +923,7 @@ and f wasn't getting inlined.
I tried giving unboxed tuples a *result discount* of zero (see the
commented-out line). Why? When returned as a result they do not
-allocate, so maybe we don't want to charge so much for them If you
+allocate, so maybe we don't want to charge so much for them. If you
have a non-zero discount here, we find that workers often get inlined
back into wrappers, because it look like
f x = case $wf x of (# a,b #) -> (a,b)
@@ -917,6 +932,9 @@ shrank binary sizes by 0.5% it also made spectral/boyer allocate 5%
more. All other changes were very small. So it's not a big deal but I
didn't adopt the idea.
+When fixing #18282 (see Note [Constructor size and result discount])
+I changed the result discount to be just 10, not 10*(1+n_val_args).
+
Note [Function and non-function discounts]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We want a discount if the function is applied. A good example is
=====================================
compiler/GHC/Core/Unify.hs
=====================================
@@ -1,6 +1,6 @@
-- (c) The University of Glasgow 2006
-{-# LANGUAGE ScopedTypeVariables #-}
+{-# LANGUAGE ScopedTypeVariables, PatternSynonyms, BangPatterns #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE DeriveFunctor #-}
@@ -44,6 +44,7 @@ import GHC.Data.Pair
import GHC.Utils.Outputable
import GHC.Types.Unique.FM
import GHC.Types.Unique.Set
+import GHC.Exts( oneShot )
import Control.Monad
import Control.Applicative hiding ( empty )
@@ -1235,8 +1236,14 @@ data UMState = UMState
{ um_tv_env :: TvSubstEnv
, um_cv_env :: CvSubstEnv }
-newtype UM a = UM { unUM :: UMState -> UnifyResultM (UMState, a) }
- deriving (Functor)
+newtype UM a
+ = UMNoEta { unUM :: UMState -> UnifyResultM (UMState, a) }
+ deriving (Functor)
+
+pattern UM :: (UMState -> UnifyResultM (UMState, a)) -> UM a
+pattern UM m <- UMNoEta m
+ where
+ UM m = UMNoEta (oneShot m)
instance Applicative UM where
pure a = UM (\s -> pure (s, a))
=====================================
compiler/GHC/Driver/Session.hs
=====================================
@@ -1422,16 +1422,21 @@ defaultDynFlags mySettings llvmConfig =
extensions = [],
extensionFlags = flattenExtensionFlags Nothing [],
- -- The ufCreationThreshold threshold must be reasonably high to
- -- take account of possible discounts.
- -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to inline
- -- into Csg.calc (The unfolding for sqr never makes it into the
- -- interface file.)
ufCreationThreshold = 750,
- ufUseThreshold = 80,
- ufFunAppDiscount = 60,
- -- Be fairly keen to inline a function if that means
- -- we'll be able to pick the right method from a dictionary
+ -- The ufCreationThreshold threshold must be reasonably high
+ -- to take account of possible discounts.
+ -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to
+ -- inline into Csg.calc (The unfolding for sqr never makes it
+ -- into the interface file.)
+
+ ufUseThreshold = 90,
+ -- Last adjusted upwards in #18282, when I reduced
+ -- the result discount for constructors.
+
+ ufFunAppDiscount = 60,
+ -- Be fairly keen to inline a function if that means
+ -- we'll be able to pick the right method from a dictionary
+
ufDictDiscount = 30,
ufDearOp = 40,
ufVeryAggressive = False,
=====================================
testsuite/tests/perf/compiler/T18282.hs
=====================================
@@ -0,0 +1,41 @@
+module M
+ ( mkB2
+ ) where
+
+import Control.Monad.Reader
+import Data.Maybe
+
+data A1 = A1 (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+data A2 = A2 A1 (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+ (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+
+data B1 = B1 !String !String !String !String
+data B2 = B2 !B1 !String !String !String !String !String !String !String !String
+-- a b c d e f g h i
+
+type M a = ReaderT [(String, String)] (Either String) a
+
+resolve :: Maybe String -> String -> M (Maybe String)
+resolve (Just x) _ = pure (Just x)
+resolve Nothing v = asks $ lookup v
+
+mkB1 :: A1 -> M B1
+mkB1 (A1 a b c d) = do
+ a' <- fromMaybe "" <$> resolve a "A"
+ b' <- fromMaybe "" <$> resolve b "B"
+ c' <- fromMaybe "" <$> resolve c "C"
+ d' <- fromMaybe "" <$> resolve d "D"
+ pure $ B1 a' b' c' d'
+
+mkB2 :: A2 -> M B2
+mkB2 (A2 a b c d e f g h i) = do
+ a' <- mkB1 a
+ b' <- fromMaybe "db" <$> resolve b "B"
+ c' <- fromMaybe "dc" <$> resolve c "C"
+ d' <- fromMaybe "dd" <$> resolve d "D"
+ e' <- fromMaybe "de" <$> resolve e "E"
+ f' <- fromMaybe "df" <$> resolve f "F"
+ g' <- fromMaybe "dg" <$> resolve g "G"
+ h' <- fromMaybe "dh" <$> resolve h "H"
+ i' <- fromMaybe "di" <$> resolve i "I"
+ pure $ B2 a' b' c' d' e' f' g' h' i'
=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -355,3 +355,9 @@ test('T16190',
['T16190.hs', '-v0'])
test('T16473', normal, makefile_test, ['T16473'])
+
+test ('T18282',
+ [ collect_compiler_stats('bytes allocated',2)
+ ],
+ compile,
+ ['-v0 -O'])
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/d087308095a7697dea08c8478f4cd795a0512804...7d7c2dc7fceee763b4f292466488892cf4b6ca5a
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/d087308095a7697dea08c8478f4cd795a0512804...7d7c2dc7fceee763b4f292466488892cf4b6ca5a
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200605/c2ffda09/attachment-0001.html>
More information about the ghc-commits
mailing list