[Git][ghc/ghc][wip/T18282] Reduce result discount in conSize
Simon Peyton Jones
gitlab at gitlab.haskell.org
Wed Jun 3 08:24:23 UTC 2020
Simon Peyton Jones pushed to branch wip/T18282 at Glasgow Haskell Compiler / GHC
Commits:
43c0ecd2 by Simon Peyton Jones at 2020-06-03T09:23:05+01:00
Reduce result discount in conSize
Ticket #18282 showed that the result discount given by conSize
was massively too large. This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.
Note [Constructor size and result discount] elaborates, as
does the ticket #18282.
Reducing result discount reduces inlining, which affects perf. I
found that I could increase the unfoldingUseThrehold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
boyer -0.3% +5.4% +0.7% +1.0% 0.0%
cichelli -0.3% +5.9% -9.9% -9.5% 0.0%
compress2 -0.4% +9.6% +7.2% +6.4% 0.0%
constraints -0.3% +0.2% -3.0% -3.4% 0.0%
cryptarithm2 -0.3% -3.9% -2.2% -2.4% 0.0%
gamteb -0.4% +2.5% +2.8% +2.8% 0.0%
life -0.3% -2.2% -4.7% -4.9% 0.0%
lift -0.3% -0.3% -0.8% -0.5% 0.0%
linear -0.3% -0.1% -4.1% -4.5% 0.0%
mate -0.2% +1.4% -2.2% -1.9% -14.3%
parser -0.3% -2.1% -5.4% -4.6% 0.0%
puzzle -0.3% +2.1% -6.6% -6.3% 0.0%
simple -0.4% +2.8% -3.4% -3.3% -2.2%
veritas -0.1% +0.7% -0.6% -1.1% 0.0%
wheel-sieve2 -0.3% -19.2% -24.9% -24.5% -42.9%
--------------------------------------------------------------------------------
Min -0.4% -19.2% -24.9% -24.5% -42.9%
Max +0.1% +9.6% +7.2% +6.4% +33.3%
Geometric Mean -0.3% -0.0% -3.0% -2.9% -0.3%
I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.
I investigated compress2. The difference is entirely caused by this
function no longer inlining
WriteRoutines.$woutputCodes
= \ (w :: [CodeEvent]) ->
let result_s1Sr
= case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
(# ww1, ww2 #) -> (ww1, ww2)
in (# case result_s1Sr of (x, _) ->
map @Int @Char WriteRoutines.outputCodes1 x
, case result_s1Sr of { (_, y) -> y } #)
It was right on the cusp before, driven by the excessive result
discount. Too bad!
- - - - -
4 changed files:
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Driver/Session.hs
- + testsuite/tests/perf/compiler/T18282.hs
- testsuite/tests/perf/compiler/all.T
Changes:
=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -871,16 +871,13 @@ conSize dc n_val_args
| n_val_args == 0 = SizeIs 0 emptyBag 10 -- Like variables
-- See Note [Unboxed tuple size and result discount]
- | isUnboxedTupleCon dc = SizeIs 0 emptyBag (10 * (1 + n_val_args))
+ | isUnboxedTupleCon dc = SizeIs 0 emptyBag 10
-- See Note [Constructor size and result discount]
- | otherwise = SizeIs 10 emptyBag (10 * (1 + n_val_args))
+ | otherwise = SizeIs 10 emptyBag 10
--- XXX still looks to large to me
-
-{-
-Note [Constructor size and result discount]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+{- Note [Constructor size and result discount]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Treat a constructors application as size 10, regardless of how many
arguments it has; we are keen to expose them (and we charge separately
for their args). We can't treat them as size zero, else we find that
@@ -891,14 +888,32 @@ The "result discount" is applied if the result of the call is
scrutinised (say by a case). For a constructor application that will
mean the constructor application will disappear, so we don't need to
charge it to the function. So the discount should at least match the
-cost of the constructor application, namely 10. But to give a bit
-of extra incentive we give a discount of 10*(1 + n_val_args).
-
-Simon M tried a MUCH bigger discount: (10 * (10 + n_val_args)),
-and said it was an "unambiguous win", but its terribly dangerous
-because a function with many many case branches, each finishing with
-a constructor, can have an arbitrarily large discount. This led to
-terrible code bloat: see #6099.
+cost of the constructor application, namely 10.
+
+Historical note 1: Until Jun 2020 we gave it a "bit of extra
+incentive" via a discount of 10*(1 + n_val_args), but that was FAR too
+much (#18282). In particular, consider a huge case tree like
+
+ let r = case y1 of
+ Nothing -> B1 a b c
+ Just v1 -> case y2 of
+ Nothing -> B1 c b a
+ Just v2 -> ...
+
+If conSize gives a cost of 10 (regardless of n_val_args) and a
+discount of 10, that'll make each alternative RHS cost zero. We
+charge 10 for each case alternative (see size_up_alt). If we give a
+bigger discount (say 20) in conSize, we'll make the case expression
+cost *nothing*, and that can make a huge case tree cost nothing. This
+leads to massive, sometimes exponenial inlinings (#18282). In short,
+don't give a discount that give a negative size to a sub-expression!
+
+Historical note 2: Much longer ago, Simon M tried a MUCH bigger
+discount: (10 * (10 + n_val_args)), and said it was an "unambiguous
+win", but its terribly dangerous because a function with many many
+case branches, each finishing with a constructor, can have an
+arbitrarily large discount. This led to terrible code bloat: see
+#6099.
Note [Unboxed tuple size and result discount]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -908,7 +923,7 @@ and f wasn't getting inlined.
I tried giving unboxed tuples a *result discount* of zero (see the
commented-out line). Why? When returned as a result they do not
-allocate, so maybe we don't want to charge so much for them If you
+allocate, so maybe we don't want to charge so much for them. If you
have a non-zero discount here, we find that workers often get inlined
back into wrappers, because it look like
f x = case $wf x of (# a,b #) -> (a,b)
@@ -917,6 +932,9 @@ shrank binary sizes by 0.5% it also made spectral/boyer allocate 5%
more. All other changes were very small. So it's not a big deal but I
didn't adopt the idea.
+When fixing #18282 (see Note [Constructor size and result discount])
+I changed the result discount to be just 10, not 10*(1+n_val_args).
+
Note [Function and non-function discounts]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We want a discount if the function is applied. A good example is
=====================================
compiler/GHC/Driver/Session.hs
=====================================
@@ -1422,16 +1422,21 @@ defaultDynFlags mySettings llvmConfig =
extensions = [],
extensionFlags = flattenExtensionFlags Nothing [],
- -- The ufCreationThreshold threshold must be reasonably high to
- -- take account of possible discounts.
- -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to inline
- -- into Csg.calc (The unfolding for sqr never makes it into the
- -- interface file.)
ufCreationThreshold = 750,
- ufUseThreshold = 80,
- ufFunAppDiscount = 60,
- -- Be fairly keen to inline a function if that means
- -- we'll be able to pick the right method from a dictionary
+ -- The ufCreationThreshold threshold must be reasonably high
+ -- to take account of possible discounts.
+ -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to
+ -- inline into Csg.calc (The unfolding for sqr never makes it
+ -- into the interface file.)
+
+ ufUseThreshold = 90,
+ -- Last adjusted upwards in #18282, when I reduced
+ -- the result discount for constructors.
+
+ ufFunAppDiscount = 60,
+ -- Be fairly keen to inline a function if that means
+ -- we'll be able to pick the right method from a dictionary
+
ufDictDiscount = 30,
ufDearOp = 40,
ufVeryAggressive = False,
=====================================
testsuite/tests/perf/compiler/T18282.hs
=====================================
@@ -0,0 +1,41 @@
+module M
+ ( mkB2
+ ) where
+
+import Control.Monad.Reader
+import Data.Maybe
+
+data A1 = A1 (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+data A2 = A2 A1 (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+ (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+
+data B1 = B1 !String !String !String !String
+data B2 = B2 !B1 !String !String !String !String !String !String !String !String
+-- a b c d e f g h i
+
+type M a = ReaderT [(String, String)] (Either String) a
+
+resolve :: Maybe String -> String -> M (Maybe String)
+resolve (Just x) _ = pure (Just x)
+resolve Nothing v = asks $ lookup v
+
+mkB1 :: A1 -> M B1
+mkB1 (A1 a b c d) = do
+ a' <- fromMaybe "" <$> resolve a "A"
+ b' <- fromMaybe "" <$> resolve b "B"
+ c' <- fromMaybe "" <$> resolve c "C"
+ d' <- fromMaybe "" <$> resolve d "D"
+ pure $ B1 a' b' c' d'
+
+mkB2 :: A2 -> M B2
+mkB2 (A2 a b c d e f g h i) = do
+ a' <- mkB1 a
+ b' <- fromMaybe "db" <$> resolve b "B"
+ c' <- fromMaybe "dc" <$> resolve c "C"
+ d' <- fromMaybe "dd" <$> resolve d "D"
+ e' <- fromMaybe "de" <$> resolve e "E"
+ f' <- fromMaybe "df" <$> resolve f "F"
+ g' <- fromMaybe "dg" <$> resolve g "G"
+ h' <- fromMaybe "dh" <$> resolve h "H"
+ i' <- fromMaybe "di" <$> resolve i "I"
+ pure $ B2 a' b' c' d' e' f' g' h' i'
=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -355,3 +355,9 @@ test('T16190',
['T16190.hs', '-v0'])
test('T16473', normal, makefile_test, ['T16473'])
+
+test ('T18282',
+ [ collect_compiler_stats('bytes allocated',2)
+ ],
+ compile,
+ ['-v0 -O'])
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/43c0ecd2d28a82f57e1430397b6ce0ff167d7278
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/43c0ecd2d28a82f57e1430397b6ce0ff167d7278
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200603/77a3ec19/attachment-0001.html>
More information about the ghc-commits
mailing list