[Git][ghc/ghc][wip/T18282] Reduce result discount in conSize

Simon Peyton Jones gitlab at gitlab.haskell.org
Wed Jun 3 08:24:23 UTC 2020



Simon Peyton Jones pushed to branch wip/T18282 at Glasgow Haskell Compiler / GHC


Commits:
43c0ecd2 by Simon Peyton Jones at 2020-06-03T09:23:05+01:00
Reduce result discount in conSize

Ticket #18282 showed that the result discount given by conSize
was massively too large.  This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.

Note [Constructor size and result discount] elaborates, as
does the ticket #18282.

Reducing result discount reduces inlining, which affects perf.  I
found that I could increase the unfoldingUseThrehold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
          boyer          -0.3%     +5.4%     +0.7%     +1.0%      0.0%
       cichelli          -0.3%     +5.9%     -9.9%     -9.5%      0.0%
      compress2          -0.4%     +9.6%     +7.2%     +6.4%      0.0%
    constraints          -0.3%     +0.2%     -3.0%     -3.4%      0.0%
   cryptarithm2          -0.3%     -3.9%     -2.2%     -2.4%      0.0%
         gamteb          -0.4%     +2.5%     +2.8%     +2.8%      0.0%
           life          -0.3%     -2.2%     -4.7%     -4.9%      0.0%
           lift          -0.3%     -0.3%     -0.8%     -0.5%      0.0%
         linear          -0.3%     -0.1%     -4.1%     -4.5%      0.0%
           mate          -0.2%     +1.4%     -2.2%     -1.9%    -14.3%
         parser          -0.3%     -2.1%     -5.4%     -4.6%      0.0%
         puzzle          -0.3%     +2.1%     -6.6%     -6.3%      0.0%
         simple          -0.4%     +2.8%     -3.4%     -3.3%     -2.2%
        veritas          -0.1%     +0.7%     -0.6%     -1.1%      0.0%
   wheel-sieve2          -0.3%    -19.2%    -24.9%    -24.5%    -42.9%
--------------------------------------------------------------------------------
            Min          -0.4%    -19.2%    -24.9%    -24.5%    -42.9%
            Max          +0.1%     +9.6%     +7.2%     +6.4%    +33.3%
 Geometric Mean          -0.3%     -0.0%     -3.0%     -2.9%     -0.3%

I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.

I investigated compress2.  The difference is entirely caused by this
function no longer inlining

WriteRoutines.$woutputCodes
  = \ (w :: [CodeEvent]) ->
      let result_s1Sr
            = case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
                (# ww1, ww2 #) -> (ww1, ww2)
      in (# case result_s1Sr of (x, _) ->
              map @Int @Char WriteRoutines.outputCodes1 x
         , case result_s1Sr of { (_, y) -> y } #)

It was right on the cusp before, driven by the excessive result
discount.  Too bad!

- - - - -


4 changed files:

- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Driver/Session.hs
- + testsuite/tests/perf/compiler/T18282.hs
- testsuite/tests/perf/compiler/all.T


Changes:

=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -871,16 +871,13 @@ conSize dc n_val_args
   | n_val_args == 0 = SizeIs 0 emptyBag 10    -- Like variables
 
 -- See Note [Unboxed tuple size and result discount]
-  | isUnboxedTupleCon dc = SizeIs 0 emptyBag (10 * (1 + n_val_args))
+  | isUnboxedTupleCon dc = SizeIs 0 emptyBag 10
 
 -- See Note [Constructor size and result discount]
-  | otherwise = SizeIs 10 emptyBag (10 * (1 + n_val_args))
+  | otherwise = SizeIs 10 emptyBag 10
 
--- XXX still looks to large to me
-
-{-
-Note [Constructor size and result discount]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+{- Note [Constructor size and result discount]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Treat a constructors application as size 10, regardless of how many
 arguments it has; we are keen to expose them (and we charge separately
 for their args).  We can't treat them as size zero, else we find that
@@ -891,14 +888,32 @@ The "result discount" is applied if the result of the call is
 scrutinised (say by a case).  For a constructor application that will
 mean the constructor application will disappear, so we don't need to
 charge it to the function.  So the discount should at least match the
-cost of the constructor application, namely 10.  But to give a bit
-of extra incentive we give a discount of 10*(1 + n_val_args).
-
-Simon M tried a MUCH bigger discount: (10 * (10 + n_val_args)),
-and said it was an "unambiguous win", but its terribly dangerous
-because a function with many many case branches, each finishing with
-a constructor, can have an arbitrarily large discount.  This led to
-terrible code bloat: see #6099.
+cost of the constructor application, namely 10.
+
+Historical note 1: Until Jun 2020 we gave it a "bit of extra
+incentive" via a discount of 10*(1 + n_val_args), but that was FAR too
+much (#18282).  In particular, consider a huge case tree like
+
+   let r = case y1 of
+          Nothing -> B1 a b c
+          Just v1 -> case y2 of
+                      Nothing -> B1 c b a
+                      Just v2 -> ...
+
+If conSize gives a cost of 10 (regardless of n_val_args) and a
+discount of 10, that'll make each alternative RHS cost zero.  We
+charge 10 for each case alternative (see size_up_alt).  If we give a
+bigger discount (say 20) in conSize, we'll make the case expression
+cost *nothing*, and that can make a huge case tree cost nothing. This
+leads to massive, sometimes exponenial inlinings (#18282).  In short,
+don't give a discount that give a negative size to a sub-expression!
+
+Historical note 2: Much longer ago, Simon M tried a MUCH bigger
+discount: (10 * (10 + n_val_args)), and said it was an "unambiguous
+win", but its terribly dangerous because a function with many many
+case branches, each finishing with a constructor, can have an
+arbitrarily large discount.  This led to terrible code bloat: see
+#6099.
 
 Note [Unboxed tuple size and result discount]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -908,7 +923,7 @@ and f wasn't getting inlined.
 
 I tried giving unboxed tuples a *result discount* of zero (see the
 commented-out line).  Why?  When returned as a result they do not
-allocate, so maybe we don't want to charge so much for them If you
+allocate, so maybe we don't want to charge so much for them. If you
 have a non-zero discount here, we find that workers often get inlined
 back into wrappers, because it look like
     f x = case $wf x of (# a,b #) -> (a,b)
@@ -917,6 +932,9 @@ shrank binary sizes by 0.5% it also made spectral/boyer allocate 5%
 more. All other changes were very small. So it's not a big deal but I
 didn't adopt the idea.
 
+When fixing #18282 (see Note [Constructor size and result discount])
+I changed the result discount to be just 10, not 10*(1+n_val_args).
+
 Note [Function and non-function discounts]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We want a discount if the function is applied. A good example is


=====================================
compiler/GHC/Driver/Session.hs
=====================================
@@ -1422,16 +1422,21 @@ defaultDynFlags mySettings llvmConfig =
         extensions = [],
         extensionFlags = flattenExtensionFlags Nothing [],
 
-        -- The ufCreationThreshold threshold must be reasonably high to
-        -- take account of possible discounts.
-        -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to inline
-        -- into Csg.calc (The unfolding for sqr never makes it into the
-        -- interface file.)
         ufCreationThreshold = 750,
-        ufUseThreshold      = 80,
-        ufFunAppDiscount    = 60,
-        -- Be fairly keen to inline a function if that means
-        -- we'll be able to pick the right method from a dictionary
+           -- The ufCreationThreshold threshold must be reasonably high
+           -- to take account of possible discounts.
+           -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to
+           -- inline into Csg.calc (The unfolding for sqr never makes it
+           -- into the interface file.)
+
+        ufUseThreshold = 90,
+           -- Last adjusted upwards in #18282, when I reduced
+           -- the result discount for constructors.
+
+        ufFunAppDiscount = 60,
+           -- Be fairly keen to inline a function if that means
+           -- we'll be able to pick the right method from a dictionary
+
         ufDictDiscount      = 30,
         ufDearOp            = 40,
         ufVeryAggressive    = False,


=====================================
testsuite/tests/perf/compiler/T18282.hs
=====================================
@@ -0,0 +1,41 @@
+module M
+  ( mkB2
+  ) where
+
+import Control.Monad.Reader
+import Data.Maybe
+
+data A1 = A1 (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+data A2 = A2 A1 (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+                (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+
+data B1 = B1 !String !String !String !String
+data B2 = B2 !B1 !String !String !String !String !String !String !String !String
+--           a   b       c       d       e       f       g       h       i
+
+type M a = ReaderT [(String, String)] (Either String) a
+
+resolve :: Maybe String -> String -> M (Maybe String)
+resolve (Just x) _ = pure (Just x)
+resolve Nothing  v = asks $ lookup v
+
+mkB1 :: A1 -> M B1
+mkB1 (A1 a b c d) = do
+  a' <- fromMaybe "" <$> resolve a "A"
+  b' <- fromMaybe "" <$> resolve b "B"
+  c' <- fromMaybe "" <$> resolve c "C"
+  d' <- fromMaybe "" <$> resolve d "D"
+  pure $ B1 a' b' c' d'
+
+mkB2 :: A2 -> M B2
+mkB2 (A2 a b c d e f g h i) = do
+  a' <- mkB1 a
+  b' <- fromMaybe "db" <$> resolve b "B"
+  c' <- fromMaybe "dc" <$> resolve c "C"
+  d' <- fromMaybe "dd" <$> resolve d "D"
+  e' <- fromMaybe "de" <$> resolve e "E"
+  f' <- fromMaybe "df" <$> resolve f "F"
+  g' <- fromMaybe "dg" <$> resolve g "G"
+  h' <- fromMaybe "dh" <$> resolve h "H"
+  i' <- fromMaybe "di" <$> resolve i "I"
+  pure $ B2 a' b' c' d' e' f' g' h' i'


=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -355,3 +355,9 @@ test('T16190',
       ['T16190.hs', '-v0'])
 
 test('T16473', normal, makefile_test, ['T16473'])
+
+test ('T18282',
+      [ collect_compiler_stats('bytes allocated',2)
+      ],
+      compile,
+      ['-v0 -O'])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/43c0ecd2d28a82f57e1430397b6ce0ff167d7278

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/43c0ecd2d28a82f57e1430397b6ce0ff167d7278
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200603/77a3ec19/attachment-0001.html>


More information about the ghc-commits mailing list