[Git][ghc/ghc][wip/T18282] 2 commits: Reduce result discount in conSize

Ben Gamari gitlab at gitlab.haskell.org
Fri Jun 5 18:54:38 UTC 2020



Ben Gamari pushed to branch wip/T18282 at Glasgow Haskell Compiler / GHC


Commits:
42bdb3a8 by Simon Peyton Jones at 2020-06-05T14:54:17-04:00
Reduce result discount in conSize

Ticket #18282 showed that the result discount given by conSize
was massively too large.  This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.

Note [Constructor size and result discount] elaborates, as
does the ticket #18282.

Reducing result discount reduces inlining, which affects perf.  I
found that I could increase the unfoldingUseThrehold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
          boyer          -0.3%     +5.4%     +0.7%     +1.0%      0.0%
       cichelli          -0.3%     +5.9%     -9.9%     -9.5%      0.0%
      compress2          -0.4%     +9.6%     +7.2%     +6.4%      0.0%
    constraints          -0.3%     +0.2%     -3.0%     -3.4%      0.0%
   cryptarithm2          -0.3%     -3.9%     -2.2%     -2.4%      0.0%
         gamteb          -0.4%     +2.5%     +2.8%     +2.8%      0.0%
           life          -0.3%     -2.2%     -4.7%     -4.9%      0.0%
           lift          -0.3%     -0.3%     -0.8%     -0.5%      0.0%
         linear          -0.3%     -0.1%     -4.1%     -4.5%      0.0%
           mate          -0.2%     +1.4%     -2.2%     -1.9%    -14.3%
         parser          -0.3%     -2.1%     -5.4%     -4.6%      0.0%
         puzzle          -0.3%     +2.1%     -6.6%     -6.3%      0.0%
         simple          -0.4%     +2.8%     -3.4%     -3.3%     -2.2%
        veritas          -0.1%     +0.7%     -0.6%     -1.1%      0.0%
   wheel-sieve2          -0.3%    -19.2%    -24.9%    -24.5%    -42.9%
--------------------------------------------------------------------------------
            Min          -0.4%    -19.2%    -24.9%    -24.5%    -42.9%
            Max          +0.1%     +9.6%     +7.2%     +6.4%    +33.3%
 Geometric Mean          -0.3%     -0.0%     -3.0%     -2.9%     -0.3%

I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.

I investigated compress2.  The difference is entirely caused by this
function no longer inlining

WriteRoutines.$woutputCodes
  = \ (w :: [CodeEvent]) ->
      let result_s1Sr
            = case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
                (# ww1, ww2 #) -> (ww1, ww2)
      in (# case result_s1Sr of (x, _) ->
              map @Int @Char WriteRoutines.outputCodes1 x
         , case result_s1Sr of { (_, y) -> y } #)

It was right on the cusp before, driven by the excessive result
discount.  Too bad!

Metric Decrease:
    T12227
    T12545
    T15263
    T1969
    T5030
    T9872a
    T9872c
Metric Increase:
    T13701
    T9872d

- - - - -
7d7c2dc7 by Simon Peyton Jones at 2020-06-05T14:54:30-04:00
Perf wibbles

Document before committing

- - - - -


6 changed files:

- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Core/Unify.hs
- compiler/GHC/Driver/Session.hs
- + testsuite/tests/perf/compiler/T18282.hs
- testsuite/tests/perf/compiler/all.T


Changes:

=====================================
compiler/GHC/Core/Coercion.hs
=====================================
@@ -2834,7 +2834,7 @@ simplifyArgsWorker orig_ki_binders orig_inner_ki orig_fvs
        -> [Role]      -- Roles at which to flatten these ...
        -> [(Type, Coercion)]  -- flattened arguments, with their flattening coercions
        -> ([Type], [Coercion], CoercionN)
-    go acc_xis acc_cos lc binders inner_ki _ []
+    go acc_xis acc_cos !lc binders inner_ki _ []
       = (reverse acc_xis, reverse acc_cos, kind_co)
       where
         final_kind = mkPiTys binders inner_ki


=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -871,16 +871,13 @@ conSize dc n_val_args
   | n_val_args == 0 = SizeIs 0 emptyBag 10    -- Like variables
 
 -- See Note [Unboxed tuple size and result discount]
-  | isUnboxedTupleCon dc = SizeIs 0 emptyBag (10 * (1 + n_val_args))
+  | isUnboxedTupleCon dc = SizeIs 0 emptyBag 10
 
 -- See Note [Constructor size and result discount]
-  | otherwise = SizeIs 10 emptyBag (10 * (1 + n_val_args))
+  | otherwise = SizeIs 10 emptyBag 10
 
--- XXX still looks to large to me
-
-{-
-Note [Constructor size and result discount]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+{- Note [Constructor size and result discount]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Treat a constructors application as size 10, regardless of how many
 arguments it has; we are keen to expose them (and we charge separately
 for their args).  We can't treat them as size zero, else we find that
@@ -891,14 +888,32 @@ The "result discount" is applied if the result of the call is
 scrutinised (say by a case).  For a constructor application that will
 mean the constructor application will disappear, so we don't need to
 charge it to the function.  So the discount should at least match the
-cost of the constructor application, namely 10.  But to give a bit
-of extra incentive we give a discount of 10*(1 + n_val_args).
-
-Simon M tried a MUCH bigger discount: (10 * (10 + n_val_args)),
-and said it was an "unambiguous win", but its terribly dangerous
-because a function with many many case branches, each finishing with
-a constructor, can have an arbitrarily large discount.  This led to
-terrible code bloat: see #6099.
+cost of the constructor application, namely 10.
+
+Historical note 1: Until Jun 2020 we gave it a "bit of extra
+incentive" via a discount of 10*(1 + n_val_args), but that was FAR too
+much (#18282).  In particular, consider a huge case tree like
+
+   let r = case y1 of
+          Nothing -> B1 a b c
+          Just v1 -> case y2 of
+                      Nothing -> B1 c b a
+                      Just v2 -> ...
+
+If conSize gives a cost of 10 (regardless of n_val_args) and a
+discount of 10, that'll make each alternative RHS cost zero.  We
+charge 10 for each case alternative (see size_up_alt).  If we give a
+bigger discount (say 20) in conSize, we'll make the case expression
+cost *nothing*, and that can make a huge case tree cost nothing. This
+leads to massive, sometimes exponenial inlinings (#18282).  In short,
+don't give a discount that give a negative size to a sub-expression!
+
+Historical note 2: Much longer ago, Simon M tried a MUCH bigger
+discount: (10 * (10 + n_val_args)), and said it was an "unambiguous
+win", but its terribly dangerous because a function with many many
+case branches, each finishing with a constructor, can have an
+arbitrarily large discount.  This led to terrible code bloat: see
+#6099.
 
 Note [Unboxed tuple size and result discount]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -908,7 +923,7 @@ and f wasn't getting inlined.
 
 I tried giving unboxed tuples a *result discount* of zero (see the
 commented-out line).  Why?  When returned as a result they do not
-allocate, so maybe we don't want to charge so much for them If you
+allocate, so maybe we don't want to charge so much for them. If you
 have a non-zero discount here, we find that workers often get inlined
 back into wrappers, because it look like
     f x = case $wf x of (# a,b #) -> (a,b)
@@ -917,6 +932,9 @@ shrank binary sizes by 0.5% it also made spectral/boyer allocate 5%
 more. All other changes were very small. So it's not a big deal but I
 didn't adopt the idea.
 
+When fixing #18282 (see Note [Constructor size and result discount])
+I changed the result discount to be just 10, not 10*(1+n_val_args).
+
 Note [Function and non-function discounts]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We want a discount if the function is applied. A good example is


=====================================
compiler/GHC/Core/Unify.hs
=====================================
@@ -1,6 +1,6 @@
 -- (c) The University of Glasgow 2006
 
-{-# LANGUAGE ScopedTypeVariables #-}
+{-# LANGUAGE ScopedTypeVariables, PatternSynonyms, BangPatterns #-}
 {-# LANGUAGE CPP #-}
 {-# LANGUAGE DeriveFunctor #-}
 
@@ -44,6 +44,7 @@ import GHC.Data.Pair
 import GHC.Utils.Outputable
 import GHC.Types.Unique.FM
 import GHC.Types.Unique.Set
+import GHC.Exts( oneShot )
 
 import Control.Monad
 import Control.Applicative hiding ( empty )
@@ -1235,8 +1236,14 @@ data UMState = UMState
                    { um_tv_env   :: TvSubstEnv
                    , um_cv_env   :: CvSubstEnv }
 
-newtype UM a = UM { unUM :: UMState -> UnifyResultM (UMState, a) }
-    deriving (Functor)
+newtype UM a
+  = UMNoEta { unUM :: UMState -> UnifyResultM (UMState, a) }
+  deriving (Functor)
+
+pattern UM :: (UMState -> UnifyResultM (UMState, a)) -> UM a
+pattern UM m <- UMNoEta m
+  where
+    UM m = UMNoEta (oneShot m)
 
 instance Applicative UM where
       pure a = UM (\s -> pure (s, a))


=====================================
compiler/GHC/Driver/Session.hs
=====================================
@@ -1422,16 +1422,21 @@ defaultDynFlags mySettings llvmConfig =
         extensions = [],
         extensionFlags = flattenExtensionFlags Nothing [],
 
-        -- The ufCreationThreshold threshold must be reasonably high to
-        -- take account of possible discounts.
-        -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to inline
-        -- into Csg.calc (The unfolding for sqr never makes it into the
-        -- interface file.)
         ufCreationThreshold = 750,
-        ufUseThreshold      = 80,
-        ufFunAppDiscount    = 60,
-        -- Be fairly keen to inline a function if that means
-        -- we'll be able to pick the right method from a dictionary
+           -- The ufCreationThreshold threshold must be reasonably high
+           -- to take account of possible discounts.
+           -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to
+           -- inline into Csg.calc (The unfolding for sqr never makes it
+           -- into the interface file.)
+
+        ufUseThreshold = 90,
+           -- Last adjusted upwards in #18282, when I reduced
+           -- the result discount for constructors.
+
+        ufFunAppDiscount = 60,
+           -- Be fairly keen to inline a function if that means
+           -- we'll be able to pick the right method from a dictionary
+
         ufDictDiscount      = 30,
         ufDearOp            = 40,
         ufVeryAggressive    = False,


=====================================
testsuite/tests/perf/compiler/T18282.hs
=====================================
@@ -0,0 +1,41 @@
+module M
+  ( mkB2
+  ) where
+
+import Control.Monad.Reader
+import Data.Maybe
+
+data A1 = A1 (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+data A2 = A2 A1 (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+                (Maybe String) (Maybe String) (Maybe String) (Maybe String)
+
+data B1 = B1 !String !String !String !String
+data B2 = B2 !B1 !String !String !String !String !String !String !String !String
+--           a   b       c       d       e       f       g       h       i
+
+type M a = ReaderT [(String, String)] (Either String) a
+
+resolve :: Maybe String -> String -> M (Maybe String)
+resolve (Just x) _ = pure (Just x)
+resolve Nothing  v = asks $ lookup v
+
+mkB1 :: A1 -> M B1
+mkB1 (A1 a b c d) = do
+  a' <- fromMaybe "" <$> resolve a "A"
+  b' <- fromMaybe "" <$> resolve b "B"
+  c' <- fromMaybe "" <$> resolve c "C"
+  d' <- fromMaybe "" <$> resolve d "D"
+  pure $ B1 a' b' c' d'
+
+mkB2 :: A2 -> M B2
+mkB2 (A2 a b c d e f g h i) = do
+  a' <- mkB1 a
+  b' <- fromMaybe "db" <$> resolve b "B"
+  c' <- fromMaybe "dc" <$> resolve c "C"
+  d' <- fromMaybe "dd" <$> resolve d "D"
+  e' <- fromMaybe "de" <$> resolve e "E"
+  f' <- fromMaybe "df" <$> resolve f "F"
+  g' <- fromMaybe "dg" <$> resolve g "G"
+  h' <- fromMaybe "dh" <$> resolve h "H"
+  i' <- fromMaybe "di" <$> resolve i "I"
+  pure $ B2 a' b' c' d' e' f' g' h' i'


=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -355,3 +355,9 @@ test('T16190',
       ['T16190.hs', '-v0'])
 
 test('T16473', normal, makefile_test, ['T16473'])
+
+test ('T18282',
+      [ collect_compiler_stats('bytes allocated',2)
+      ],
+      compile,
+      ['-v0 -O'])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/d087308095a7697dea08c8478f4cd795a0512804...7d7c2dc7fceee763b4f292466488892cf4b6ca5a

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/d087308095a7697dea08c8478f4cd795a0512804...7d7c2dc7fceee763b4f292466488892cf4b6ca5a
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200605/c2ffda09/attachment-0001.html>


More information about the ghc-commits mailing list