[Git][ghc/ghc][wip/T15304] simplifier: Kill off ufKeenessFactor

Fri Apr 3 23:07:56 UTC 2020


Ben Gamari pushed to branch wip/T15304 at Glasgow Haskell Compiler / GHC


Commits:
a4a056b1 by Ben Gamari at 2020-04-03T19:03:42-04:00
simplifier: Kill off ufKeenessFactor

We used to have another factor, ufKeenessFactor, which would scale the
discounts before they were subtracted from the size. This was justified
with the following comment:

  -- We multiple the raw discounts (args_discount and result_discount)
  -- ty opt_UnfoldingKeenessFactor because the former have to do with
  --  *size* whereas the discounts imply that there's some extra
  --  *efficiency* to be gained (e.g. beta reductions, case reductions)
  -- by inlining.

However, this is highly suspect since it means that we subtract a
*scaled* size from an absolute size, resulting in crazy (e.g. negative)
scores in some cases (#15304). We consequently killed off
ufKeenessFactor and bumped up the ufUseThreshold to compensate.

Adjustment of unfolding use threshold
=====================================

Since this removes a discount from our inlining heuristic, I revisited our
default choice of -funfolding-use-threshold to minimize the change in
overall inlining behavior. Specifically, I measured runtime allocations
and executable size of nofib and the testsuite performance tests built
using compilers (and core libraries) built with several values of
-funfolding-use-threshold.

This comes as a result of a quantitative comparison of testsuite
performance and code size as a function of ufUseThreshold, comparing
GHC trees using values of 50, 60, 70, 80, 90, and 100. The test set
consisted of nofib and the testsuite performance tests.
A full summary of these measurements are found in the description of
!2608

Comparing executable sizes (relative to the base commit) across all
nofib tests, we see that sizes are similar to the baseline:

            gmean      min      max   median
thresh
50         -6.36%   -7.04%   -4.82%   -6.46%
60         -5.04%   -5.97%   -3.83%   -5.11%
70         -2.90%   -3.84%   -2.31%   -2.92%
80         -0.75%   -2.16%   -0.42%   -0.73%
90         +0.24%   -0.41%   +0.55%   +0.26%
100        +1.36%   +0.80%   +1.64%   +1.37%
baseline   +0.00%   +0.00%   +0.00%   +0.00%

Likewise, looking at runtime allocations we see that 80 gives slightly
better optimisation than the baseline:

            gmean      min      max   median
thresh
50         +0.16%   -0.16%   +4.43%   +0.00%
60         +0.09%   -0.00%   +3.10%   +0.00%
70         +0.04%   -0.09%   +2.29%   +0.00%
80         +0.02%   -1.17%   +2.29%   +0.00%
90         -0.02%   -2.59%   +1.86%   +0.00%
100        +0.00%   -2.59%   +7.51%   -0.00%
baseline   +0.00%   +0.00%   +0.00%   +0.00%

Finally, I had to add a NOINLINE in T4306 to ensure that `upd` is
worker-wrappered as the test expects. This makes me wonder whether the
inlining heuristic is now too liberal as `upd` is quite a large
function. The same measure was taken in T12600.

             Wall clock time compiling Cabal with -O0
thresh       50     60     70     80     90      100    baseline
build-Cabal  93.88  89.58  92.59  90.09  100.26  94.81  89.13

Metric Decrease:
    T12234
    T13035
    T13719
    T14683
    T4801
    T5631
    T5642
    T9020
    T9872d
    T9961
Metric Increase:
    T12150
    T12425
    T13701
    T14697
    T15426
    T1969
    T3064
    T5837
    T9203
    T9872a
    T9872b
    T9872c
    T9872d
    haddock.Cabal
    haddock.base
    haddock.compiler

- - - - -


9 changed files:

- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Driver/Session.hs
- testsuite/tests/dependent/should_compile/dynamic-paper.stderr
- testsuite/tests/ghci.debugger/scripts/all.T
- testsuite/tests/ghci.debugger/scripts/break021.stdout
- testsuite/tests/perf/compiler/T16473.stdout
- testsuite/tests/simplCore/should_compile/T12600.hs
- testsuite/tests/simplCore/should_compile/T15056.stderr
- testsuite/tests/simplCore/should_compile/T4306.hs


Changes:

=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -1001,10 +1001,6 @@ ufUseThreshold
      At a call site, if the unfolding, less discounts, is smaller than
      this, then it's small enough inline
 
-ufKeenessFactor
-     Factor by which the discounts are multiplied before
-     subtracting from size
-
 ufDictDiscount
      The discount for each occurrence of a dictionary argument
      as an argument of a class method.  Should be pretty small
@@ -1023,6 +1019,22 @@ ufVeryAggressive
      loop breakers.
 
 
+Historical Note: Before April 2020 we had another factor,
+ufKeenessFactor, which would scale the discounts before they were subtracted
+from the size. This was justified with the following comment:
+
+  -- We multiply the raw discounts (args_discount and result_discount)
+  -- ty opt_UnfoldingKeenessFactor because the former have to do with
+  --  *size* whereas the discounts imply that there's some extra
+  --  *efficiency* to be gained (e.g. beta reductions, case reductions)
+  -- by inlining.
+
+However, this is highly suspect since it means that we subtract a *scaled* size
+from an absolute size, resulting in crazy (e.g. negative) scores in some cases
+(#15304). We consequently killed off ufKeenessFactor and bumped up the
+ufUseThreshold to compensate.
+
+
 Note [Function applications]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 In a function application (f a b)
@@ -1306,8 +1318,7 @@ tryUnfolding dflags id lone_variable
           extra_doc = text "discounted size =" <+> int discounted_size
           discounted_size = size - discount
           small_enough = discounted_size <= ufUseThreshold dflags
-          discount = computeDiscount dflags arg_discounts
-                                     res_discount arg_infos cont_info
+          discount = computeDiscount arg_discounts res_discount arg_infos cont_info
 
   where
     mk_doc some_benefit extra_doc yes_or_no
@@ -1552,14 +1563,9 @@ which Roman did.
 
 -}
 
-computeDiscount :: DynFlags -> [Int] -> Int -> [ArgSummary] -> CallCtxt
+computeDiscount :: [Int] -> Int -> [ArgSummary] -> CallCtxt
                 -> Int
-computeDiscount dflags arg_discounts res_discount arg_infos cont_info
-        -- We multiple the raw discounts (args_discount and result_discount)
-        -- ty opt_UnfoldingKeenessFactor because the former have to do with
-        --  *size* whereas the discounts imply that there's some extra
-        --  *efficiency* to be gained (e.g. beta reductions, case reductions)
-        -- by inlining.
+computeDiscount arg_discounts res_discount arg_infos cont_info
 
   = 10          -- Discount of 10 because the result replaces the call
                 -- so we count 10 for the function itself
@@ -1568,8 +1574,7 @@ computeDiscount dflags arg_discounts res_discount arg_infos cont_info
                -- Discount of 10 for each arg supplied,
                -- because the result replaces the call
 
-    + round (ufKeenessFactor dflags *
-             fromIntegral (total_arg_discount + res_discount'))
+    + total_arg_discount + res_discount'
   where
     actual_arg_discounts = zipWith mk_arg_discount arg_discounts arg_infos
     total_arg_discount   = sum actual_arg_discounts


=====================================
compiler/GHC/Driver/Session.hs
=====================================
@@ -699,7 +699,6 @@ data DynFlags = DynFlags {
   ufUseThreshold        :: Int,
   ufFunAppDiscount      :: Int,
   ufDictDiscount        :: Int,
-  ufKeenessFactor       :: Float,
   ufDearOp              :: Int,
   ufVeryAggressive      :: Bool,
 
@@ -1430,12 +1429,11 @@ defaultDynFlags mySettings llvmConfig =
         -- into Csg.calc (The unfolding for sqr never makes it into the
         -- interface file.)
         ufCreationThreshold = 750,
-        ufUseThreshold      = 60,
+        ufUseThreshold      = 80,
         ufFunAppDiscount    = 60,
         -- Be fairly keen to inline a function if that means
         -- we'll be able to pick the right method from a dictionary
         ufDictDiscount      = 30,
-        ufKeenessFactor     = 1.5,
         ufDearOp            = 40,
         ufVeryAggressive    = False,
 
@@ -3021,8 +3019,9 @@ dynamic_flags_deps = [
       (intSuffix   (\n d -> d {ufFunAppDiscount = n}))
   , make_ord_flag defFlag "funfolding-dict-discount"
       (intSuffix   (\n d -> d {ufDictDiscount = n}))
-  , make_ord_flag defFlag "funfolding-keeness-factor"
-      (floatSuffix (\n d -> d {ufKeenessFactor = n}))
+  , make_dep_flag defFlag "funfolding-keeness-factor"
+      (floatSuffix (\_ d -> d))
+      "-funfolding-keeness-factor is no longer respected as of GHC 8.12"
   , make_ord_flag defFlag "fmax-worker-args"
       (intSuffix (\n d -> d {maxWorkerArgs = n}))
   , make_ord_flag defGhciFlag "fghci-hist-size"


=====================================
testsuite/tests/dependent/should_compile/dynamic-paper.stderr
=====================================
@@ -12,4 +12,4 @@ Simplifier ticks exhausted
   simplifier non-termination has been judged acceptable.
    
   To see detailed counts use -ddump-simpl-stats
-  Total ticks: 140082
+  Total ticks: 140084


=====================================
testsuite/tests/ghci.debugger/scripts/all.T
=====================================
@@ -75,7 +75,8 @@ test('break015', expect_broken(1532), ghci_script, ['break015.script'])
 test('break016', combined_output, ghci_script, ['break016.script'])
 test('break017', [extra_files(['../QSort.hs']),
                   combined_output], ghci_script, ['break017.script'])
-test('break018', extra_files(['../mdo.hs']), ghci_script, ['break018.script'])
+test('break018', [expect_broken(18004), extra_files(['../mdo.hs'])],
+     ghci_script, ['break018.script'])
 test('break019', extra_files(['../Test2.hs']), ghci_script, ['break019.script'])
 test('break020', extra_files(['Break020b.hs']), ghci_script, ['break020.script'])
 test('break021', extra_files(['Break020b.hs', 'break020.hs']), ghci_script, ['break021.script'])


=====================================
testsuite/tests/ghci.debugger/scripts/break021.stdout
=====================================
@@ -41,7 +41,7 @@ _result :: IO () = _
       ^^^^^^^^^^^^^^^^^
 13    in_another_module 0
 Stopped in Main.in_another_decl, break020.hs:(6,21)-(7,30)
-_result :: m () = _
+_result :: IO () = _
 5  
                      vv
 6  in_another_decl _ = do line1 0
@@ -49,7 +49,7 @@ _result :: m () = _
                                  ^^
 8  
 Stopped in Main.in_another_decl, break020.hs:6:24-30
-_result :: m () = _
+_result :: IO () = _
 5  
 6  in_another_decl _ = do line1 0
                           ^^^^^^^
@@ -61,7 +61,7 @@ _result :: IO () = _
              ^^^^^^^^^
 4  line2 _ = return ()
 Stopped in Main.in_another_decl, break020.hs:7:24-30
-_result :: m () = _
+_result :: IO () = _
 6  in_another_decl _ = do line1 0
 7                         line2 0
                           ^^^^^^^


=====================================
testsuite/tests/perf/compiler/T16473.stdout
=====================================
@@ -64,73 +64,34 @@ Rule fired: Class op $p1Monad (BUILTIN)
 Rule fired: Class op pure (BUILTIN)
 Rule fired: ># (BUILTIN)
 Rule fired: ==# (BUILTIN)
-Rule fired: Class op >>= (BUILTIN)
-Rule fired: Class op >>= (BUILTIN)
 Rule fired: Class op $p1Monad (BUILTIN)
-Rule fired: Class op $p1Applicative (BUILTIN)
-Rule fired: SPEC/Main $fApplicativeStateT @Identity _ (Main)
-Rule fired: SPEC/Main $fMonadStateT_$c>>= @Identity _ (Main)
-Rule fired: SPEC/Main $fMonadStateT_$c>> @Identity _ (Main)
-Rule fired: Class op return (BUILTIN)
+Rule fired: Class op <*> (BUILTIN)
 Rule fired: Class op $p1Monad (BUILTIN)
 Rule fired: Class op $p1Applicative (BUILTIN)
-Rule fired: SPEC/Main $fApplicativeStateT @Identity _ (Main)
-Rule fired: SPEC/Main $fMonadStateT_$c>>= @Identity _ (Main)
-Rule fired: SPEC/Main $fMonadStateT_$c>> @Identity _ (Main)
-Rule fired: Class op return (BUILTIN)
-Rule fired: Class op >>= (BUILTIN)
-Rule fired: Class op >>= (BUILTIN)
 Rule fired: Class op >>= (BUILTIN)
-Rule fired: Class op >>= (BUILTIN)
-Rule fired: Class op $p1Monad (BUILTIN)
-Rule fired: Class op $p1Applicative (BUILTIN)
-Rule fired: SPEC/Main $fApplicativeStateT @Identity _ (Main)
 Rule fired: Class op $p1Monad (BUILTIN)
-Rule fired: Class op $p1Applicative (BUILTIN)
-Rule fired: SPEC/Main $fApplicativeStateT @Identity _ (Main)
+Rule fired: Class op pure (BUILTIN)
 Rule fired: Class op $p1Monad (BUILTIN)
 Rule fired: Class op $p1Applicative (BUILTIN)
+Rule fired: SPEC/Main $fMonadStateT_$c>>= @Identity _ (Main)
+Rule fired: SPEC/Main $fApplicativeStateT_$c<*> @Identity _ (Main)
+Rule fired: SPEC/Main $fApplicativeStateT_$cpure @Identity _ (Main)
+Rule fired: SPEC/Main $fFunctorStateT @Identity _ (Main)
+Rule fired: SPEC/Main $fFunctorStateT_$cfmap @Identity _ (Main)
+Rule fired: Class op fmap (BUILTIN)
+Rule fired: SPEC/Main $fFunctorStateT_$cfmap @Identity _ (Main)
+Rule fired: Class op fmap (BUILTIN)
+Rule fired: Class op fmap (BUILTIN)
+Rule fired: Class op fmap (BUILTIN)
 Rule fired: Class op fmap (BUILTIN)
 Rule fired: Class op fmap (BUILTIN)
 Rule fired: Class op return (BUILTIN)
 Rule fired: Class op return (BUILTIN)
 Rule fired: Class op >>= (BUILTIN)
 Rule fired: Class op >>= (BUILTIN)
-Rule fired: Class op return (BUILTIN)
 Rule fired: Class op >>= (BUILTIN)
 Rule fired: Class op >>= (BUILTIN)
 Rule fired: Class op return (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
 Rule fired: Class op >>= (BUILTIN)
 Rule fired: Class op >>= (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
-Rule fired: SPEC/Main $fFunctorStateT @Identity _ (Main)
-Rule fired: SPEC/Main $fApplicativeStateT_$cpure @Identity _ (Main)
-Rule fired: SPEC/Main $fApplicativeStateT_$c<*> @Identity _ (Main)
-Rule fired: Class op fmap (BUILTIN)
-Rule fired: SPEC/Main $fApplicativeStateT_$c*> @Identity _ (Main)
-Rule fired: Class op fmap (BUILTIN)
-Rule fired: SPEC/Main $fFunctorStateT @Identity _ (Main)
-Rule fired: SPEC/Main $fApplicativeStateT_$cpure @Identity _ (Main)
-Rule fired: SPEC/Main $fApplicativeStateT_$c<*> @Identity _ (Main)
-Rule fired: SPEC/Main $fApplicativeStateT_$c*> @Identity _ (Main)
-Rule fired: SPEC/Main $fMonadStateT @Identity _ (Main)
-Rule fired: Class op $p1Monad (BUILTIN)
-Rule fired: Class op <*> (BUILTIN)
-Rule fired: Class op $p1Monad (BUILTIN)
-Rule fired: Class op $p1Applicative (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
-Rule fired: Class op $p1Monad (BUILTIN)
-Rule fired: Class op <*> (BUILTIN)
-Rule fired: Class op $p1Monad (BUILTIN)
-Rule fired: Class op $p1Applicative (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
-Rule fired: Class op >>= (BUILTIN)
-Rule fired: Class op fmap (BUILTIN)
-Rule fired: SPEC go @(StateT (Sum Int) Identity) (Main)
-Rule fired: Class op $p1Monad (BUILTIN)
-Rule fired: Class op pure (BUILTIN)
-Rule fired: SPEC/Main $fMonadStateT @Identity _ (Main)
-Rule fired: SPEC go @(StateT (Sum Int) Identity) (Main)
+Rule fired: Class op return (BUILTIN)


=====================================
testsuite/tests/simplCore/should_compile/T12600.hs
=====================================
@@ -27,3 +27,4 @@ instance (Eq1 f) => Eq1 (G f) where
 
 foo :: G F Int -> G F Int -> Bool
 foo a b = eq1 a b
+{-# NOINLINE foo #-}


=====================================
testsuite/tests/simplCore/should_compile/T15056.stderr
=====================================
@@ -1,9 +1,7 @@
 Rule fired: Class op - (BUILTIN)
 Rule fired: Class op + (BUILTIN)
 Rule fired: Class op + (BUILTIN)
-Rule fired: Class op enumFromTo (BUILTIN)
-Rule fired: Class op foldr (BUILTIN)
-Rule fired: Class op foldr (BUILTIN)
 Rule fired: +# (BUILTIN)
 Rule fired: Class op foldr (BUILTIN)
+Rule fired: Class op enumFromTo (BUILTIN)
 Rule fired: fold/build (GHC.Base)


=====================================
testsuite/tests/simplCore/should_compile/T4306.hs
=====================================
@@ -10,3 +10,4 @@ upd (UPD _ (D x _)) = sqrt $! (x*x + x*x + sin x + x*x + x*x + cos x + x*x + x*x
                                x*x + x*x + sin x + x*x + x*x + cos x + x*x + x*x + tan x +
                                x*x + x*x + sin x + x*x + x*x + cos x + x*x + x*x + tan x)
                                -- make the rhs large enough to be worker/wrapperred
+{-# NOINLINE upd #-}



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/a4a056b13e4ed9c3721143d205d8920be47e475b

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/a4a056b13e4ed9c3721143d205d8920be47e475b
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200403/c6c78910/attachment-0001.html>