[commit: ghc] wip/spj-early-inline4: Add -fspec-constr-keen (f8a2547)

git at git.haskell.org
Sun Feb 26 18:06:12 UTC 2017


Repository : ssh://git@git.haskell.org/ghc

On branch  : wip/spj-early-inline4
Link       : http://ghc.haskell.org/trac/ghc/changeset/f8a2547c8769bd04c3b7f06ec76514a86a1c24ff/ghc

>---------------------------------------------------------------

commit f8a2547c8769bd04c3b7f06ec76514a86a1c24ff
Author: Simon Peyton Jones <simonpj at microsoft.com>
Date:   Tue Feb 14 13:08:00 2017 +0000

    Add -fspec-constr-keen
    
    I discovered that the dramatic improvement in perf/should_run/T9339
    with the introduction of join points was really rather a fluke, and
    very fragile.
    
    The real problem (see Note [Making SpecConstr keener]) is that
    SpecConstr wasn't specialising a function even though it was applied
    to a freshly-allocated constructor.  The paper describes plausible
    reasons for this, but I think it may well be better to be a bit more
    aggressive.
    
    So this patch adds -fspec-constr-keen, which makes SpecConstr a bit
    keener to specialise, by ignoring whether or not the argument
    corresponding to a call pattern is scrutinised in the function body.
    Now the gains in T9339 should be robust; and it might even be a
    better default.
    
    I'd be interested in what happens if we switched on -fspec-constr-keen
    with -O2.
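
The decision described above can be sketched as a small standalone model. This is a simplified illustration, not GHC's code: the `ArgOcc` and `Env` types below are hypothetical stand-ins (GHC's real occurrence information is richer), and `mbScrut` only mirrors the shape of the `mb_scrut` helper touched by this patch.

```haskell
module Main where

-- Simplified stand-in for SpecConstr's occurrence information.
-- Illustrative assumption: an association list keyed by constructor name.
data ArgOcc
  = UnkOcc                          -- argument used in some unknown way
  | ScrutOcc [(String, [ArgOcc])]   -- scrutinised; per-constructor field usage
  deriving (Eq, Show)

-- Hypothetical model of the two switches that matter here.
data Env = Env { force :: Bool, keen :: Bool }

-- Decide whether a constructor argument is worth specialising on.
-- Just occs: specialise, with usage information for the constructor fields.
-- Nothing:   leave the call alone.
mbScrut :: Env -> ArgOcc -> String -> Maybe [ArgOcc]
mbScrut env occ dc = case occ of
  ScrutOcc bs | Just occs <- lookup dc bs -> Just occs
  _ | force env || keen env -> Just (repeat UnkOcc)
    | otherwise             -> Nothing

main :: IO ()
main = do
  let describe = maybe "no specialisation" (const "specialise")
  -- Argument is not scrutinised and the keen switch is off.
  putStrLn (describe (mbScrut (Env False False) UnkOcc "I#"))
  -- Same argument, keen switch on.
  putStrLn (describe (mbScrut (Env False True) UnkOcc "I#"))
```

With the keen switch off, an unscrutinised constructor argument yields no specialisation; with it on, the same argument is accepted, and its fields are treated as having unknown usage.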


>---------------------------------------------------------------

f8a2547c8769bd04c3b7f06ec76514a86a1c24ff
 compiler/main/DynFlags.hs               |  2 ++
 compiler/specialise/SpecConstr.hs       | 57 +++++++++++++++++++++++++++++----
 docs/users_guide/using-optimisation.rst | 12 ++++++-
 testsuite/tests/perf/should_run/all.T   |  4 ++-
 4 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/compiler/main/DynFlags.hs b/compiler/main/DynFlags.hs
index d3d0ac3..442bbb9 100644
--- a/compiler/main/DynFlags.hs
+++ b/compiler/main/DynFlags.hs
@@ -432,6 +432,7 @@ data GeneralFlag
    | Opt_StgCSE
    | Opt_LiberateCase
    | Opt_SpecConstr
+   | Opt_SpecConstrKeen
    | Opt_DoLambdaEtaExpansion
    | Opt_IgnoreAsserts
    | Opt_DoEtaReduction
@@ -3684,6 +3685,7 @@ fFlagsDeps = [
    (useInstead "enable-rewrite-rules"),
   flagSpec "shared-implib"                    Opt_SharedImplib,
   flagSpec "spec-constr"                      Opt_SpecConstr,
+  flagSpec "spec-constr-keen"                 Opt_SpecConstrKeen,
   flagSpec "specialise"                       Opt_Specialise,
   flagSpec "specialize"                       Opt_Specialise,
   flagSpec "specialise-aggressively"          Opt_SpecialiseAggressively,
diff --git a/compiler/specialise/SpecConstr.hs b/compiler/specialise/SpecConstr.hs
index 8a3e227..a68955e 100644
--- a/compiler/specialise/SpecConstr.hs
+++ b/compiler/specialise/SpecConstr.hs
@@ -41,7 +41,8 @@ import VarEnv
 import VarSet
 import Name
 import BasicTypes
-import DynFlags         ( DynFlags(..), hasPprDebug )
+import DynFlags         ( DynFlags(..), GeneralFlag( Opt_SpecConstrKeen )
+                        , gopt, hasPprDebug )
 import Maybes           ( orElse, catMaybes, isJust, isNothing )
 import Demand
 import GHC.Serialized   ( deserializeWithData )
@@ -447,7 +448,6 @@ breaks an invariant.
 
 Note [Forcing specialisation]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
 With stream fusion and in other similar cases, we want to fully
 specialise some (but not necessarily all!) loops regardless of their
 size and the number of specialisations.
@@ -754,6 +754,39 @@ into a work-free value again, thus
    a'_shr = (a1, x_af7)
 but that's more work, so until its shown to be important I'm going to
 leave it for now.
+
+Note [Making SpecConstr keener]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this, in perf/should_run/T9339:
+   last (filter odd [1..1000])
+
+After optimisation, including SpecConstr, we get:
+   f :: Int# -> Int -> Int
+   f x y = case case remInt# x 2# of
+             __DEFAULT -> case x of
+                            __DEFAULT -> f (+# wild_Xp 1#) (I# x)
+                            1000000# -> ...
+             0# -> case x of
+                     __DEFAULT -> f (+# wild_Xp 1#) y
+                     1000000#  -> y
+
+Not good!  We build an (I# x) box every time around the loop.
+SpecConstr (as described in the paper) does not specialise f, despite
+the call (f ... (I# x)) because 'y' is not scrutinised in the body.
+But it is much better to specialise f for the case where the argument
+is of form (I# x); then we build the box only when returning y, which
+is on the cold path.
+
+Another example:
+
+   f x = ...(g x)....
+
+Here 'x' is not scrutinised in f's body; but if we did specialise 'f'
+then the call (g x) might allow 'g' to be specialised in turn.
+
+So sc_keen controls whether or not we take account of whether the argument
+is scrutinised in the body.  True <=> ignore that, and specialise whenever
+the function is applied to a data constructor.
 -}
 
 data ScEnv = SCE { sc_dflags    :: DynFlags,
@@ -765,6 +798,11 @@ data ScEnv = SCE { sc_dflags    :: DynFlags,
                    sc_recursive :: Int,         -- Max # of specialisations over recursive type.
                                                 -- Stops ForceSpecConstr from diverging.
 
+                   sc_keen     :: Bool,         -- Specialise on arguments that are known
+                                                -- constructors, even if they are not
+                                                -- scrutinised in the body.  See
+                                                -- Note [Making SpecConstr keener]
+
                    sc_force     :: Bool,        -- Force specialisation?
                                                 -- See Note [Forcing specialisation]
 
@@ -807,6 +845,7 @@ initScEnv dflags this_mod anns
           sc_size        = specConstrThreshold dflags,
           sc_count       = specConstrCount     dflags,
           sc_recursive   = specConstrRecursive dflags,
+          sc_keen        = gopt Opt_SpecConstrKeen dflags,
           sc_force       = False,
           sc_subst       = emptySubst,
           sc_how_bound   = emptyVarEnv,
@@ -1976,11 +2015,12 @@ argToPat env in_scope val_env arg arg_occ
                   mkConApp dc (ty_args ++ args')) }
   where
     mb_scrut dc = case arg_occ of
-                    ScrutOcc bs
-                           | Just occs <- lookupUFM bs dc
-                                          -> Just (occs)  -- See Note [Reboxing]
-                    _other | sc_force env -> Just (repeat UnkOcc)
-                           | otherwise    -> Nothing
+                    ScrutOcc bs | Just occs <- lookupUFM bs dc
+                                -> Just (occs)  -- See Note [Reboxing]
+                    _other      | sc_force env || sc_keen env
+                                -> Just (repeat UnkOcc)
+                                | otherwise
+                                -> Nothing
 
   -- Check if the argument is a variable that
   --    (a) is used in an interesting way in the function body
@@ -1989,6 +2029,9 @@ argToPat env in_scope val_env arg arg_occ
 argToPat env in_scope val_env (Var v) arg_occ
   | sc_force env || case arg_occ of { UnkOcc -> False; _other -> True }, -- (a)
     is_value,                                                            -- (b)
+       -- Ignoring sc_keen here to avoid gratuitously incurring Note [Reboxing].
+       -- So sc_keen focuses just on f (I# x), where we have a freshly-allocated
+       -- box that we can eliminate in the caller.
     not (ignoreType env (varType v))
   = return (True, Var v)
   where
diff --git a/docs/users_guide/using-optimisation.rst b/docs/users_guide/using-optimisation.rst
index 9436832..e56c473 100644
--- a/docs/users_guide/using-optimisation.rst
+++ b/docs/users_guide/using-optimisation.rst
@@ -522,7 +522,7 @@ list.
 
     Turn on call-pattern specialisation; see `Call-pattern specialisation for
     Haskell programs
-    <http://research.microsoft.com/en-us/um/people/simonpj/papers/spec-constr/index.htm>`__.
+    <https://www.microsoft.com/en-us/research/publication/system-f-with-type-equality-coercions-2/>`__.
 
     This optimisation specializes recursive functions according to their
     argument "shapes". This is best explained by example so consider: ::
@@ -580,6 +580,16 @@ list.
     body directly, allowing heavy specialisation over the recursive
     cases.
 
+.. ghc-flag:: -fspec-constr-keen
+
+    :default: off
+
+    If this flag is on, call-pattern specialisation will specialise a call
+    ``(f (Just x))`` with an explicit constructor argument, even if the argument
+    is not scrutinised in the body of the function. This is sometimes
+    beneficial; e.g. the argument might be given to some other function
+    that can itself be specialised.
+
 .. ghc-flag:: -fspec-constr-count=<n>
 
     :default: 3
diff --git a/testsuite/tests/perf/should_run/all.T b/testsuite/tests/perf/should_run/all.T
index 6670f34..172d648 100644
--- a/testsuite/tests/perf/should_run/all.T
+++ b/testsuite/tests/perf/should_run/all.T
@@ -461,7 +461,9 @@ test('T9339',
                       # 2016-08-17:          50728 Join points (#12988)
       only_ways(['normal'])],
      compile_and_run,
-     ['-O2'])
+     ['-O2 -fspec-constr-keen'])
+     # For -fspec-constr-keen, see Note [Making SpecConstr keener] in SpecConstr
+
 
 test('T8472',
      [stats_num_field('bytes allocated',
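
To try the flag by hand, a minimal program of the shape discussed in Note [Making SpecConstr keener] might look like the sketch below. It is not the T9339 test source itself; the `lastOdd` helper and the file name `Main.hs` are made up for illustration.

```haskell
module Main where

-- Compile with:  ghc -O2 -fspec-constr-keen Main.hs
-- Adding -ddump-simpl prints the optimised Core, which can be compared
-- against a build without -fspec-constr-keen to see whether the loop
-- still allocates a box on each iteration.

-- Hypothetical helper: the last odd number in [1 .. n].
lastOdd :: Int -> Int
lastOdd n = last (filter odd [1 .. n])

main :: IO ()
main = print (lastOdd 1000)
```

Whether the flag changes the generated code for this exact program depends on the GHC version and on the other optimisations in play, so the Core dump is the thing to check.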


