[Git][ghc/ghc][wip/dataToTag-opt] 5 commits: When deriving Eq always use tag based comparisons for nullary constructors

Ben Gamari gitlab at gitlab.haskell.org
Wed Mar 18 21:18:20 UTC 2020



Ben Gamari pushed to branch wip/dataToTag-opt at Glasgow Haskell Compiler / GHC


Commits:
6b6307be by Andreas Klebinger at 2020-03-18T04:28:39Z
When deriving Eq always use tag based comparisons for nullary constructors

- - - - -
19eabed9 by Andreas Klebinger at 2020-03-18T04:28:39Z
Use dataToTag# instead of getTag in deriving code.

getTag resides in base so is not useable in ghc-prim.
Where we need it.

- - - - -
ea6146a0 by Andreas Klebinger at 2020-03-18T04:34:32Z
Eliminate generated Con2Tag bindings completely

- - - - -
daec3a5f by Ben Gamari at 2020-03-18T13:39:05Z
Use pointer tag in dataToTag#

While looking at !2873 I noticed that dataToTag# previously didn't look
at a pointer's tag to determine its constructor. To be fair, there is a
bit of a trade-off here: using the pointer tag requires a bit more code
and another branch. On the other hand, it allows us to eliminate looking
at the info table in many cases (especially now since we tag large
constructor families; see #14373).

- - - - -
8f26e408 by Ben Gamari at 2020-03-18T21:17:36Z
Avoid unnecessary entry

- - - - -


4 changed files:

- compiler/GHC/StgToCmm/Expr.hs
- compiler/prelude/PrelNames.hs
- compiler/typecheck/TcGenDeriv.hs
- compiler/utils/ListSetOps.hs


Changes:

=====================================
compiler/GHC/StgToCmm/Expr.hs
=====================================
@@ -35,6 +35,7 @@ import GHC.Stg.Syntax
 import GHC.Cmm.Graph
 import GHC.Cmm.BlockId
 import GHC.Cmm hiding ( succ )
+import GHC.Cmm.Utils ( zeroExpr, cmmTagMask )
 import GHC.Cmm.Info
 import GHC.Core
 import DataCon
@@ -69,14 +70,44 @@ cgExpr (StgOpApp (StgPrimOp SeqOp) [StgVarArg a, _] _res_ty) =
   cgIdApp a []
 
 -- dataToTag# :: a -> Int#
--- See Note [dataToTag#] in primops.txt.pp
+-- See Note [dataToTag# magic] in PrelRules.
 cgExpr (StgOpApp (StgPrimOp DataToTagOp) [StgVarArg a] _res_ty) = do
   dflags <- getDynFlags
   emitComment (mkFastString "dataToTag#")
-  tmp <- newTemp (bWord dflags)
-  _ <- withSequel (AssignTo [tmp] False) (cgIdApp a [])
-  -- TODO: For small types look at the tag bits instead of reading info table
-  emitReturn [getConstrTag dflags (cmmUntag dflags (CmmReg (CmmLocal tmp)))]
+  info <- getCgIdInfo a
+  let amode = idInfoToAmode info
+  tag_reg <- assignTemp $ cmmConstrTag1 dflags amode
+  result_reg <- newTemp (bWord dflags)
+  let tag = CmmReg $ CmmLocal tag_reg
+      is_tagged = cmmNeWord dflags tag (zeroExpr dflags)
+      is_too_big_tag = cmmEqWord dflags tag (cmmTagMask dflags)
+  -- Here we will first check the tag bits of the pointer we were given;
+  -- if this doesn't work then enter the closure and use the info table
+  -- to determine the constructor. Note that all tag bits set means that
+  -- the constructor index is too large to fit in the pointer and therefore
+  -- we must look in the info table. See Note [Tagging big families].
+
+  slow_path <- getCode $ do
+      tmp <- newTemp (bWord dflags)
+      _ <- withSequel (AssignTo [tmp] False) (cgIdApp a [])
+      emitAssign (CmmLocal result_reg)
+        $ getConstrTag dflags (cmmUntag dflags (CmmReg (CmmLocal tmp)))
+
+  fast_path <- getCode $ do
+      -- Return the constructor index from the pointer tag
+      return_ptr_tag <- getCode $ do
+          emitAssign (CmmLocal result_reg)
+            $ cmmSubWord dflags tag (CmmLit $ mkWordCLit dflags 1)
+      -- Return the constructor index recorded in the info table
+      return_info_tag <- getCode $ do
+          emitAssign (CmmLocal result_reg)
+            $ getConstrTag dflags (cmmUntag dflags amode)
+
+      emit =<< mkCmmIfThenElse' is_too_big_tag return_info_tag return_ptr_tag (Just False)
+
+  emit =<< mkCmmIfThenElse' is_tagged fast_path slow_path (Just True)
+  emitReturn [CmmReg $ CmmLocal result_reg]
+
 
 cgExpr (StgOpApp op args ty) = cgOpApp op args ty
 cgExpr (StgConApp con args _)= cgConApp con args


=====================================
compiler/prelude/PrelNames.hs
=====================================
@@ -749,12 +749,13 @@ toList_RDR = nameRdrName toListName
 compose_RDR :: RdrName
 compose_RDR             = varQual_RDR gHC_BASE (fsLit ".")
 
-not_RDR, getTag_RDR, succ_RDR, pred_RDR, minBound_RDR, maxBound_RDR,
+not_RDR, getTag_RDR, dataToTag_RDR, succ_RDR, pred_RDR, minBound_RDR, maxBound_RDR,
     and_RDR, range_RDR, inRange_RDR, index_RDR,
     unsafeIndex_RDR, unsafeRangeSize_RDR :: RdrName
 and_RDR                 = varQual_RDR gHC_CLASSES (fsLit "&&")
 not_RDR                 = varQual_RDR gHC_CLASSES (fsLit "not")
 getTag_RDR              = varQual_RDR gHC_BASE (fsLit "getTag")
+dataToTag_RDR           = varQual_RDR gHC_PRIM (fsLit "dataToTag#")
 succ_RDR                = varQual_RDR gHC_ENUM (fsLit "succ")
 pred_RDR                = varQual_RDR gHC_ENUM (fsLit "pred")
 minBound_RDR            = varQual_RDR gHC_ENUM (fsLit "minBound")


=====================================
compiler/typecheck/TcGenDeriv.hs
=====================================
@@ -15,6 +15,7 @@ This is where we do all the grimy bindings' generation.
 {-# LANGUAGE CPP, ScopedTypeVariables #-}
 {-# LANGUAGE FlexibleContexts #-}
 {-# LANGUAGE TypeFamilies #-}
+{-# LANGUAGE DataKinds #-}
 
 {-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
 
@@ -83,9 +84,8 @@ import Data.List  ( find, partition, intersperse )
 type BagDerivStuff = Bag DerivStuff
 
 data AuxBindSpec
-  = DerivCon2Tag TyCon  -- The con2Tag for given TyCon
-  | DerivTag2Con TyCon  -- ...ditto tag2Con
-  | DerivMaxTag  TyCon  -- ...and maxTag
+  = DerivTag2Con TyCon  -- The tag2Con for given TyCon
+  | DerivMaxTag  TyCon  -- ...and ditto maxTag
   deriving( Eq )
   -- All these generate ZERO-BASED tag operations
   -- I.e first constructor has tag 0
@@ -127,17 +127,17 @@ possibly zero of them).  Here's an example, with both \tr{N}ullary and
        case (a1 `eqFloat#` a2) of r -> r
   for that particular test.
 
-* If there are a lot of (more than ten) nullary constructors, we emit a
+* For nullary constructors, we emit a
   catch-all clause of the form:
 
-      (==) a b  = case (con2tag_Foo a) of { a# ->
-                  case (con2tag_Foo b) of { b# ->
+      (==) a b  = case (dataToTag# a) of { a# ->
+                  case (dataToTag# b) of { b# ->
                   case (a# ==# b#)     of {
                     r -> r }}}
 
-  If con2tag gets inlined this leads to join point stuff, so
-  it's better to use regular pattern matching if there aren't too
-  many nullary constructors.  "Ten" is arbitrary, of course
+  An older approach preferred regular pattern matches in some cases
+  but with dataToTag# forcing it's argument, and work on improving
+  join points, this seems no longer necessary.
 
 * If there aren't any nullary constructors, we emit a simpler
   catch-all:
@@ -146,7 +146,7 @@ possibly zero of them).  Here's an example, with both \tr{N}ullary and
 
 * For the @(/=)@ method, we normally just use the default method.
   If the type is an enumeration type, we could/may/should? generate
-  special code that calls @con2tag_Foo@, much like for @(==)@ shown
+  special code that calls @dataToTag#@, much like for @(==)@ shown
   above.
 
 We thought about doing this: If we're also deriving 'Ord' for this
@@ -162,20 +162,18 @@ produced don't get through the typechecker.
 gen_Eq_binds :: SrcSpan -> TyCon -> TcM (LHsBinds GhcPs, BagDerivStuff)
 gen_Eq_binds loc tycon = do
     dflags <- getDynFlags
-    return (method_binds dflags, aux_binds)
+    return (method_binds dflags, emptyBag)
   where
     all_cons = tyConDataCons tycon
     (nullary_cons, non_nullary_cons) = partition isNullarySrcDataCon all_cons
 
-    -- If there are ten or more (arbitrary number) nullary constructors,
-    -- use the con2tag stuff.  For small types it's better to use
-    -- ordinary pattern matching.
-    (tag_match_cons, pat_match_cons)
-       | nullary_cons `lengthExceeds` 10 = (nullary_cons, non_nullary_cons)
-       | otherwise                       = ([],           all_cons)
-
+    -- For nullary constructors, use the getTag stuff.
+    (tag_match_cons, pat_match_cons) = (nullary_cons, non_nullary_cons)
     no_tag_match_cons = null tag_match_cons
 
+    -- (LHS patterns, result)
+    fall_through_eqn :: DynFlags
+                     -> [([Located (Pat (GhcPass 'Parsed))] , LHsExpr GhcPs)]
     fall_through_eqn dflags
       | no_tag_match_cons   -- All constructors have arguments
       = case pat_match_cons of
@@ -186,17 +184,16 @@ gen_Eq_binds loc tycon = do
                  [([nlWildPat, nlWildPat], false_Expr)]
 
       | otherwise -- One or more tag_match cons; add fall-through of
-                  -- extract tags compare for equality
+                  -- extract tags compare for equality,
+                  -- The case `(C1 x) == (C1 y)` can no longer happen
+                  -- at this point as it's matched earlier.
       = [([a_Pat, b_Pat],
          untag_Expr dflags tycon [(a_RDR,ah_RDR), (b_RDR,bh_RDR)]
                     (genPrimOpApp (nlHsVar ah_RDR) eqInt_RDR (nlHsVar bh_RDR)))]
 
-    aux_binds | no_tag_match_cons = emptyBag
-              | otherwise         = unitBag $ DerivAuxBind $ DerivCon2Tag tycon
-
     method_binds dflags = unitBag (eq_bind dflags)
     eq_bind dflags = mkFunBindEC 2 loc eq_RDR (const true_Expr)
-                                 (map pats_etc pat_match_cons
+                                 ( map pats_etc pat_match_cons
                                    ++ fall_through_eqn dflags)
 
     ------------------------------------------------------------------
@@ -346,11 +343,8 @@ gen_Ord_binds loc tycon = do
       then ( unitBag $ mkFunBindEC 2 loc compare_RDR (const eqTag_Expr) []
            , emptyBag)
       else ( unitBag (mkOrdOp dflags OrdCompare) `unionBags` other_ops dflags
-           , aux_binds)
+           , emptyBag)
   where
-    aux_binds | single_con_type = emptyBag
-              | otherwise       = unitBag $ DerivAuxBind $ DerivCon2Tag tycon
-
         -- Note [Game plan for deriving Ord]
     other_ops dflags
       | (last_tag - first_tag) <= 2     -- 1-3 constructors
@@ -369,7 +363,7 @@ gen_Ord_binds loc tycon = do
 
     get_tag con = dataConTag con - fIRST_TAG
         -- We want *zero-based* tags, because that's what
-        -- con2Tag returns (generated by untag_Expr)!
+        -- dataToTag# returns (generated by untag_Expr)!
 
     tycon_data_cons = tyConDataCons tycon
     single_con_type = isSingleton tycon_data_cons
@@ -549,8 +543,8 @@ nlConWildPat con = noLoc (ConPatIn (noLoc (getRdrName con))
 data Foo ... = N1 | N2 | ... | Nn
 \end{verbatim}
 
-we use both @con2tag_Foo@ and @tag2con_Foo@ functions, as well as a
- at maxtag_Foo@ variable (all generated by @gen_tag_n_con_binds@).
+we use both dataToTag# and @tag2con_Foo@ functions, as well as a
+ at maxtag_Foo@ variable, the later generated by @gen_tag_n_con_binds.
 
 \begin{verbatim}
 instance ... Enum (Foo ...) where
@@ -563,16 +557,16 @@ instance ... Enum (Foo ...) where
 
     -- or, really...
     enumFrom a
-      = case con2tag_Foo a of
+      = case dataToTag# a of
           a# -> map tag2con_Foo (enumFromTo (I# a#) maxtag_Foo)
 
    enumFromThen a b
-     = map tag2con_Foo [con2tag_Foo a, con2tag_Foo b .. maxtag_Foo]
+     = map tag2con_Foo [dataToTag# a, dataToTag# b .. maxtag_Foo]
 
     -- or, really...
     enumFromThen a b
-      = case con2tag_Foo a of { a# ->
-        case con2tag_Foo b of { b# ->
+      = case dataToTag# a of { a# ->
+        case dataToTag# b of { b# ->
         map tag2con_Foo (enumFromThenTo (I# a#) (I# b#) maxtag_Foo)
         }}
 \end{verbatim}
@@ -594,7 +588,7 @@ gen_Enum_binds loc tycon = do
       , from_enum      dflags
       ]
     aux_binds = listToBag $ map DerivAuxBind
-                  [DerivCon2Tag tycon, DerivTag2Con tycon, DerivMaxTag tycon]
+                  [DerivTag2Con tycon, DerivMaxTag tycon]
 
     occ_nm = getOccString tycon
 
@@ -709,32 +703,32 @@ things go not too differently from @Enum@:
 \begin{verbatim}
 instance ... Ix (Foo ...) where
     range (a, b)
-      = map tag2con_Foo [con2tag_Foo a .. con2tag_Foo b]
+      = map tag2con_Foo [dataToTag# a .. dataToTag# b]
 
     -- or, really...
     range (a, b)
-      = case (con2tag_Foo a) of { a# ->
-        case (con2tag_Foo b) of { b# ->
+      = case (dataToTag# a) of { a# ->
+        case (dataToTag# b) of { b# ->
         map tag2con_Foo (enumFromTo (I# a#) (I# b#))
         }}
 
     -- Generate code for unsafeIndex, because using index leads
     -- to lots of redundant range tests
     unsafeIndex c@(a, b) d
-      = case (con2tag_Foo d -# con2tag_Foo a) of
+      = case (dataToTag# d -# dataToTag# a) of
                r# -> I# r#
 
     inRange (a, b) c
       = let
-            p_tag = con2tag_Foo c
+            p_tag = dataToTag# c
         in
-        p_tag >= con2tag_Foo a && p_tag <= con2tag_Foo b
+        p_tag >= dataToTag# a && p_tag <= dataToTag# b
 
     -- or, really...
     inRange (a, b) c
-      = case (con2tag_Foo a)   of { a_tag ->
-        case (con2tag_Foo b)   of { b_tag ->
-        case (con2tag_Foo c)   of { c_tag ->
+      = case (dataToTag# a)   of { a_tag ->
+        case (dataToTag# b)   of { b_tag ->
+        case (dataToTag# c)   of { c_tag ->
         if (c_tag >=# a_tag) then
           c_tag <=# b_tag
         else
@@ -757,8 +751,8 @@ gen_Ix_binds loc tycon = do
     dflags <- getDynFlags
     return $ if isEnumerationTyCon tycon
       then (enum_ixes dflags, listToBag $ map DerivAuxBind
-                   [DerivCon2Tag tycon, DerivTag2Con tycon, DerivMaxTag tycon])
-      else (single_con_ixes, unitBag (DerivAuxBind (DerivCon2Tag tycon)))
+                   [DerivTag2Con tycon, DerivMaxTag tycon])
+      else (single_con_ixes, emptyBag)
   where
     --------------------------------------------------------------
     enum_ixes dflags = listToBag
@@ -1937,41 +1931,18 @@ mkCoerceClassMethEqn cls inst_tvs inst_tys rhs_ty id
 \begin{verbatim}
 data Foo ... = ...
 
-con2tag_Foo :: Foo ... -> Int#
 tag2con_Foo :: Int -> Foo ...   -- easier if Int, not Int#
 maxtag_Foo  :: Int              -- ditto (NB: not unlifted)
 \end{verbatim}
 
 The `tags' here start at zero, hence the @fIRST_TAG@ (currently one)
 fiddling around.
+
+We also use dataToTag# heavily.
 -}
 
 genAuxBindSpec :: DynFlags -> SrcSpan -> AuxBindSpec
                   -> (LHsBind GhcPs, LSig GhcPs)
-genAuxBindSpec dflags loc (DerivCon2Tag tycon)
-  = (mkFunBindSE 0 loc rdr_name eqns,
-     L loc (TypeSig noExtField [L loc rdr_name] sig_ty))
-  where
-    rdr_name = con2tag_RDR dflags tycon
-
-    sig_ty = mkLHsSigWcType $ L loc $ XHsType $ NHsCoreTy $
-             mkSpecSigmaTy (tyConTyVars tycon) (tyConStupidTheta tycon) $
-             mkParentType tycon `mkVisFunTy` intPrimTy
-
-    lots_of_constructors = tyConFamilySize tycon > 8
-                        -- was: mAX_FAMILY_SIZE_FOR_VEC_RETURNS
-                        -- but we don't do vectored returns any more.
-
-    eqns | lots_of_constructors = [get_tag_eqn]
-         | otherwise = map mk_eqn (tyConDataCons tycon)
-
-    get_tag_eqn = ([nlVarPat a_RDR], nlHsApp (nlHsVar getTag_RDR) a_Expr)
-
-    mk_eqn :: DataCon -> ([LPat GhcPs], LHsExpr GhcPs)
-    mk_eqn con = ([nlWildConPat con],
-                  nlHsLit (HsIntPrim NoSourceText
-                                    (toInteger ((dataConTag con) - fIRST_TAG))))
-
 genAuxBindSpec dflags loc (DerivTag2Con tycon)
   = (mkFunBindSE 0 loc rdr_name
         [([nlConVarPat intDataCon_RDR [a_RDR]],
@@ -2254,14 +2225,26 @@ eq_Expr ty a b
  where
    (_, _, prim_eq, _, _) = primOrdOps "Eq" ty
 
-untag_Expr :: DynFlags -> TyCon -> [( RdrName,  RdrName)]
-              -> LHsExpr GhcPs -> LHsExpr GhcPs
+-- | Take an expression and a list of pairs @(exprName1,tagName1)@.
+-- Wraps the given expression in cases which bind tagName1 to the
+-- tag of exprName1 and so forth for all pairs and returns the
+-- resulting expression.
+untag_Expr :: DynFlags
+           -> TyCon
+           -> [( RdrName,  RdrName)] -- (expr, expr's tag bound to this)
+           -> LHsExpr GhcPs -- Final RHS
+           -> LHsExpr GhcPs -- Result expr
 untag_Expr _ _ [] expr = expr
 untag_Expr dflags tycon ((untag_this, put_tag_here) : more) expr
-  = nlHsCase (nlHsPar (nlHsVarApps (con2tag_RDR dflags tycon)
-                                   [untag_this])) {-of-}
+  {- case (dataToTag# untag_this) of
+        put_tag_here -> .... <recursive on more>
+          _ -> result
+         -}
+  = nlHsCase (nlHsPar (nlHsApp (nlHsVar dataToTag_RDR) (nlHsVar untag_this))) {-of-}
       [mkHsCaseAlt (nlVarPat put_tag_here) (untag_Expr dflags tycon more expr)]
 
+
+
 enum_from_to_Expr
         :: LHsExpr GhcPs -> LHsExpr GhcPs
         -> LHsExpr GhcPs
@@ -2372,9 +2355,8 @@ minusInt_RDR, tagToEnum_RDR :: RdrName
 minusInt_RDR  = getRdrName (primOpId IntSubOp   )
 tagToEnum_RDR = getRdrName (primOpId TagToEnumOp)
 
-con2tag_RDR, tag2con_RDR, maxtag_RDR :: DynFlags -> TyCon -> RdrName
+tag2con_RDR, maxtag_RDR :: DynFlags -> TyCon -> RdrName
 -- Generates Orig s RdrName, for the binding positions
-con2tag_RDR dflags tycon = mk_tc_deriv_name dflags tycon mkCon2TagOcc
 tag2con_RDR dflags tycon = mk_tc_deriv_name dflags tycon mkTag2ConOcc
 maxtag_RDR  dflags tycon = mk_tc_deriv_name dflags tycon mkMaxTagOcc
 
@@ -2403,13 +2385,15 @@ mkAuxBinderName dflags parent occ_fun
 {-
 Note [Auxiliary binders]
 ~~~~~~~~~~~~~~~~~~~~~~~~
-We often want to make a top-level auxiliary binding.  E.g. for comparison we have
 
-  instance Ord T where
-    compare a b = $con2tag a `compare` $con2tag b
+We often want to make a top-level auxiliary binding.  E.g. for enum we
+turn a Integer into a constructor. So we have
+
+  instance Enum T where
+    succ x = $tag2con (dataToTag x + 1)
 
-  $con2tag :: T -> Int
-  $con2tag = ...code....
+  $tag2con :: Int -> T
+  $tag2con = ...code....
 
 Of course these top-level bindings should all have distinct name, and we are
 generating RdrNames here.  We can't just use the TyCon or DataCon to distinguish


=====================================
compiler/utils/ListSetOps.hs
=====================================
@@ -104,12 +104,13 @@ minusList xs ys = filter (`S.notMember` yss) xs
 Inefficient finite maps based on association lists and equality.
 -}
 
--- A finite mapping based on equality and association lists
+-- | A finite mapping based on equality and association lists.
 type Assoc a b = [(a,b)]
 
 assoc             :: (Eq a) => String -> Assoc a b -> a -> b
 assocDefault      :: (Eq a) => b -> Assoc a b -> a -> b
 assocUsing        :: (a -> a -> Bool) -> String -> Assoc a b -> a -> b
+-- | Lookup key, fail gracefully using Nothing if not found.
 assocMaybe        :: (Eq a) => Assoc a b -> a -> Maybe b
 assocDefaultUsing :: (a -> a -> Bool) -> b -> Assoc a b -> a -> b
 



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/compare/da41a337971df016e789cf78346013f3461d7289...8f26e408397dcb09d994bda807750a1de07f6a48

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/compare/da41a337971df016e789cf78346013f3461d7289...8f26e408397dcb09d994bda807750a1de07f6a48
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200318/541d82bd/attachment-0001.html>


More information about the ghc-commits mailing list