[Git][ghc/ghc][wip/dataToTag-opt] 3 commits: When deriving Eq always use tag based comparisons for nullary constructors

Ben Gamari gitlab at gitlab.haskell.org
Wed Mar 18 04:26:19 UTC 2020



Ben Gamari pushed to branch wip/dataToTag-opt at Glasgow Haskell Compiler / GHC


Commits:
05576ed5 by Andreas Klebinger at 2020-03-18T04:24:30Z
When deriving Eq always use tag based comparisons for nullary constructors

- - - - -
69ce4a89 by Andreas Klebinger at 2020-03-18T04:24:35Z
Use dataToTag# instead of getTag in deriving code.

getTag resides in base so is not useable in ghc-prim.
Where we need it.

- - - - -
4a2cf862 by Andreas Klebinger at 2020-03-18T04:24:39Z
Eliminate generated Con2Tag bindings completely

- - - - -


3 changed files:

- compiler/prelude/PrelNames.hs
- compiler/typecheck/TcGenDeriv.hs
- compiler/utils/ListSetOps.hs


Changes:

=====================================
compiler/prelude/PrelNames.hs
=====================================
@@ -749,12 +749,13 @@ toList_RDR = nameRdrName toListName
 compose_RDR :: RdrName
 compose_RDR             = varQual_RDR gHC_BASE (fsLit ".")
 
-not_RDR, getTag_RDR, succ_RDR, pred_RDR, minBound_RDR, maxBound_RDR,
+not_RDR, getTag_RDR, dataToTag_RDR, succ_RDR, pred_RDR, minBound_RDR, maxBound_RDR,
     and_RDR, range_RDR, inRange_RDR, index_RDR,
     unsafeIndex_RDR, unsafeRangeSize_RDR :: RdrName
 and_RDR                 = varQual_RDR gHC_CLASSES (fsLit "&&")
 not_RDR                 = varQual_RDR gHC_CLASSES (fsLit "not")
 getTag_RDR              = varQual_RDR gHC_BASE (fsLit "getTag")
+dataToTag_RDR           = varQual_RDR gHC_PRIM (fsLit "dataToTag#")
 succ_RDR                = varQual_RDR gHC_ENUM (fsLit "succ")
 pred_RDR                = varQual_RDR gHC_ENUM (fsLit "pred")
 minBound_RDR            = varQual_RDR gHC_ENUM (fsLit "minBound")


=====================================
compiler/typecheck/TcGenDeriv.hs
=====================================
@@ -15,6 +15,7 @@ This is where we do all the grimy bindings' generation.
 {-# LANGUAGE CPP, ScopedTypeVariables #-}
 {-# LANGUAGE FlexibleContexts #-}
 {-# LANGUAGE TypeFamilies #-}
+{-# LANGUAGE DataKinds #-}
 
 {-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
 
@@ -83,9 +84,8 @@ import Data.List  ( find, partition, intersperse )
 type BagDerivStuff = Bag DerivStuff
 
 data AuxBindSpec
-  = DerivCon2Tag TyCon  -- The con2Tag for given TyCon
-  | DerivTag2Con TyCon  -- ...ditto tag2Con
-  | DerivMaxTag  TyCon  -- ...and maxTag
+  = DerivTag2Con TyCon  -- The tag2Con for given TyCon
+  | DerivMaxTag  TyCon  -- ...and ditto maxTag
   deriving( Eq )
   -- All these generate ZERO-BASED tag operations
   -- I.e first constructor has tag 0
@@ -127,17 +127,17 @@ possibly zero of them).  Here's an example, with both \tr{N}ullary and
        case (a1 `eqFloat#` a2) of r -> r
   for that particular test.
 
-* If there are a lot of (more than ten) nullary constructors, we emit a
+* For nullary constructors, we emit a
   catch-all clause of the form:
 
-      (==) a b  = case (con2tag_Foo a) of { a# ->
-                  case (con2tag_Foo b) of { b# ->
+      (==) a b  = case (dataToTag# a) of { a# ->
+                  case (dataToTag# b) of { b# ->
                   case (a# ==# b#)     of {
                     r -> r }}}
 
-  If con2tag gets inlined this leads to join point stuff, so
-  it's better to use regular pattern matching if there aren't too
-  many nullary constructors.  "Ten" is arbitrary, of course
+  An older approach preferred regular pattern matches in some cases
+  but with dataToTag# forcing it's argument, and work on improving
+  join points this seems no longer necessary.
 
 * If there aren't any nullary constructors, we emit a simpler
   catch-all:
@@ -146,7 +146,7 @@ possibly zero of them).  Here's an example, with both \tr{N}ullary and
 
 * For the @(/=)@ method, we normally just use the default method.
   If the type is an enumeration type, we could/may/should? generate
-  special code that calls @con2tag_Foo@, much like for @(==)@ shown
+  special code that calls @dataToTag#@, much like for @(==)@ shown
   above.
 
 We thought about doing this: If we're also deriving 'Ord' for this
@@ -162,20 +162,18 @@ produced don't get through the typechecker.
 gen_Eq_binds :: SrcSpan -> TyCon -> TcM (LHsBinds GhcPs, BagDerivStuff)
 gen_Eq_binds loc tycon = do
     dflags <- getDynFlags
-    return (method_binds dflags, aux_binds)
+    return (method_binds dflags, emptyBag)
   where
     all_cons = tyConDataCons tycon
     (nullary_cons, non_nullary_cons) = partition isNullarySrcDataCon all_cons
 
-    -- If there are ten or more (arbitrary number) nullary constructors,
-    -- use the con2tag stuff.  For small types it's better to use
-    -- ordinary pattern matching.
-    (tag_match_cons, pat_match_cons)
-       | nullary_cons `lengthExceeds` 10 = (nullary_cons, non_nullary_cons)
-       | otherwise                       = ([],           all_cons)
-
+    -- For nullary constructors, use the getTag stuff.
+    (tag_match_cons, pat_match_cons) = (nullary_cons, non_nullary_cons)
     no_tag_match_cons = null tag_match_cons
 
+    -- (LHS patterns, result)
+    fall_through_eqn :: DynFlags
+                     -> [([Located (Pat (GhcPass 'Parsed))] , LHsExpr GhcPs)]
     fall_through_eqn dflags
       | no_tag_match_cons   -- All constructors have arguments
       = case pat_match_cons of
@@ -186,17 +184,16 @@ gen_Eq_binds loc tycon = do
                  [([nlWildPat, nlWildPat], false_Expr)]
 
       | otherwise -- One or more tag_match cons; add fall-through of
-                  -- extract tags compare for equality
+                  -- extract tags compare for equality,
+                  -- The case `(C1 x) == (C1 y)` can no longer happen
+                  -- at this point as it's matched earlier.
       = [([a_Pat, b_Pat],
          untag_Expr dflags tycon [(a_RDR,ah_RDR), (b_RDR,bh_RDR)]
                     (genPrimOpApp (nlHsVar ah_RDR) eqInt_RDR (nlHsVar bh_RDR)))]
 
-    aux_binds | no_tag_match_cons = emptyBag
-              | otherwise         = unitBag $ DerivAuxBind $ DerivCon2Tag tycon
-
     method_binds dflags = unitBag (eq_bind dflags)
     eq_bind dflags = mkFunBindEC 2 loc eq_RDR (const true_Expr)
-                                 (map pats_etc pat_match_cons
+                                 ( map pats_etc pat_match_cons
                                    ++ fall_through_eqn dflags)
 
     ------------------------------------------------------------------
@@ -346,11 +343,8 @@ gen_Ord_binds loc tycon = do
       then ( unitBag $ mkFunBindEC 2 loc compare_RDR (const eqTag_Expr) []
            , emptyBag)
       else ( unitBag (mkOrdOp dflags OrdCompare) `unionBags` other_ops dflags
-           , aux_binds)
+           , emptyBag)
   where
-    aux_binds | single_con_type = emptyBag
-              | otherwise       = unitBag $ DerivAuxBind $ DerivCon2Tag tycon
-
         -- Note [Game plan for deriving Ord]
     other_ops dflags
       | (last_tag - first_tag) <= 2     -- 1-3 constructors
@@ -369,7 +363,7 @@ gen_Ord_binds loc tycon = do
 
     get_tag con = dataConTag con - fIRST_TAG
         -- We want *zero-based* tags, because that's what
-        -- con2Tag returns (generated by untag_Expr)!
+        -- dataToTag# returns (generated by untag_Expr)!
 
     tycon_data_cons = tyConDataCons tycon
     single_con_type = isSingleton tycon_data_cons
@@ -549,8 +543,8 @@ nlConWildPat con = noLoc (ConPatIn (noLoc (getRdrName con))
 data Foo ... = N1 | N2 | ... | Nn
 \end{verbatim}
 
-we use both @con2tag_Foo@ and @tag2con_Foo@ functions, as well as a
- at maxtag_Foo@ variable (all generated by @gen_tag_n_con_binds@).
+we use both dataToTag# and @tag2con_Foo@ functions, as well as a
+ at maxtag_Foo@ variable, the later generated by @gen_tag_n_con_binds.
 
 \begin{verbatim}
 instance ... Enum (Foo ...) where
@@ -563,16 +557,16 @@ instance ... Enum (Foo ...) where
 
     -- or, really...
     enumFrom a
-      = case con2tag_Foo a of
+      = case dataToTag# a of
           a# -> map tag2con_Foo (enumFromTo (I# a#) maxtag_Foo)
 
    enumFromThen a b
-     = map tag2con_Foo [con2tag_Foo a, con2tag_Foo b .. maxtag_Foo]
+     = map tag2con_Foo [dataToTag# a, dataToTag# b .. maxtag_Foo]
 
     -- or, really...
     enumFromThen a b
-      = case con2tag_Foo a of { a# ->
-        case con2tag_Foo b of { b# ->
+      = case dataToTag# a of { a# ->
+        case dataToTag# b of { b# ->
         map tag2con_Foo (enumFromThenTo (I# a#) (I# b#) maxtag_Foo)
         }}
 \end{verbatim}
@@ -594,7 +588,7 @@ gen_Enum_binds loc tycon = do
       , from_enum      dflags
       ]
     aux_binds = listToBag $ map DerivAuxBind
-                  [DerivCon2Tag tycon, DerivTag2Con tycon, DerivMaxTag tycon]
+                  [DerivTag2Con tycon, DerivMaxTag tycon]
 
     occ_nm = getOccString tycon
 
@@ -709,32 +703,32 @@ things go not too differently from @Enum@:
 \begin{verbatim}
 instance ... Ix (Foo ...) where
     range (a, b)
-      = map tag2con_Foo [con2tag_Foo a .. con2tag_Foo b]
+      = map tag2con_Foo [dataToTag# a .. dataToTag# b]
 
     -- or, really...
     range (a, b)
-      = case (con2tag_Foo a) of { a# ->
-        case (con2tag_Foo b) of { b# ->
+      = case (dataToTag# a) of { a# ->
+        case (dataToTag# b) of { b# ->
         map tag2con_Foo (enumFromTo (I# a#) (I# b#))
         }}
 
     -- Generate code for unsafeIndex, because using index leads
     -- to lots of redundant range tests
     unsafeIndex c@(a, b) d
-      = case (con2tag_Foo d -# con2tag_Foo a) of
+      = case (dataToTag# d -# dataToTag# a) of
                r# -> I# r#
 
     inRange (a, b) c
       = let
-            p_tag = con2tag_Foo c
+            p_tag = dataToTag# c
         in
-        p_tag >= con2tag_Foo a && p_tag <= con2tag_Foo b
+        p_tag >= dataToTag# a && p_tag <= dataToTag# b
 
     -- or, really...
     inRange (a, b) c
-      = case (con2tag_Foo a)   of { a_tag ->
-        case (con2tag_Foo b)   of { b_tag ->
-        case (con2tag_Foo c)   of { c_tag ->
+      = case (dataToTag# a)   of { a_tag ->
+        case (dataToTag# b)   of { b_tag ->
+        case (dataToTag# c)   of { c_tag ->
         if (c_tag >=# a_tag) then
           c_tag <=# b_tag
         else
@@ -757,8 +751,8 @@ gen_Ix_binds loc tycon = do
     dflags <- getDynFlags
     return $ if isEnumerationTyCon tycon
       then (enum_ixes dflags, listToBag $ map DerivAuxBind
-                   [DerivCon2Tag tycon, DerivTag2Con tycon, DerivMaxTag tycon])
-      else (single_con_ixes, unitBag (DerivAuxBind (DerivCon2Tag tycon)))
+                   [DerivTag2Con tycon, DerivMaxTag tycon])
+      else (single_con_ixes, emptyBag)
   where
     --------------------------------------------------------------
     enum_ixes dflags = listToBag
@@ -1937,41 +1931,18 @@ mkCoerceClassMethEqn cls inst_tvs inst_tys rhs_ty id
 \begin{verbatim}
 data Foo ... = ...
 
-con2tag_Foo :: Foo ... -> Int#
 tag2con_Foo :: Int -> Foo ...   -- easier if Int, not Int#
 maxtag_Foo  :: Int              -- ditto (NB: not unlifted)
 \end{verbatim}
 
 The `tags' here start at zero, hence the @fIRST_TAG@ (currently one)
 fiddling around.
+
+We also use dataToTag# heavily.
 -}
 
 genAuxBindSpec :: DynFlags -> SrcSpan -> AuxBindSpec
                   -> (LHsBind GhcPs, LSig GhcPs)
-genAuxBindSpec dflags loc (DerivCon2Tag tycon)
-  = (mkFunBindSE 0 loc rdr_name eqns,
-     L loc (TypeSig noExtField [L loc rdr_name] sig_ty))
-  where
-    rdr_name = con2tag_RDR dflags tycon
-
-    sig_ty = mkLHsSigWcType $ L loc $ XHsType $ NHsCoreTy $
-             mkSpecSigmaTy (tyConTyVars tycon) (tyConStupidTheta tycon) $
-             mkParentType tycon `mkVisFunTy` intPrimTy
-
-    lots_of_constructors = tyConFamilySize tycon > 8
-                        -- was: mAX_FAMILY_SIZE_FOR_VEC_RETURNS
-                        -- but we don't do vectored returns any more.
-
-    eqns | lots_of_constructors = [get_tag_eqn]
-         | otherwise = map mk_eqn (tyConDataCons tycon)
-
-    get_tag_eqn = ([nlVarPat a_RDR], nlHsApp (nlHsVar getTag_RDR) a_Expr)
-
-    mk_eqn :: DataCon -> ([LPat GhcPs], LHsExpr GhcPs)
-    mk_eqn con = ([nlWildConPat con],
-                  nlHsLit (HsIntPrim NoSourceText
-                                    (toInteger ((dataConTag con) - fIRST_TAG))))
-
 genAuxBindSpec dflags loc (DerivTag2Con tycon)
   = (mkFunBindSE 0 loc rdr_name
         [([nlConVarPat intDataCon_RDR [a_RDR]],
@@ -2254,14 +2225,26 @@ eq_Expr ty a b
  where
    (_, _, prim_eq, _, _) = primOrdOps "Eq" ty
 
-untag_Expr :: DynFlags -> TyCon -> [( RdrName,  RdrName)]
-              -> LHsExpr GhcPs -> LHsExpr GhcPs
+-- | Take an expression and a list of pairs @(exprName1,tagName1)@.
+-- Wraps the given expression in cases which bind tagName1 to the
+-- tag of exprName1 and so forth for all pairs and returns the
+-- resulting expression.
+untag_Expr :: DynFlags
+           -> TyCon
+           -> [( RdrName,  RdrName)] -- (expr, expr's tag bound to this)
+           -> LHsExpr GhcPs -- Final RHS
+           -> LHsExpr GhcPs -- Result expr
 untag_Expr _ _ [] expr = expr
 untag_Expr dflags tycon ((untag_this, put_tag_here) : more) expr
-  = nlHsCase (nlHsPar (nlHsVarApps (con2tag_RDR dflags tycon)
-                                   [untag_this])) {-of-}
+  {- case (dataToTag# untag_this) of
+        put_tag_here -> .... <recursive on more>
+          _ -> result
+         -}
+  = nlHsCase (nlHsPar (nlHsApp (nlHsVar dataToTag_RDR) (nlHsVar untag_this))) {-of-}
       [mkHsCaseAlt (nlVarPat put_tag_here) (untag_Expr dflags tycon more expr)]
 
+
+
 enum_from_to_Expr
         :: LHsExpr GhcPs -> LHsExpr GhcPs
         -> LHsExpr GhcPs
@@ -2372,9 +2355,8 @@ minusInt_RDR, tagToEnum_RDR :: RdrName
 minusInt_RDR  = getRdrName (primOpId IntSubOp   )
 tagToEnum_RDR = getRdrName (primOpId TagToEnumOp)
 
-con2tag_RDR, tag2con_RDR, maxtag_RDR :: DynFlags -> TyCon -> RdrName
+tag2con_RDR, maxtag_RDR :: DynFlags -> TyCon -> RdrName
 -- Generates Orig s RdrName, for the binding positions
-con2tag_RDR dflags tycon = mk_tc_deriv_name dflags tycon mkCon2TagOcc
 tag2con_RDR dflags tycon = mk_tc_deriv_name dflags tycon mkTag2ConOcc
 maxtag_RDR  dflags tycon = mk_tc_deriv_name dflags tycon mkMaxTagOcc
 
@@ -2403,13 +2385,15 @@ mkAuxBinderName dflags parent occ_fun
 {-
 Note [Auxiliary binders]
 ~~~~~~~~~~~~~~~~~~~~~~~~
-We often want to make a top-level auxiliary binding.  E.g. for comparison we have
 
-  instance Ord T where
-    compare a b = $con2tag a `compare` $con2tag b
+We often want to make a top-level auxiliary binding.  E.g. for enum we
+turn a Integer into a constructor. So we have
+
+  instance Enum T where
+    succ x = $tag2con (dataToTag x + 1)
 
-  $con2tag :: T -> Int
-  $con2tag = ...code....
+  $tag2con :: Int -> T
+  $tag2con = ...code....
 
 Of course these top-level bindings should all have distinct name, and we are
 generating RdrNames here.  We can't just use the TyCon or DataCon to distinguish


=====================================
compiler/utils/ListSetOps.hs
=====================================
@@ -104,12 +104,13 @@ minusList xs ys = filter (`S.notMember` yss) xs
 Inefficient finite maps based on association lists and equality.
 -}
 
--- A finite mapping based on equality and association lists
+-- | A finite mapping based on equality and association lists.
 type Assoc a b = [(a,b)]
 
 assoc             :: (Eq a) => String -> Assoc a b -> a -> b
 assocDefault      :: (Eq a) => b -> Assoc a b -> a -> b
 assocUsing        :: (a -> a -> Bool) -> String -> Assoc a b -> a -> b
+-- | Lookup key, fail gracefully using Nothing if not found.
 assocMaybe        :: (Eq a) => Assoc a b -> a -> Maybe b
 assocDefaultUsing :: (a -> a -> Bool) -> b -> Assoc a b -> a -> b
 



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/compare/d39b8d9f72b254d042fb16c00e78e23b35a6e879...4a2cf8627cc5fee9d0fc2423d598025a0d3bbb5f

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/compare/d39b8d9f72b254d042fb16c00e78e23b35a6e879...4a2cf8627cc5fee9d0fc2423d598025a0d3bbb5f
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200318/3b57123b/attachment-0001.html>


More information about the ghc-commits mailing list