[Git][ghc/ghc][wip/T18021] Reject dodgy scoping in associated family instance RHSes

Ryan Scott gitlab at gitlab.haskell.org
Mon Dec 7 11:21:23 UTC 2020



Ryan Scott pushed to branch wip/T18021 at Glasgow Haskell Compiler / GHC


Commits:
96ad03e1 by Ryan Scott at 2020-12-07T06:21:10-05:00
Reject dodgy scoping in associated family instance RHSes

Commit e63518f5d6a93be111f9108c0990a1162f88d615 tried to push all of the logic
of detecting out-of-scope type variables on the RHSes of associated type family
instances to `GHC.Tc.Validity` by deleting a similar check in the renamer.
Unfortunately, this commit went a little too far, as there are some corner
cases that `GHC.Tc.Validity` doesn't detect. Consider this example:

```hs
class C a where
  data D a

instance forall a. C Int where
  data instance D Int = MkD a
```

If this program isn't rejected by the time it reaches the typechecker, then
GHC will believe the `a` in `MkD a` is existentially quantified and accept it.
This is almost surely not what the user wants! The simplest way to reject
programs like this is to restore the old validity check in the renamer
(search for `improperly_scoped` in `rnFamEqn`).

Note that this is technically a breaking change, since the program in the
`polykinds/T9574` test case (which previously compiled) will now be rejected:

```hs
instance Funct ('KProxy :: KProxy o) where
    type Codomain 'KProxy = NatTr (Proxy :: o -> *)
```

This is because the `o` on the RHS will now be rejected for being out of scope.
Luckily, this is simple to repair:

```hs
instance Funct ('KProxy :: KProxy o) where
    type Codomain ('KProxy @o) = NatTr (Proxy :: o -> *)
```

All of the discussion is now a part of the revamped
`Note [Renaming associated types]` in `GHC.Rename.Module`.

A different design would be to make associated type family instances have
completely separate scoping from the parent instance declaration, much like
how associated type family default declarations work today. See the discussion
beginning at https://gitlab.haskell.org/ghc/ghc/-/issues/18021#note_265729 for
more on this point. This, however, would break even more programs that are
accepted today and likely warrants a GHC proposal before going forward. In the
meantime, this patch fixes the issue described in #18021 in the least invasive
way possible. There are programs that are accepted today that will no longer
be accepted after this patch, but they are arguably pathological programs, and
they are simple to repair.

Fixes #18021.

- - - - -


8 changed files:

- compiler/GHC/Rename/Module.hs
- docs/users_guide/9.2.1-notes.rst
- testsuite/tests/indexed-types/should_fail/T5515.stderr
- + testsuite/tests/polykinds/T9574.stderr
- testsuite/tests/polykinds/all.T
- + testsuite/tests/rename/should_fail/T18021.hs
- + testsuite/tests/rename/should_fail/T18021.stderr
- testsuite/tests/rename/should_fail/all.T


Changes:

=====================================
compiler/GHC/Rename/Module.hs
=====================================
@@ -725,10 +725,24 @@ rnFamEqn doc atfi rhs_kvars
              --     of the instance decl.  See
              --     Note [Unused type variables in family instances]
        ; let nms_used = extendNameSetList rhs_fvs $
-                           inst_tvs ++ nms_dups
+                           nms_dups {- (a) -} ++ inst_head_tvs {- (b) -}
              all_nms = hsOuterTyVarNames rn_outer_bndrs'
        ; warnUnusedTypePatterns all_nms nms_used
 
+         -- For associated family instances, if a type variable from the
+         -- parent instance declaration is mentioned on the RHS of the
+         -- associated family instance but not bound on the LHS, then reject
+         -- that type variable as being out of scope.
+         -- See Note [Renaming associated types]
+       ; let lhs_bound_vars = extendNameSetList pat_fvs all_nms
+             improperly_scoped cls_tkv =
+                  cls_tkv `elemNameSet` rhs_fvs
+                    -- Mentioned on the RHS...
+               && not (cls_tkv `elemNameSet` lhs_bound_vars)
+                    -- ...but not bound on the LHS.
+             bad_tvs = filter improperly_scoped inst_head_tvs
+       ; unless (null bad_tvs) (badAssocRhs bad_tvs)
+
        ; let eqn_fvs = rhs_fvs `plusFV` pat_fvs
              -- See Note [Type family equations and occurrences]
              all_fvs = case atfi of
@@ -754,10 +768,10 @@ rnFamEqn doc atfi rhs_kvars
 
     -- The type variables from the instance head, if we are dealing with an
     -- associated type family instance.
-    inst_tvs = case atfi of
-      NonAssocTyFamEqn _        -> []
-      AssocTyFamDeflt _         -> []
-      AssocTyFamInst _ inst_tvs -> inst_tvs
+    inst_head_tvs = case atfi of
+      NonAssocTyFamEqn _             -> []
+      AssocTyFamDeflt _              -> []
+      AssocTyFamInst _ inst_head_tvs -> inst_head_tvs
 
     pat_kity_vars_with_dups = extractHsTyArgRdrKiTyVars pats
              -- It is crucial that extractHsTyArgRdrKiTyVars return
@@ -774,6 +788,13 @@ rnFamEqn doc atfi rhs_kvars
       [loc]      -> loc
       (loc:locs) -> loc `combineSrcSpans` last locs
 
+    badAssocRhs :: [Name] -> RnM ()
+    badAssocRhs ns
+      = addErr (hang (text "The RHS of an associated type declaration mentions"
+                      <+> text "out-of-scope variable" <> plural ns
+                      <+> pprWithCommas (quotes . ppr) ns)
+                   2 (text "All such variables must be bound on the LHS"))
+
 rnTyFamInstDecl :: AssocTyFamInfo
                 -> TyFamInstDecl GhcPs
                 -> RnM (TyFamInstDecl GhcRn, FreeVars)
@@ -927,58 +948,155 @@ Relevant tickets: #3699, #10586, #10982 and #11451.
 
 Note [Renaming associated types]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Check that the RHS of the decl mentions only type variables that are explicitly
-bound on the LHS.  For example, this is not ok
-   class C a b where
-      type F a x :: *
+When renaming an associated type/data family instance, we must check that the
+RHS of the declaration mentions only type variables that are bound on the LHS.
+Here is a simple example of something we should reject:
+
+  class C a b where
+    type family F a x
+  instance C Int Bool where
+    type instance F Int x = z
+
+Here, `z` is mentioned on the RHS of the associated instance without being
+bound anywhere on the LHS. GHC will reject `z` as being out of scope without
+much fuss.
+
+Things get slightly trickier when the instance header itself binds type
+variables. Consider this example (adapted from #5515):
+
    instance C (p,q) r where
-      type F (p,q) x = (x, r)   -- BAD: mentions 'r'
-c.f. #5515
-
-Kind variables, on the other hand, are allowed to be implicitly or explicitly
-bound. As examples, this (#9574) is acceptable:
-   class Funct f where
-      type Codomain f :: *
-   instance Funct ('KProxy :: KProxy o) where
-      -- o is implicitly bound by the kind signature
-      -- of the LHS type pattern ('KProxy)
-      type Codomain 'KProxy = NatTr (Proxy :: o -> *)
-And this (#14131) is also acceptable:
-    data family Nat :: k -> k -> *
-    -- k is implicitly bound by an invisible kind pattern
-    newtype instance Nat :: (k -> *) -> (k -> *) -> * where
-      Nat :: (forall xx. f xx -> g xx) -> Nat f g
-We could choose to disallow this, but then associated type families would not
-be able to be as expressive as top-level type synonyms. For example, this type
-synonym definition is allowed:
-    type T = (Nothing :: Maybe a)
-So for parity with type synonyms, we also allow:
-    type family   T :: Maybe a
-    type instance T = (Nothing :: Maybe a)
-
-All this applies only for *instance* declarations.  In *class*
-declarations there is no RHS to worry about, and the class variables
-can all be in scope (#5862):
+      type instance F (p,q) x = (x, r)
+
+If we look at the instance in isolation:
+
+   type instance F (p,q) x = (x, r)
+
+Then it is evident that this is malformed, as `r` has no binding site. In the
+context of the original program, however, `r` is technically in scope, as the
+type variables from an instance header always scope over the associated type
+family instances. So while `r` is technically in scope, it is nevertheless
+extremely dodgy. See #18021 for an example of some of the bizarre behaviour
+that results from this dodginess.
+
+To prevent these sorts of shenanigans, we reject programs like the one above
+with an extra validity check in rnFamEqn. For each type variable bound in the
+parent instance head, we check if it is mentioned on the RHS of the associated
+family instance but not bound on the LHS. If any of the instance-head-bound
+variables meet these criteria, we throw an error.
+(See rnFamEqn.improperly_scoped for how this is implemented.)
+
+Some additional wrinkles:
+
+* Because of the "stop-gap" Note [Implicit quantification in type synonyms] in
+  GHC.Rename.HsType, a variable does not have to be mentioned by name in the
+  LHS in order for it to be bound on the LHS. For example, GHC currently
+  accepts this:
+
+    class C2 a where
+      type F2 :: Maybe a
+    instance C2 a where
+      type F2 = (Nothing :: Maybe a)
+
+  Here, the kind variable `a` in the RHS of the F2 instance is implicitly
+  quantified by virtue of being mentioned in an outermost kind signature.
+  (See Note [Implicit quantification in type synonyms] for more on this point.)
+  One could equivalently write the instance like so:
+
+    instance C2 a where
+      type F2 @a = (Nothing :: Maybe a)
+
+  The explicit result kinds in data family instances, which are also considered
+  to be part of the RHS, provide another example. This program (adapted
+  from #14131) is also accepted:
+
+    class C3 k where
+      data Nat :: k -> k -> Type
+    instance C3 k where
+      newtype Nat :: (k -> Type) -> (k -> Type) -> Type where
+        Nat :: (forall xx. f xx -> g xx) -> Nat f g
+      -- Alternatively,
+      -- newtype Nat @k :: (k -> Type) -> (k -> Type) -> Type where ...
+
+* This Note only applies to *instance* declarations.  In *class* declarations
+  there is no RHS to worry about, and the class variables can all be in scope
+  (#5862):
+
     class Category (x :: k -> k -> *) where
       type Ob x :: k -> Constraint
       id :: Ob x a => x a a
       (.) :: (Ob x a, Ob x b, Ob x c) => x b c -> x a b -> x a c
-Here 'k' is in scope in the kind signature, just like 'x'.
 
-Although type family equations can bind type variables with explicit foralls,
-it need not be the case that all variables that appear on the RHS must be bound
-by a forall. For instance, the following is acceptable:
+  Here 'k' is in scope in the kind signature, just like 'x'.
+
+* Although type family equations can bind type variables with explicit foralls,
+  it need not be the case that all variables that appear on the RHS must be
+  bound by a forall. For instance, the following is acceptable:
+
+    class C4 a where
+      type T4 a b
+    instance C4 (Maybe a) where
+      type forall b. T4 (Maybe a) b = Either a b
+
+  Even though `a` is not bound by the forall, this is still accepted because `a`
+  was previously bound by the `instance C4 (Maybe a)` part. (see #16116).
+
+* In addition to the validity check in rnFamEqn.improperly_scoped, there is an
+  additional check in GHC.Tc.Validity.checkFamPatBinders that checks each family
+  instance equation for type variables used on the RHS but not bound on the
+  LHS. This is not made redundant by rmFamEqn.improperly_scoped, as there are
+  programs that each check will reject that the other check will not catch:
+
+  - checkValidFamPats is used on all forms of family instances, whereas
+    rmFamEqn.improperly_scoped only checks associated family instances. Since
+    checkFamPatBinders occurs after typechecking, it can catch programs that
+    introduce dodgy scoping by way of type synonyms (see #7536), which is
+    impractical to accomplish in the renamer.
+  - rnFamEqn.improperly_scoped catches some programs that, if allowed to escape
+    the renamer, would accidentally be accepted by the typechecker. Here is one
+    such program (#18021):
+
+      class C5 a where
+        data family D a
+
+      instance forall a. C5 Int where
+        data instance D Int = MkD a
+
+    If this is not rejected in the renamer, the typechecker would treat this
+    program as though the `a` were existentially quantified, like so:
+
+      data instance D Int = forall a. MkD a
+
+    This is likely not what the user intended!
+
+    Here is another such program (#9574):
+
+      class Funct f where
+        type Codomain f
+      instance Funct ('KProxy :: KProxy o) where
+        type Codomain 'KProxy = NatTr (Proxy :: o -> Type)
+
+    Where:
+
+      data Proxy (a :: k) = Proxy
+      data KProxy (t :: Type) = KProxy
+      data NatTr (c :: o -> Type)
 
-   class C a where
-     type T a b
-   instance C (Maybe a) where
-     type forall b. T (Maybe a) b = Either a b
+    Note that the `o` in the `Codomain 'KProxy` instance should be considered
+    improperly scoped. It does not meet the criteria for being explicitly
+    quantified, as it is not mentioned by name on the LHS, nor does it meet the
+    criteria for being implicitly quantified, as it is used in a RHS kind
+    signature that is not outermost (see Note [Implicit quantification in type
+    synonyms]). However, `o` /is/ bound by the instance header, so if this
+    program is not rejected by the renamer, the typechecker would treat it as
+    though you had written this:
 
-Even though `a` is not bound by the forall, this is still accepted because `a`
-was previously bound by the `instance C (Maybe a)` part. (see #16116).
+      instance Funct ('KProxy :: KProxy o) where
+        type Codomain ('KProxy @o) = NatTr (Proxy :: o -> Type)
 
-In each case, the function which detects improperly bound variables on the RHS
-is GHC.Tc.Validity.checkValidFamPats.
+    Although this is a valid program, it's probably a stretch too far to turn
+    `type Codomain 'KProxy = ...` into `type Codomain ('KProxy @o) = ...` here.
+    If the user really wants the latter, it is simple enough to communicate
+    their intent by mentioning `o` on the LHS by name.
 
 Note [Type family equations and occurrences]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


=====================================
docs/users_guide/9.2.1-notes.rst
=====================================
@@ -14,6 +14,35 @@ Language
   (Serrano et al, ICFP 2020).  More information here: :ref:`impredicative-polymorphism`.
   This replaces the old (undefined, flaky) behaviour of the :extension:`ImpredicativeTypes` extension.
 
+* GHC is stricter about checking for out-of-scope type variables on the
+  right-hand sides of associated type family instances that are not bound on
+  the left-hand side. As a result, some programs that were accidentally
+  accepted in previous versions of GHC will now be rejected, such as this
+  example: ::
+
+      class Funct f where
+        type Codomain f
+      instance Funct ('KProxy :: KProxy o) where
+        type Codomain 'KProxy = NatTr (Proxy :: o -> Type)
+
+  Where: ::
+
+      data Proxy (a :: k) = Proxy
+      data KProxy (t :: Type) = KProxy
+      data NatTr (c :: o -> Type)
+
+  GHC will now reject the ``o`` on the right-hand side of the ``Codomain``
+  instance as being out of scope, as it does not meet the requirements for
+  being explicitly bound (as it is not mentioned on the left-hand side) nor
+  implicitly bound (as it is not mentioned in an *outermost* kind signature,
+  as required by :ref:`scoping-class-params`). This program can be repaired in
+  a backwards-compatible way by mentioning ``o`` on the left-hand side: ::
+
+      instance Funct ('KProxy :: KProxy o) where
+        type Codomain ('KProxy @o) = NatTr (Proxy :: o -> Type)
+        -- Alternatively,
+        -- type Codomain ('KProxy :: KProxy o) = NatTr (Proxy :: o -> Type)
+
 Compiler
 ~~~~~~~~
 


=====================================
testsuite/tests/indexed-types/should_fail/T5515.stderr
=====================================
@@ -1,24 +1,8 @@
 
-T5515.hs:6:16: error:
-    • Expecting one more argument to ‘ctx’
-      Expected a type, but ‘ctx’ has kind ‘* -> Constraint’
-    • In the first argument of ‘Arg’, namely ‘ctx’
-      In the first argument of ‘ctx’, namely ‘(Arg ctx)’
-      In the class declaration for ‘Bome’
+T5515.hs:9:3: error:
+    The RHS of an associated type declaration mentions out-of-scope variable ‘a’
+      All such variables must be bound on the LHS
 
-T5515.hs:14:1: error:
-    • Type variable ‘a’ is mentioned in the RHS,
-        but not bound on the LHS of the family instance
-    • In the type instance declaration for ‘Arg’
-      In the instance declaration for ‘Some f’
-
-T5515.hs:14:10: error:
-    • Could not deduce (C f a0)
-      from the context: C f a
-        bound by an instance declaration:
-                   forall f a. C f a => Some f
-        at T5515.hs:14:10-24
-      The type variable ‘a0’ is ambiguous
-    • In the ambiguity check for an instance declaration
-      To defer the ambiguity check to use sites, enable AllowAmbiguousTypes
-      In the instance declaration for ‘Some f’
+T5515.hs:15:3: error:
+    The RHS of an associated type declaration mentions out-of-scope variable ‘a’
+      All such variables must be bound on the LHS


=====================================
testsuite/tests/polykinds/T9574.stderr
=====================================
@@ -0,0 +1,4 @@
+
+T9574.hs:13:5: error:
+    The RHS of an associated type declaration mentions out-of-scope variable ‘o’
+      All such variables must be bound on the LHS


=====================================
testsuite/tests/polykinds/all.T
=====================================
@@ -107,7 +107,7 @@ test('T9725', normal, compile, [''])
 test('T9750', normal, compile, [''])
 test('T9569', normal, compile, [''])
 test('T9838', normal, multimod_compile, ['T9838.hs','-v0'])
-test('T9574', normal, compile, [''])
+test('T9574', normal, compile_fail, [''])
 test('T9833', normal, compile, [''])
 test('T7908', normal, compile, [''])
 test('PolyInstances', normal, compile, [''])


=====================================
testsuite/tests/rename/should_fail/T18021.hs
=====================================
@@ -0,0 +1,18 @@
+{-# LANGUAGE DeriveAnyClass #-}
+{-# LANGUAGE ExistentialQuantification #-}
+{-# LANGUAGE MultiParamTypeClasses #-}
+{-# LANGUAGE ScopedTypeVariables #-}
+{-# LANGUAGE TypeFamilies #-}
+module T18021 where
+
+class C a where
+  data D a
+
+instance forall a. C Int where
+  data instance D Int = MkD1 a
+
+class X a b
+
+instance forall a. C Bool where
+  data instance D Bool = MkD2
+    deriving (X a)


=====================================
testsuite/tests/rename/should_fail/T18021.stderr
=====================================
@@ -0,0 +1,8 @@
+
+T18021.hs:12:3: error:
+    The RHS of an associated type declaration mentions out-of-scope variable ‘a’
+      All such variables must be bound on the LHS
+
+T18021.hs:17:3: error:
+    The RHS of an associated type declaration mentions out-of-scope variable ‘a’
+      All such variables must be bound on the LHS


=====================================
testsuite/tests/rename/should_fail/all.T
=====================================
@@ -156,6 +156,7 @@ test('T16504', normal, compile_fail, [''])
 test('T14548', normal, compile_fail, [''])
 test('T16610', normal, compile_fail, [''])
 test('T17593', normal, compile_fail, [''])
+test('T18021', normal, compile_fail, [''])
 test('T18145', normal, compile_fail, [''])
 test('T18240a', normal, compile_fail, [''])
 test('T18240b', normal, compile_fail, [''])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/96ad03e1e9a417ee9201f89968a760da0fb2d683

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/96ad03e1e9a417ee9201f89968a760da0fb2d683
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20201207/a7ca7efe/attachment-0001.html>


More information about the ghc-commits mailing list