Deriving Data for poly-kinded datatypes

Ryan Scott ryan.gl.scott at gmail.com
Fri Feb 24 22:24:50 UTC 2017


I pushed some more on this and managed to get something which "works".
That is, I was able to declare a Data instance for T:

    data T (phantom :: k) = T
    instance (Typeable k, Typeable phantom, Possibly Data phantom) =>
Data (T (phantom :: k)) where
      ...
      dataCast1 = dataCast1T

(More in a second about what `dataCast1T` and `Possibly` are).

Now I was able to use this ext1Q combinator:

    newtype Q q x = Q { unQ :: x -> q }
    ext1Q :: (Data d, Typeable t)
          => (d -> q)
          -> (forall e. Data e => t e -> q)
          -> d -> q
    ext1Q def ext arg =
      case dataCast1 (Q ext) of
        Just (Q ext') -> ext' arg
        Nothing -> def arg

on different values of T without restricting the parameter to kind *:

    test1, test2, test3 :: Char
    test1 = (const 'p') `ext1Q` (\T -> 'q') $ (T :: T ())
    test2 = (const 'p') `ext1Q` (\T -> 'q') $ (T :: T Char)
    test3 = (const 'p') `ext1Q` (\T -> 'q') $ (T :: T
(NotADataInstance :: Type -> Type))

    main :: IO ()
    main = putStrLn [test1, test2, test3]

And if you run `main`, you get "qqp" as the output, as you'd expect!
That's because () and Char are Data Instances, so ext1Q picks the
second function to run, whereas NotADataInstance isn't, so the first
function is ran. The full source code is at [1].

So how does it work? Well, it's not pretty. First, we draw upon an
extremely specialized dictionary datatype:

    data D (p :: k -> Constraint) (x :: j) where
      D :: forall (p :: k -> Constraint) (x :: k). p x => D p x

Notice how the D constructor forces the kind of x and the kind of p's
first argument to be the same--this is crucial for the trick to work,
since we need evidence that the kinds are the same when we
pattern-match on D. Now we define this class:

    class Possibly p x where
      possibly :: proxy1 p -> proxy2 x -> Maybe (D p x)

This is like DataIfStar, but more general. We can define a
fall-through instance for Data like so:

    instance {-# OVERLAPPABLE #-} Possibly Data (x :: j) where
      possibly _ _ = Nothing

I wasn't able to define a single overlapping instance for all things
of kind *, but I did have some success defining instances individually
for each Data instance:

    instance {-# OVERLAPPING #-} Possibly Data () where
      possibly _ _ = Just D
    instance {-# OVERLAPPING #-} Possibly Data Char where
      possibly _ _ = Just D

Finally, we're able to define a version of dataCast1 for T that
doesn't overly constrain its type parameter:

    dataCast1T :: forall k c t (phantom :: k).
                  (Typeable t, Possibly Data phantom)
               => (forall d. Data d => c (t d))
               -> Maybe (c (T phantom))
    dataCast1T f = case possibly (Proxy :: Proxy Data) (Proxy :: Proxy
phantom) of
                     Nothing -> Nothing
                     Just D  -> gcast1 f

Miraculously, this typechecks. We use `possibly` to check two things:

1. Is (phantom :: *)?
2. Furthermore, is there a (Data phantom) instance in scope?

If we successfully pattern-match on the resulting dictionary, then GHC
has evidence for these two things in scope, so we are able to call
gcast1 successfully.

Now that I'm done explaining this, let me make it clear that this is
NOT at all what I think the proper solution should look like, since
`Possibly` and `D` feel way too hacky. I'll need to think a bit about
how we can take some ideas from this sketch and turn them into
sensible, reusable abstractions.

Ryan S.
-----
[1] https://gist.github.com/RyanGlScott/9ec8665e7feefa6aa870b5ac85ae3663

On Fri, Feb 24, 2017 at 11:13 AM, Ryan Scott <ryan.gl.scott at gmail.com> wrote:
> Yikes, I was afraid of reaching for something like DataIfStar. But I
> don't have any other ideas, so let's play along with that example.
>
> Frustratingly, I can't even write the class definition of DataIfStar.
> If I do, GHC yells at me like so:
>
>     • Expected a type, but ‘k’ has kind ‘t’
>     • In the first argument of ‘Data’, namely ‘k’
>       In the type signature:
>         dataIf :: (t ~ Type) => proxy k -> (Data k => r) -> r -> r
>       In the class declaration for ‘DataIfStar’
>
> In other words, GHC isn't smart enough to see that in the (Data k =>
> r) argument, k's kind t should be the same as *. This is pretty
> annoying, because this overlapping instances trick only works due to
> the fact that their instance heads distinguish between kinds. In other
> words, if you were to change the definition of DataIfStar to this:
>
>     class Typeable k => DataIfStar (k :: *) where
>       dataIf :: proxy k -> (Data k => r) -> r -> r
>
> Then although the class declaration would now typecheck, the two
> instances below it would be flagged as duplicate instances because
> they now have the same instance head! Ugh.
>
> Somehow, I need to be able to defer an equality constraint at the type
> level, but just writing (t ~ Type) => ... doesn't quite work,
> unfortunately.
>
> Ryan S.
>
> On Fri, Feb 24, 2017 at 12:14 AM, Edward Kmett <ekmett at gmail.com> wrote:
>> That is a right mess.
>>
>> I've now stepped a bit outside of what I think is a practical
>> recommendation, but let's just keep playing for the fun of it.
>>
>> This is going to be hard to do without access to GHC at the moment but here
>> goes:
>>
>> If phantom was of kind * then dataCast1 would need the (Data phantom), so we
>> need some sort of way to supply it.
>>
>> Looks like we'd need some 'Data if *'. We don't have exponentials in the
>> type checker though. They are admissable, just not a thing GHC does today.
>> We could fake it with some nastiness perhaps:
>>
>> class (Typeable t, Typeable k) => DataIfStar (k :: t) where
>>   dataIf :: (t ~ Type) => proxy k -> (Data k => r) -> r
>>
>> You might be able to make a couple of overlapping instances, as much as it
>> pains me to consider. (Here be dragons, I haven't thought this through.
>>
>> instance Data k => DataIfStar k where -- overlapping
>>   dataIf _ r = r
>>
>> instance DataIfStar k where -- overlapped
>>   dataIf _ _ = undefined
>>
>> Then after you check k ~ Type in the instance you could invoke dataIf using
>> the knowledge that k ~ Type to pull the Data phantom instance into scope.
>> You'd get a type signature like:
>>
>> dataCast1T :: forall k c t (phantom :: k).
>>                   (Typeable t, DataIfStar phantom)
>>                => (forall d. Data d => c (t d))
>>                -> Maybe (c (T phantom))
>>
>> which would mean the Data instance for T phantom would get a constraint
>> like:
>>
>> instance DataIfStar phantom => Data (T phantom)
>>
>> Does that make sense?
>>
>> -Edward
>>
>>
>>
>>
>> On Thu, Feb 23, 2017 at 10:30 PM, Ryan Scott <ryan.gl.scott at gmail.com>
>> wrote:
>>>
>>> > Supplying the default shouldn't lock our data instance to the form T a.
>>> > If for some reason adding this default would break the instance We can make
>>> > a more interesting default that does something like look at the kind of the
>>> > argument first to determine if it is kind * before proceeding after casting
>>> > to ensure the kinds match.
>>>
>>> Interesting. Do you happen to know how to write this "more interesting
>>> default"? I've tried various things, but sadly I can't escape past the
>>> typechecker. My attempt that made it the furthest was this:
>>>
>>>     {-# LANGUAGE GADTs #-}
>>>     {-# LANGUAGE PolyKinds #-}
>>>     {-# LANGUAGE RankNTypes #-}
>>>     {-# LANGUAGE ScopedTypeVariables #-}
>>>     {-# LANGUAGE TypeInType #-}
>>>     {-# LANGUAGE TypeOperators #-}
>>>     module DataCast where
>>>
>>>     import Data.Data
>>>     import Data.Kind (Type)
>>>
>>>     data T (phantom :: k) = T
>>>
>>>     dataCast1T :: forall k c t (phantom :: k).
>>>                   (Typeable k, Typeable t, Typeable phantom)
>>>                => (forall d. Data d => c (t d))
>>>                -> Maybe (c (T phantom))
>>>     dataCast1T f = case eqT :: Maybe (k :~: Type) of
>>>                      Nothing   -> Nothing
>>>                      Just Refl -> gcast1 f
>>>
>>> This would work were it not for the constraints involved:
>>>
>>>     Could not deduce (Data phantom) arising from a use of ‘f’
>>>
>>> It seems that applying f is forcing the type parameter to T (phantom)
>>> to be a Data instance, which obviously can't happen if (phantom :: k).
>>> I don't know a way around this, though, as I'm not aware of a way to
>>> "defer" a class constraint (unlike equality constraints, which can be
>>> deferred via Typeable).
>>>
>>> Ryan S.
>>>
>>> On Thu, Feb 23, 2017 at 5:49 PM, Edward Kmett <ekmett at gmail.com> wrote:
>>> > Some thoughts on the topic: admittedly, probably not very useful.
>>> >
>>> > A couple of obvious statements:
>>> >
>>> >
>>> >
>>> > 1)
>>> >
>>> > gcast1 itself operationally makes sense regardless of the kind of the
>>> > argument you're skipping past.
>>> >
>>> > gcast1 :: forall c t t' a. (Typeable t, Typeable t') => c (t a) -> Maybe
>>> > (c
>>> > (t' a))
>>> >
>>> > This is an operation we can define pretty easily, however we want. It
>>> > works
>>> > more or less by definition. It was included in the original paper.
>>> >
>>> >
>>> >
>>> > 2)
>>> >
>>> > dataCast1 on the other hand is an member of the class, and it needs that
>>> > 'Data d' constraint or it can't do its job.
>>> >
>>> > dataCast1 :: (Data a, Typeable t) => (forall d. Data d => c (t d)) ->
>>> > Maybe
>>> > (c a)
>>> >
>>> > Attempting to weaken the Data d constraint there doesn't work, because
>>> > then
>>> > dataCast1 wouldn't be able to do its job. Without it you can't write
>>> > ext1Q
>>> > in terms of dataCast1, as noted in the but about 'bogusDataCast' in the
>>> > original paper: Scrap More Boilerplate: Reflection, Zips and Generalized
>>> > Casts
>>> >
>>> >
>>> >
>>> >
>>> > Since then we obviously picked up polymorphic kinds, which muddled the
>>> > story
>>> > about when a data instance is for a type that looks like `T d`.
>>> >
>>> > On the other hand, the user will be the one calling this method, and
>>> > they'll
>>> > have to take Data instances for d and use them to generate c (t d).
>>> > Knowledge of if argument is of kind * is purely local, though. If t is
>>> > Typeable and d is Data, then (t d), d is kind *, t :: * -> *. So to call
>>> > dataCast1, the user has to already know the argument is kind *, and that
>>> > the
>>> > type t that they are trying to use (maybe not the T in the data instance
>>> > itself) is of kind * -> *.
>>> >
>>> > There is no contradiction in trying to call this on a data type with the
>>> > wrong kind for it to succeed. T is not necessarily t. It is our job to
>>> > check
>>> > if it is.
>>> >
>>> > Supplying the default shouldn't lock our data instance to the form T a.
>>> > If
>>> > for some reason adding this default would break the instance We can make
>>> > a
>>> > more interesting default that does something like look at the kind of
>>> > the
>>> > argument first to determine if it is kind * before proceeding after
>>> > casting
>>> > to ensure the kinds match.
>>> >
>>> > tl;dr yes, it would seem it would be worth fixing the instances of Data
>>> > produced to supply dataCast1 when the kind of the argument is
>>> > polymorphic.
>>> > Otherwise turning on PolyKinds in a package will simply break dataCast1
>>> > for
>>> > basically all of its data types.
>>> >
>>> > -Edward
>>> >
>>> > P.S. Ultimately, in a perfect world we'd be able to unify dataCast1 and
>>> > dataCast2 with some tricky base case for kind k1 = * and induction over
>>> > k ->
>>> > k1. Off hand, I don't see how to do it. I played for a while with trying
>>> > to
>>> > write a higher kinded Data to support this, but my scribbles didn't
>>> > cohere
>>> > into usable code. gfoldl pretty much locks you into kind *. I even tried
>>> > playing with profunctors and powers here to no real avail. I do have
>>> > some
>>> > old Data1 code in an syb-extras package that I use to write a few
>>> > 'impossible' Data instances, but that seems to be solving a different,
>>> > if
>>> > related, problem, and doesn't scale up to arbitrary kinds. Ideally
>>> > there'd
>>> > be a plausible Data that makes sense for other kinds k, and everything
>>> > could
>>> > become dataCast1, w/ dataCast2, and the missing higher versions just an
>>> > iterated application of it. I just don't see that there is a way to turn
>>> > it
>>> > into code.
>>> >
>>> >
>>> > On Thu, Feb 23, 2017 at 2:51 PM, Ryan Scott <ryan.gl.scott at gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi Pedro,
>>> >>
>>> >> I'm quite confused by a peculiarity of deriving Data (more info in
>>> >> Trac #13327 [1]). In particular, if you write this:
>>> >>
>>> >>     data T phantom = T
>>> >>       deriving Data
>>> >>
>>> >> Then the derived Data instance is NOT this:
>>> >>
>>> >>     instance Typeable phantom => Data (T phantom) where
>>> >>       ...
>>> >>
>>> >> But instead, it's this:
>>> >>
>>> >>     instance Data phantom => Data (T phantom) where
>>> >>       ...
>>> >>       dataCast1 f = gcast1 f
>>> >>
>>> >> The gcast1 part is why it requires the stronger (Data phantom)
>>> >> context, as you noted in Trac #4028 [2].
>>> >>
>>> >> What confuses me, however, is that is apparently does not carry over
>>> >> to poly-kinded datatypes. For instance, if you write this:
>>> >>
>>> >>     data T (phantom :: k) = T
>>> >>       deriving Data
>>> >>
>>> >> Then you do NOT get this instance:
>>> >>
>>> >>     instance Data (phantom :: *) => Data (T phantom) where
>>> >>       ...
>>> >>       dataCast1 f = gcast1 f
>>> >>
>>> >> But instead, you get this instance!
>>> >>
>>> >>     instance (Typeable k, Typeable (phantom :: k)) => Data (T phantom)
>>> >> where
>>> >>       ...
>>> >>       -- No implementation for dataCast1
>>> >>
>>> >> This is quite surprising to me. I'm not knowledgeable enough about
>>> >> Data to know for sure if this is an oversight, expected behavior, or
>>> >> something else, so I was hoping you (or someone else highly
>>> >> knowledgeable about SYB-style generic programming) could help me out
>>> >> here.
>>> >>
>>> >> In particular:
>>> >>
>>> >> 1. Does emitting "dataCast1 f = gcast1 f" for datatypes of kind (k ->
>>> >> *) make sense? Or does it only make sense for types of kind (* -> *)?
>>> >> 2. Is there an alternate way to define dataCast1 that doesn't require
>>> >> the stronger Data context, but instead only requires the more general
>>> >> Typeable context?
>>> >>
>>> >> Ryan S.
>>> >> -----
>>> >> [1] https://ghc.haskell.org/trac/ghc/ticket/13327
>>> >> [2] https://ghc.haskell.org/trac/ghc/ticket/4028#comment:5
>>> >> _______________________________________________
>>> >> ghc-devs mailing list
>>> >> ghc-devs at haskell.org
>>> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>>> >
>>> >
>>
>>


More information about the ghc-devs mailing list