[Hs-Generics] Traversable Functor Data,or: X marks the spot

Tue Jun 24 14:45:18 EDT 2008

Dear Generics;-)

this is a repost from cvs-libraries of an experiment which you 
might find of interest, and on which I'd welcome feedback. One 
impact being that SYB can implement your GMap test (on which 
it currently defaults), among other things (see message 1 below, 
the technique probably applies to Uniplate as well?). One issue 
being that Data instances for non-algebraic types that default to
doing nothing render this technique unsafe (see message 2 below).

Btw, the announcement of the unified generics library project 
was so long ago that I had forgotten about it and about this list 
(thanks to Simon PJ for reminding me). My interest in this is 
partially from my past with HaRe, partially from wanting generic 
traversal support over GHC AST types, so I'm more interested
in generic traversal/analysis style libraries.

Claus

--------- message 1

The issue at hand: can we use Data/Typeable to do what Functor 
and Traversible do so easily, namely type-changing maps? Or should
there be support for deriving all of Data/Typeable/Functor/Traversible
over GHC's AST?

A drastically simplified example from David's Haddock 2 code
(Haddock.Interface.Rename) needs to do a kind of mapM

    renameDoc :: Monad m => HsDoc Name -> m (HsDoc DocName)

which is straightforward with Traversible, but that is not derivable
yet (David has been working on that, though), while the usual basis
of SYB, Data/Typeable, is derivable, but all SYB transformations 
are based on gfoldl, the type of which does not permit type-changing 
maps:

    gfoldl :: (Data a) 
        =>(forall a1 b. (Data a1) => c (a1 -> b) -> a1 -> c b)
        -> (forall g. g -> c g)
        -> a-> c a

One could probably go from heterogeneous Data types to a
homogeneously typed tree representation, do the map there, 
then transform back, but that would require additional overhead
and a type coercion for the backtransform. Also, it seems a pity 
that the derived code for gfoldl can handle such maps - it is just
gfoldl's type that gets in the way (and trying to generalize that
type is a whole different kind of headache..).

While boilerplate scrappers around the globe eagerly await the
release of 

    "Generalizing the type of gfold, or: to braindamage and beyond"
    (author: unknown; release date: unknown; release: unlikely;-)

I thought I'd have a go at the smaller problem of finding a way
to bypass gfoldl's type systematically in order to use its derivable
code for things like fmap and traverse. This message summarizes
how far I've got so far, and asks for comments (does this do
what it should? is it safe? assuming it can be cleaned up to do
what it should in a safe way, does this mean that deriving Data/
Typeable for GHCs AST types will be sufficient, or should we
still have Traversible as well? etc.).

First, here is an implementation of renameDoc in terms of
gfoldl and unsafeCoerce (slightly cleaned up version of what
I sent in the other thread earlier):

    data Name    = Name    String deriving (Show,Data,Typeable)
    data DocName = DocName String deriving (Show,Data,Typeable)

    renameDoc :: Monad m => HsDoc Name -> m (HsDoc DocName)
    renameDoc (DocIdentifier ids) = 
      mapM (\(Name n)->return (DocName n)) ids >>= return . DocIdentifier
    renameDoc hsDoc = n2d (gfoldl k return hsDoc)
      where k c x = ap c (mkM (d2n . renameDoc) x)

            n2d :: Monad m => m (HsDoc Name) -> m (HsDoc DocName)
            n2d = unsafeCoerce
            d2n :: Monad m => m (HsDoc DocName) -> m (HsDoc Name)
            d2n = unsafeCoerce

'DocIdentifier :: [id] -> HsDoc id' is the only constructor in HsDoc 
that involves the parameter type 'id', so renameDoc either does the
parameter type conversion or -for other constructors- recurses into 
the subexpressions, using gfoldl to build a monadic map. 

The important insight here is that gfoldl's code can handle the task,
we just pretend that our map is type-preserving to conform to gfoldl's 
restrictive type, by coercing the result types (inside gfoldl, we pretend 
that renameDoc returns a (HsDoc Name), which outside gfoldl, we 
coerce back to (HsDoc DocName)).

Assuming that noone looks at the return types inside gfoldl (at least,
not in a way that would be affected by this change in type - SYB
does support adhoc overloading, after all, as in mkM here), this
seems to work:

    testDoc = 
        DocAppend (DocParagraph (DocAppend (DocIdentifier [Name "well-typed"]) DocEmpty)) 
             (DocAppend (DocString "programs") 
             (DocIdentifier [Name "don't",Name "go",Name "anywhere"]))

    *Main> renameDoc testDoc
    DocAppend (DocParagraph (DocAppend (DocIdentifier [DocName "well-typed"]) DocEmpty))
     (DocAppend (DocString "programs") 
    (DocIdentifier [DocName "don't",DocName "go",DocName "anywhere"]))

But can we generalize this, and what about those coercions?
Can we -ideally- define something like fmap and traverse in 
terms of gfoldl, and hide the uglyness in their implementations?

Well, I'll spare you (and me;-) the details of my struggle with
the type system, and just show the results, with some comments. 
First, the simpler fmap:

    -- "X marks the spots";-) X should be private
    data X = X deriving (Data,Typeable)

    fmap' :: (Data (f X)) => (a -> b) -> f a -> f b
    fmap' f x = markTheSpots (rec (wrap f)) x
      where
      markTheSpots :: (f X -> f X) -> (f a -> f b)
      markTheSpots f = unsafeCoerce . f . unsafeCoerce
      rec ::  (Data a) => (X -> X) -> a -> a
      rec f x = (gmapT (rec f) `extT` f) x
      wrap :: (a -> b) -> (X -> X)
      wrap f = unsafeCoerce . f . unsafeCoerce

Surprisingly simple for something that seemed impossible
at first, isn't it?-) Since we're already committed (for this
experiment, at least) to some type fiddling, we can make
more constructive use of unsafeCoerce, for two purposes:

1. We wrap the function parameter f, to make it look
    like a type-preserving function, on some private type X.

2. We mark the occurrences of the type constructor f's 
    parameter type, by coercing 'f a' to 'f X' and 'f X' to
    'f b'.

Then, we simply use SYB to apply f, when its type matches
the parameter, or to recurse into the subexpressions using
gmapT, otherwise. If X is private, f will only be applied
to the "functor parameter" _positions_ in 'f a', not to other
"functor parameter" _type_ occurrences in 'f a':

    *Main> fmap' not $ (True,True)
    (True,False)
    *Main> fmap' not $ [True,True]
    [False,False]

    *Main> fmap' (\(Name s)->(DocName s)) testDoc
    DocAppend (DocParagraph (DocAppend (DocIdentifier [DocName "well-typed"]) DocEmpty))
        (DocAppend (DocString "programs") 
        (DocIdentifier [DocName "don't",DocName "go",DocName "anywhere"]))

Note how we use one kind of recursion where a manual 
implementation of fmap would use two: handling subexpressions
(which we'd usually do by pattern matching and expression
construction) and functor type recursion.

Ecouraged by this success, we are ready to tackle the 
slightly more tricky traverse, using the same techniques:
mark the spot, wrap the worker, one kind of recursion.
Only this time, we need to take care of the applicative
plumbing as well, so it's gfoldl instead of gmapT, and
some more complex types.

We need the usual SYB type extension support, but for
Applicative, not Monad (SYB was defined before
Applicative existed, it seems..):

    -- type extension over Applicative f
    mkF :: forall f a b . (Applicative f,Typeable a,Typeable b) 
                       => (b -> f b) -> a -> f a
    mkF f x = case gcast (F f) of { Just (F f) -> f x; Nothing -> pure x }

    extF :: forall t f a b . (Typeable a,Typeable b) 
         => (a -> f a) -> (b -> f b) -> (a -> f a)
    (f `extF` fspec) x = case gcast (F fspec) of { Just (F fspec) -> fspec x; Nothing -> f x }

    newtype F f x = F { unF :: x -> f x }

And here we go:

    traverse' :: forall f t a b . (Applicative f,Typeable1 f,
                                   Typeable1 t,Data (t X),
                                   Typeable a)
                               => (a -> f b) -> t a -> f (t b)
    traverse' f x = markTheSpots (rec (wrap f)) x
      where
      markTheSpots :: forall a b . (t X -> f (t X)) -> (t a -> f (t b))
      markTheSpots f = unsafeCoerce . f . unsafeCoerce
      wrap :: forall a b . (a -> f b) -> (X -> f X)
      wrap f = unsafeCoerce . f . unsafeCoerce

      rec :: forall x . Data x => (X -> f X) -> x -> f x
      rec f x = (gfoldl (k f) z `extF` f) x 
      k :: forall a b . Data a => (X -> f X) -> f (a -> b) -> a -> f b
      k f c x = c <*> (mkF (rec f :: Data a => a -> f a) `extF` f) x
      z c = pure c

This does seem to do the right thing, so I don't seem to be
completely on the wrong track:

    *Main> traverse' (pure . not) (True,True)
    (True,False)

    *Main> traverse' (pure . not) [True,True]
    [False,False]

    *Main> traverse' print testDoc
    Name "well-typed"
    Name "don't"
    Name "go"
    Name "anywhere"
    DocAppend (DocParagraph (DocAppend (DocIdentifier [()]) DocEmpty)) 
        (DocAppend (DocString "programs") (DocIdentifier [(),(),()]))

    *Main> traverse' (pure) testDoc
    DocAppend (DocParagraph (DocAppend (DocIdentifier [Name "well-typed"]) DocEmpty)) 
        (DocAppend (DocString "programs") 
        (DocIdentifier [Name "don't",Name "go",Name "anywhere"]))

    *Main> traverse' (pure . (\(Name s)->DocName s)) testDoc
    DocAppend (DocParagraph (DocAppend (DocIdentifier [DocName "well-typed"]) DocEmpty))
     (DocAppend (DocString "programs") 
     (DocIdentifier [DocName "don't",DocName "go",DocName "anywhere"]))

but I'm too battered from trying to coerce gfoldl to analyze this
properly at the moment, so I'm sending this in the hope of (a)
not having to look at gfoldl's type for a while !-) and (b) getting
some feedback (is this useful? does the extra overhead matter?..), 
caveats (cyclic programs, nested traversals, escaping Xs, ..?), etc.

Over to you,
Claus

PS. the "X marks the spot" trick reminds me of the popular
    medicine topic: delivery system plus targeted activation
    (SYB plus unsafeCoerce).

------------- message 2

>    fmap' :: (Data (f X)) => (a -> b) -> f a -> f b
>    fmap' f x = markTheSpots (rec (wrap f)) x
>      where
>      markTheSpots :: (f X -> f X) -> (f a -> f b)
>      markTheSpots f = unsafeCoerce . f . unsafeCoerce
>      rec ::  (Data a) => (X -> X) -> a -> a
>      rec f x = (gmapT (rec f) `extT` f) x
>      wrap :: (a -> b) -> (X -> X)
>      wrap f = unsafeCoerce . f . unsafeCoerce
.. 
> 1. We wrap the function parameter f, to make it look
>    like a type-preserving function, on some private type X.
> 
> 2. We mark the occurrences of the type constructor f's 
>    parameter type, by coercing 'f a' to 'f X' and 'f X' to
>    'f b'.
> 
> Then, we simply use SYB to apply f, when its type matches
> the parameter, or to recurse into the subexpressions using
> gmapT, otherwise. If X is private, f will only be applied
> to the "functor parameter" _positions_ in 'f a', not to other
> "functor parameter" _type_ occurrences in 'f a'

..but f might not be applied at all, which leads to the first
issue with this technique:

I was surprised to see Data instances for (a->b) and IO a, 
since for such non-algebraic types, there isn't anything to 
gfoldl or gmap over. And those instances do indeed seem
to offer very little functionality (not to mention the runtime
errors..). 

For fmap'/traverse', this means that f will not be applied, 
and so it is not a good idea to coerce the types as if f had 
been applied (because the hidden parameter could be 
exposed after traversal, with changed type and unchanged
representation!).. So we need to restrict the types of 
fmap'/traverse'. Which leads to two questions:

- what is the rationale for having these non-functional
    Data instances?

    If one is to have 'instance Data (a->b)', is there a way
    to make it more functional?

- how can we capture algebraic types in a type (class)?

    I thought that Data would do just that, being designed
    to gfoldl over concrete data constructors, but apparently
    not. And I don't really want to have a separate list of 
    all the types for which Data works, or of all the types 
    for which Data doesn't quite work.

Claus

ps. What is the right list for this topic?