[Hs-Generics] Traversable Functor Data,or: X marks the spot
Claus Reinke
claus.reinke at talk21.com
Tue Jun 24 14:45:18 EDT 2008
Dear Generics;-)
this is a repost from cvs-libraries of an experiment which you
might find of interest, and on which I'd welcome feedback. One
impact being that SYB can implement your GMap test (on which
it currently defaults), among other things (see message 1 below,
the technique probably applies to Uniplate as well?). One issue
being that Data instances for non-algebraic types that default to
doing nothing render this technique unsafe (see message 2 below).
Btw, the announcement of the unified generics library project
was so long ago that I had forgotten about it and about this list
(thanks to Simon PJ for reminding me). My interest in this is
partially from my past with HaRe, partially from wanting generic
traversal support over GHC AST types, so I'm more interested
in generic traversal/analysis style libraries.
Claus
--------- message 1
The issue at hand: can we use Data/Typeable to do what Functor
and Traversible do so easily, namely type-changing maps? Or should
there be support for deriving all of Data/Typeable/Functor/Traversible
over GHC's AST?
A drastically simplified example from David's Haddock 2 code
(Haddock.Interface.Rename) needs to do a kind of mapM
renameDoc :: Monad m => HsDoc Name -> m (HsDoc DocName)
which is straightforward with Traversible, but that is not derivable
yet (David has been working on that, though), while the usual basis
of SYB, Data/Typeable, is derivable, but all SYB transformations
are based on gfoldl, the type of which does not permit type-changing
maps:
gfoldl :: (Data a)
=>(forall a1 b. (Data a1) => c (a1 -> b) -> a1 -> c b)
-> (forall g. g -> c g)
-> a-> c a
One could probably go from heterogeneous Data types to a
homogeneously typed tree representation, do the map there,
then transform back, but that would require additional overhead
and a type coercion for the backtransform. Also, it seems a pity
that the derived code for gfoldl can handle such maps - it is just
gfoldl's type that gets in the way (and trying to generalize that
type is a whole different kind of headache..).
While boilerplate scrappers around the globe eagerly await the
release of
"Generalizing the type of gfold, or: to braindamage and beyond"
(author: unknown; release date: unknown; release: unlikely;-)
I thought I'd have a go at the smaller problem of finding a way
to bypass gfoldl's type systematically in order to use its derivable
code for things like fmap and traverse. This message summarizes
how far I've got so far, and asks for comments (does this do
what it should? is it safe? assuming it can be cleaned up to do
what it should in a safe way, does this mean that deriving Data/
Typeable for GHCs AST types will be sufficient, or should we
still have Traversible as well? etc.).
First, here is an implementation of renameDoc in terms of
gfoldl and unsafeCoerce (slightly cleaned up version of what
I sent in the other thread earlier):
data Name = Name String deriving (Show,Data,Typeable)
data DocName = DocName String deriving (Show,Data,Typeable)
renameDoc :: Monad m => HsDoc Name -> m (HsDoc DocName)
renameDoc (DocIdentifier ids) =
mapM (\(Name n)->return (DocName n)) ids >>= return . DocIdentifier
renameDoc hsDoc = n2d (gfoldl k return hsDoc)
where k c x = ap c (mkM (d2n . renameDoc) x)
n2d :: Monad m => m (HsDoc Name) -> m (HsDoc DocName)
n2d = unsafeCoerce
d2n :: Monad m => m (HsDoc DocName) -> m (HsDoc Name)
d2n = unsafeCoerce
'DocIdentifier :: [id] -> HsDoc id' is the only constructor in HsDoc
that involves the parameter type 'id', so renameDoc either does the
parameter type conversion or -for other constructors- recurses into
the subexpressions, using gfoldl to build a monadic map.
The important insight here is that gfoldl's code can handle the task,
we just pretend that our map is type-preserving to conform to gfoldl's
restrictive type, by coercing the result types (inside gfoldl, we pretend
that renameDoc returns a (HsDoc Name), which outside gfoldl, we
coerce back to (HsDoc DocName)).
Assuming that noone looks at the return types inside gfoldl (at least,
not in a way that would be affected by this change in type - SYB
does support adhoc overloading, after all, as in mkM here), this
seems to work:
testDoc =
DocAppend (DocParagraph (DocAppend (DocIdentifier [Name "well-typed"]) DocEmpty))
(DocAppend (DocString "programs")
(DocIdentifier [Name "don't",Name "go",Name "anywhere"]))
*Main> renameDoc testDoc
DocAppend (DocParagraph (DocAppend (DocIdentifier [DocName "well-typed"]) DocEmpty))
(DocAppend (DocString "programs")
(DocIdentifier [DocName "don't",DocName "go",DocName "anywhere"]))
But can we generalize this, and what about those coercions?
Can we -ideally- define something like fmap and traverse in
terms of gfoldl, and hide the uglyness in their implementations?
Well, I'll spare you (and me;-) the details of my struggle with
the type system, and just show the results, with some comments.
First, the simpler fmap:
-- "X marks the spots";-) X should be private
data X = X deriving (Data,Typeable)
fmap' :: (Data (f X)) => (a -> b) -> f a -> f b
fmap' f x = markTheSpots (rec (wrap f)) x
where
markTheSpots :: (f X -> f X) -> (f a -> f b)
markTheSpots f = unsafeCoerce . f . unsafeCoerce
rec :: (Data a) => (X -> X) -> a -> a
rec f x = (gmapT (rec f) `extT` f) x
wrap :: (a -> b) -> (X -> X)
wrap f = unsafeCoerce . f . unsafeCoerce
Surprisingly simple for something that seemed impossible
at first, isn't it?-) Since we're already committed (for this
experiment, at least) to some type fiddling, we can make
more constructive use of unsafeCoerce, for two purposes:
1. We wrap the function parameter f, to make it look
like a type-preserving function, on some private type X.
2. We mark the occurrences of the type constructor f's
parameter type, by coercing 'f a' to 'f X' and 'f X' to
'f b'.
Then, we simply use SYB to apply f, when its type matches
the parameter, or to recurse into the subexpressions using
gmapT, otherwise. If X is private, f will only be applied
to the "functor parameter" _positions_ in 'f a', not to other
"functor parameter" _type_ occurrences in 'f a':
*Main> fmap' not $ (True,True)
(True,False)
*Main> fmap' not $ [True,True]
[False,False]
*Main> fmap' (\(Name s)->(DocName s)) testDoc
DocAppend (DocParagraph (DocAppend (DocIdentifier [DocName "well-typed"]) DocEmpty))
(DocAppend (DocString "programs")
(DocIdentifier [DocName "don't",DocName "go",DocName "anywhere"]))
Note how we use one kind of recursion where a manual
implementation of fmap would use two: handling subexpressions
(which we'd usually do by pattern matching and expression
construction) and functor type recursion.
Ecouraged by this success, we are ready to tackle the
slightly more tricky traverse, using the same techniques:
mark the spot, wrap the worker, one kind of recursion.
Only this time, we need to take care of the applicative
plumbing as well, so it's gfoldl instead of gmapT, and
some more complex types.
We need the usual SYB type extension support, but for
Applicative, not Monad (SYB was defined before
Applicative existed, it seems..):
-- type extension over Applicative f
mkF :: forall f a b . (Applicative f,Typeable a,Typeable b)
=> (b -> f b) -> a -> f a
mkF f x = case gcast (F f) of { Just (F f) -> f x; Nothing -> pure x }
extF :: forall t f a b . (Typeable a,Typeable b)
=> (a -> f a) -> (b -> f b) -> (a -> f a)
(f `extF` fspec) x = case gcast (F fspec) of { Just (F fspec) -> fspec x; Nothing -> f x }
newtype F f x = F { unF :: x -> f x }
And here we go:
traverse' :: forall f t a b . (Applicative f,Typeable1 f,
Typeable1 t,Data (t X),
Typeable a)
=> (a -> f b) -> t a -> f (t b)
traverse' f x = markTheSpots (rec (wrap f)) x
where
markTheSpots :: forall a b . (t X -> f (t X)) -> (t a -> f (t b))
markTheSpots f = unsafeCoerce . f . unsafeCoerce
wrap :: forall a b . (a -> f b) -> (X -> f X)
wrap f = unsafeCoerce . f . unsafeCoerce
rec :: forall x . Data x => (X -> f X) -> x -> f x
rec f x = (gfoldl (k f) z `extF` f) x
k :: forall a b . Data a => (X -> f X) -> f (a -> b) -> a -> f b
k f c x = c <*> (mkF (rec f :: Data a => a -> f a) `extF` f) x
z c = pure c
This does seem to do the right thing, so I don't seem to be
completely on the wrong track:
*Main> traverse' (pure . not) (True,True)
(True,False)
*Main> traverse' (pure . not) [True,True]
[False,False]
*Main> traverse' print testDoc
Name "well-typed"
Name "don't"
Name "go"
Name "anywhere"
DocAppend (DocParagraph (DocAppend (DocIdentifier [()]) DocEmpty))
(DocAppend (DocString "programs") (DocIdentifier [(),(),()]))
*Main> traverse' (pure) testDoc
DocAppend (DocParagraph (DocAppend (DocIdentifier [Name "well-typed"]) DocEmpty))
(DocAppend (DocString "programs")
(DocIdentifier [Name "don't",Name "go",Name "anywhere"]))
*Main> traverse' (pure . (\(Name s)->DocName s)) testDoc
DocAppend (DocParagraph (DocAppend (DocIdentifier [DocName "well-typed"]) DocEmpty))
(DocAppend (DocString "programs")
(DocIdentifier [DocName "don't",DocName "go",DocName "anywhere"]))
but I'm too battered from trying to coerce gfoldl to analyze this
properly at the moment, so I'm sending this in the hope of (a)
not having to look at gfoldl's type for a while !-) and (b) getting
some feedback (is this useful? does the extra overhead matter?..),
caveats (cyclic programs, nested traversals, escaping Xs, ..?), etc.
Over to you,
Claus
PS. the "X marks the spot" trick reminds me of the popular
medicine topic: delivery system plus targeted activation
(SYB plus unsafeCoerce).
------------- message 2
> fmap' :: (Data (f X)) => (a -> b) -> f a -> f b
> fmap' f x = markTheSpots (rec (wrap f)) x
> where
> markTheSpots :: (f X -> f X) -> (f a -> f b)
> markTheSpots f = unsafeCoerce . f . unsafeCoerce
> rec :: (Data a) => (X -> X) -> a -> a
> rec f x = (gmapT (rec f) `extT` f) x
> wrap :: (a -> b) -> (X -> X)
> wrap f = unsafeCoerce . f . unsafeCoerce
..
> 1. We wrap the function parameter f, to make it look
> like a type-preserving function, on some private type X.
>
> 2. We mark the occurrences of the type constructor f's
> parameter type, by coercing 'f a' to 'f X' and 'f X' to
> 'f b'.
>
> Then, we simply use SYB to apply f, when its type matches
> the parameter, or to recurse into the subexpressions using
> gmapT, otherwise. If X is private, f will only be applied
> to the "functor parameter" _positions_ in 'f a', not to other
> "functor parameter" _type_ occurrences in 'f a'
..but f might not be applied at all, which leads to the first
issue with this technique:
I was surprised to see Data instances for (a->b) and IO a,
since for such non-algebraic types, there isn't anything to
gfoldl or gmap over. And those instances do indeed seem
to offer very little functionality (not to mention the runtime
errors..).
For fmap'/traverse', this means that f will not be applied,
and so it is not a good idea to coerce the types as if f had
been applied (because the hidden parameter could be
exposed after traversal, with changed type and unchanged
representation!).. So we need to restrict the types of
fmap'/traverse'. Which leads to two questions:
- what is the rationale for having these non-functional
Data instances?
If one is to have 'instance Data (a->b)', is there a way
to make it more functional?
- how can we capture algebraic types in a type (class)?
I thought that Data would do just that, being designed
to gfoldl over concrete data constructors, but apparently
not. And I don't really want to have a separate list of
all the types for which Data works, or of all the types
for which Data doesn't quite work.
Claus
ps. What is the right list for this topic?
More information about the Generics
mailing list