[Hs-Generics] Syb Renovations? Issues with Data.Generics

Claus Reinke claus.reinke at talk21.com
Mon Jul 28 14:13:08 EDT 2008


Calling all Syb/Data.Generics users!-)

I keep running into problems with Data.Generics, mostly because I
actually want to use it (no claims that it is the best or final solution, or
that other approaches aren't equally in need of support, just that it is
the best-supported working approach right now).

Some tricky issues are (sometimes against published expectations)
solvable, suggesting useful additions to the library, but some seemingly
trivial things have me stumped, suggesting (to me, at least;-) a need for
improvements either in the library or in its documentation.

Part of the reason I'm interested in this now is that Data/Typeable
instances seem likely (I hope:-) to be added to the GHC Api, where
Thomas Schilling is working on improvements

http://hackage.haskell.org/trac/ghc/wiki/GhcApiStatus
http://hackage.haskell.org/trac/ghc/wiki/GhcApiAstTraversals

Also, the old question of porting HaRe to the GHC Api is currently
being looked into again, by Chaddaï Fouché, and crucially depends
on Syb's generic traversals.

As it is still holiday season, it is a bit early for proposal deadlines,
but I'd like to start a discussion of Syb/Data.Generics and collect the
issues and solutions arising, in the hope of following up with concrete
proposals for improvements. To start the discussion, a simple item:

1. inconvenient convenience instances of Data for non-"data" types

    Data.Generics.Instances defines instances of Data for many
    types, including some abstract types that don't really fit into
    the concrete value based model of Data, like 'IO a' and 'a->b'.
    Those instances give runtime errors for some class methods,
    and mainly offer faked (no-op) gmap traversals, serving as a
    convenience/enabler for 'deriving instance Data':

    http://www.haskell.org/pipermail/generics/2008-June/000346.html

    A list of the odd instances in Data.Generics.Instances, with
    examples of their oddities, can be found here:

    http://www.haskell.org/pipermail/generics/2008-June/000347.html
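
    To make the oddity concrete, here is a minimal sketch using only
    Data.Data from base. The names Opaque, Config, everythingQ and
    countInts are hypothetical, and Opaque is merely a stand-in that
    imitates the style of the convenience instances (it is not one of the
    instances from Data.Generics.Instances). The point is that deriving
    Data for Config succeeds, while a generic query silently treats the
    wrapped function as a leaf and some class methods fail at runtime:

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data

-- Hypothetical stand-in for a convenience instance: an opaque wrapper
-- whose Data instance claims to have no children and no constructors.
newtype Opaque = Opaque (Int -> Int)

instance Data Opaque where
  gfoldl _ z x  = z x                        -- faked, no-op traversal
  gunfold _ _ _ = error "gunfold: Opaque"    -- runtime error if used
  toConstr _    = error "toConstr: Opaque"   -- runtime error if used
  dataTypeOf _  = mkNoRepType "Opaque"

-- A record that can derive Data only because Opaque has an instance.
data Config = Config { label :: String, limits :: [Int], hook :: Opaque }
  deriving Data

-- Local re-statement of an everything-style query (not imported from syb).
everythingQ :: Data a => (r -> r -> r) -> (forall b. Data b => b -> r) -> a -> r
everythingQ k f x = foldl k (f x) (gmapQ (everythingQ k f) x)

-- Count the Int-typed substructures that the traversal can reach.
countInts :: Data a => a -> Int
countInts = everythingQ (+) (\x -> maybe 0 (const 1) (cast x :: Maybe Int))

example :: Config
example = Config "demo" [1, 2, 3] (Opaque (+ 1))
```

    Here countInts example only sees the three list elements, and
    evaluating toConstr on the hook field raises an error. Whether that
    is acceptable depends on the application, which is the argument for
    making such instances an explicit import.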

    My suggestion is to split this module into two, and stop the implicit
    import/export of the incomplete instances from Data.Generics.

    Reactions to this suggestion have been muted so far (Simon PJ was
    as surprised as I was about the existence of these instances, but has
    no strong opinion about the issue, Alexey Rodriguez supports the
    suggestion, Ian Lynagh points out the difficulty of transition), which
    is one reason why I'll try to move the discussion to libraries at .

    Pro: - the instances are still available, and only one explicit import
                away, so 'deriving instance Data' for types containing
                uninteresting functions is still convenient

            - the problematic instances are no longer implicitly imported,
                so applications that don't want these instances can now
                avoid them completely, or define their own instances

            - these convenience instances are not just inconvenient for
                some applications, due to the way instances are handled
                in Haskell; they actually violate some "natural" invariants
                like "everything queries every substructure of the specified
                type", "everywhere applies a transformation at every
                substructure of matching type"

            - the situation is similar to Text.Show.Functions, as the
                convenience instances don't provide the full expected
                functionality, just barely enough for deriving to get by

    Cons: - due to the implicit import and use of these instances,
                    there is no obvious transition scheme; it seems that
                    the least painful process would be to make the change
                    without transition/deprecation period and to document
                    the explicit import option

                [it would be useful to have a way of deprecating instance
                imports, so that any deriving scheme depending on imports
                from a deprecated location would trigger a warning, in this
                case suggesting the new import location]

As I said, I'd like to wait until at least the Syb authors are back from
holidays before setting any proposal deadlines, but I'd like to invite
feedback from Syb users on this and other Syb issues. Here is a
preview of other items I'd like to raise later on; please add your own:

2. Data.Generics.Utils

Since Data/Typeable are compiler-derivable (in GHC) while other
classes like Functor/Traversable/etc are not, it would be useful if
generic instances for those other classes could be defined in terms
of Data/Typeable.

The Uniplate library already does this for its own classes via
Data.Generics.PlateData, and it appears that at least Functor is
definable as well (code exists, proof is only informal at this stage,
and those invariant violations and runtime errors in the implicitly
imported dummy instances from (1) really get in the way):

http://www.haskell.org/pipermail/generics/2008-June/000343.html
http://www.haskell.org/pipermail/generics/2008-July/000349.html
http://www.haskell.org/pipermail/generics/2008-July/000351.html

What other classes can be defined in this way? Traversable
(traverse) seems very nearly possible, what else?
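
As a rough illustration of the idea (not the Functor code from the posts linked above), a Foldable-style element listing can be sketched on top of Data/Typeable using only base. The names collect and elemsOf are hypothetical, and the result is a type-directed query rather than a lawful toList:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Data
import Data.Maybe (maybeToList)

-- Collect every substructure of type b, top-down and left to right.
collect :: (Data a, Typeable b) => a -> [b]
collect x = maybeToList (cast x) ++ concat (gmapQ collect x)

-- Foldable-style "elements of a container", defined purely via Data.
-- Assumption: substructures of type a occur only as container elements.
elemsOf :: (Data (f a), Typeable a) => f a -> [a]
elemsOf = collect
```

For example, elemsOf (Just (3 :: Int)) and elemsOf [1, 2, 3 :: Int] behave like toList. The assumption matters: when the element type is itself a list, the query also picks up the tails of the inner lists, so this is a starting point for discussion rather than a replacement for a derived instance.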

3. Performance

Naive use of Syb traversal schemes can lead to huge performance
losses. Experienced users tend to write their own traversal schemes,
using Syb's low-level Api directly, but we can take inspiration from
some Uniplate/PlateData optimization techniques and generalise
them for use with Syb's high-level traversal scheme Api, yielding
similar performance gains for everywhere/everything:

http://www.haskell.org/pipermail/generics/2008-July/000353.html
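
To show where the cost comes from, here is a minimal sketch with locally defined combinators (bottomUp, bottomUpBut, mkT' and isString are hypothetical names written against Data.Data from base, in the style of Syb's everywhere and everywhereBut). It is not the optimisation from the post above, which works out skippable types automatically; here the pruning predicate is supplied by hand:

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data
import Data.Maybe (fromMaybe)

-- Lift a type-specific function to all types: identity everywhere else.
mkT' :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT' f = fromMaybe id (cast f)

-- Naive scheme: visit every node, bottom-up, including every Char of
-- every String, even though the transformation can never apply there.
bottomUp :: Data a => (forall b. Data b => b -> b) -> a -> a
bottomUp f x = f (gmapT (bottomUp f) x)

-- Pruned scheme: do not descend into subterms for which stop holds.
bottomUpBut :: Data a
            => (forall b. Data b => b -> Bool)
            -> (forall b. Data b => b -> b)
            -> a -> a
bottomUpBut stop f x
  | stop x    = x
  | otherwise = f (gmapT (bottomUpBut stop f) x)

-- Strings cannot contain Ints, so an Int transformation may skip them.
isString :: Data a => a -> Bool
isString x = typeOf x == typeOf ""

data Doc = Doc String [(String, Int)] deriving (Data, Eq, Show)

bump :: Data a => a -> a
bump = mkT' ((+ 1) :: Int -> Int)
```

Both bottomUp bump and bottomUpBut isString bump give the same result on a Doc, but the second never walks the characters of the strings. How much that saves depends on the data, so it should be measured rather than assumed.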

Another direction that might be worth exploring is to use Maps
instead of nested generic extensions to define adhoc-overloaded
transformations and queries (I've actually started playing with that,
but am currently stuck on GHC ticket #2463).
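
One possible reading of the Map idea, sketched under assumptions: the table is keyed by TypeRep and stores the type-specific functions as Dynamic values, so that dispatch is a single lookup instead of a chain of nested type tests. The names Table, entry and applyTable are hypothetical, and this is not the code I got stuck on:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Char (toUpper)
import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Typeable
import qualified Data.Map as Map

-- One transformation per type, indexed by the type's representation.
type Table = Map.Map TypeRep Dynamic

-- Register a transformation under the representation of its domain.
entry :: forall a. Typeable a => (a -> a) -> (TypeRep, Dynamic)
entry f = (typeRep (Proxy :: Proxy a), toDyn f)

-- Look up the transformation for the argument's type; identity otherwise.
applyTable :: Typeable a => Table -> a -> a
applyTable table x =
  case Map.lookup (typeOf x) table >>= fromDynamic of
    Just f  -> f x
    Nothing -> x

table :: Table
table = Map.fromList
  [ entry ((+ 1) :: Int -> Int)
  , entry not
  , entry (map toUpper :: String -> String)
  ]
```

Since Typeable is a superclass of Data, applyTable table has the shape a traversal scheme expects for its generic transformation argument, and the keys of the table make the domain of the transformation explicit.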

4. Usability

There is probably nothing one can do to make the types of
Syb's low-level Api less of a brain hazard, but not all of the
stumbling blocks seem to be necessary consequences of the
carefully crafted edifice of interactions between nearly polymorphic
types, runtime type checks and type reflection. Examples:

- there doesn't seem to be a way to get hold of a type's
    constructors, only of constructor representations, structure
    scaffolds, and structure generators

- the actual domain on which a transformation/query acts
    is hidden behind the near-polymorphic default type of
    generic extensions

- I can't seem to figure out how to use typeOf1, when the
    other Syb operations only give me 'forall a . Data a => a';
    instead, I seem to be forced to use something like:

       [ mkTyConApp tyCon (init tyArgs) | not (null tyArgs) ]
          where (tyCon,tyArgs) = splitTyConApp typeRep

- others?
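
For reference, a small sketch of what is readily available from a value known only as 'Data a => a', using Data.Data from base (the helper names constrNames, outerTyCon and tyArgs are hypothetical). It yields constructor representations and type representations, not the constructors themselves, and it sticks to accessor functions in the same spirit as the splitTyConApp workaround above:

```haskell
import Data.Data

-- Names of all constructor representations of the argument's type.
-- The argument itself is not inspected; only algebraic types are supported.
constrNames :: Data a => a -> [String]
constrNames = map showConstr . dataTypeConstrs . dataTypeOf

-- Outermost type constructor of the argument's type.
outerTyCon :: Data a => a -> TyCon
outerTyCon = typeRepTyCon . typeOf

-- Type arguments of the argument's type.
tyArgs :: Data a => a -> [TypeRep]
tyArgs = typeRepArgs . typeOf
```

With these, constrNames (Nothing :: Maybe Int) lists "Nothing" and "Just", and tyConName (outerTyCon (Just 'c')) is "Maybe". When the shape 't a' is statically known, typeRepTyCon (typeOf1 (Just 'c')) names the same constructor, but that is exactly the information a generic function does not have.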

What are your personal gripes with Syb/Data/Typeable,
and for which of them do you see a chance of addressing
them by changing/adding code?

Claus


