Syb Renovations? Issues with Data.Generics

Claus Reinke claus.reinke at talk21.com
Wed Jul 30 19:11:28 EDT 2008


> On Tue, Jul 29, 2008 at 08:27:00PM +0100, Claus Reinke wrote:
>> >>   My suggestion is to split this module into two, and stop the implicit
>> >>   import/export of the incomplete instances from Data.Generics.
>> >I don't think that this is a good idea.
>> Could you please elaborate on your reasons?
> That's what the rest of that e-mail was supposed to be.

You explained why the change would not give as much flexibility as
one might think, or at least not as easily, but you didn't explain why
you think it is a bad idea to gain at least the flexibility to choose
between instance and no instance for the problematic cases.

>> True. But the current situation is even worse: we either get all
>> instances (good and bad) or none
> I think you should always get all:

That would be fine if all instances were completely defined. They
aren't. So the partial instances are imposed globally and irrevocably,
and instead of compile-time type errors, we get runtime errors.

> By the way, is there something somewhere describing the alternate
> instance that you want to define?

That is the whole point, isn't it? The Data framework isn't designed
to cope with things like (a->b) or (IO a), so there are no good instances 
one could define for these types (if anyone can suggest better instances,
please do!-). Hence the incomplete instances mixed in with the standard 
ones in Data.Generics.Instances.

Mostly, I don't want those instances at all (the incomplete ones),
so that the typechecker will complain if I try to use Data on something
it can't really handle. 

Scenario 1:

    We want to use deriving Data on types that have components
    of types (a->b) or (IO a), but we don't care about those
    components. This is the case for which the incomplete instances
    are provided.

Scenario 2:

    Any attempt to use Data (a->b) or Data (IO a) indicates an
    error. If we want to derive Data for complex structures containing 
    those types, we need to define Data instances for the immediately
    enclosing structures, or wrap those types in newtypes and define
    Data instances for those. This is the case for which the incomplete
    instances get in the way.
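
A minimal sketch of the newtype route from Scenario 2, assuming no
Data (a->b) instance is in scope. The names Opaque, Handler and
childCount are hypothetical, made up for illustration; the instance
simply treats the wrapped value as an atom with no children, which is
one possible choice and not a recommendation:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data

-- Hypothetical wrapper that hides a value (e.g. a function) from
-- generic traversals.
newtype Opaque a = Opaque a
  deriving Typeable

-- Hand-written instance: the wrapped value is an atom, so traversals
-- see no children. Only Typeable is required of the payload.
instance Typeable a => Data (Opaque a) where
  gfoldl _ z x  = z x
  -- Still partial: we cannot rebuild a payload out of nothing.
  gunfold _ _ _ = error "Data (Opaque a): gunfold is not supported"
  toConstr _    = opaqueConstr
  dataTypeOf _  = opaqueDataType

opaqueConstr :: Constr
opaqueConstr = mkConstr opaqueDataType "Opaque" [] Prefix

opaqueDataType :: DataType
opaqueDataType = mkDataType "Opaque" [opaqueConstr]

-- Deriving works because the function is wrapped.
data Handler = Handler String (Opaque (Int -> Int))
  deriving (Data, Typeable)

-- Number of immediate children a generic traversal would visit.
childCount :: Data a => a -> Int
childCount = length . gmapQ (const ())
```

Here childCount (Handler "inc" (Opaque (+1))) is 2, and childCount
of the Opaque value itself is 0. Note that gunfold is still undefined,
but that gap is now a local, explicit choice in one instance rather
than something imported implicitly everywhere.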

Scenario 3:

    We want to use deriving Data on types that have components
    of types (a->b) or (IO a), and we do care about what happens in
    those components. It is surprisingly tricky to come up with
    sensible Data instances for these types that do anything more
    than the current dummies, so this scenario isn't as likely as I
    thought at first.
 
Scenario 4:

    We want to handle Data for (a->b) or (IO a) differently,
    depending on context. Unless we can wrap those types in
    newtypes, this is very nearly impossible, due to the way
    instances propagate through projects.

The status quo supports only (1), and gives a mixture of runtime 
errors and wrong results for (2). In particular, the type checker
does not help us to find the cases we need to cover to keep our
programs from "going wrong".

With selective import, we can support (1), or get compile-time
errors consistently for (2). We cannot usually support both (1)
and (2) in one program, but splitting the Instances module so
that we can be more selective in imports seems a worthwhile 
improvement.

Claus

PS. The situation is not improved by the current re-exports of
    Data.Generics.Instances from unexpected places. I have a
    package splitting Data.Generics.Instances into Standard and
    Dubious, and a Data.Generics.Alt that only re-exports Standard
    instances. But as soon as I use this with, e.g., an IntMap, I get
    duplicate instance errors.

    A quick grep shows that the following re-export all instances 
    (sometimes deliberately, sometimes accidentally, by importing 
    Data.Generics for other reasons):

    libraries/array/Data/Array.hs
    -- libraries/base/Data/Generics/Instances.hs
    libraries/base/Data/Generics.hs
    libraries/bytestring/Data/ByteString/Internal.hs
    libraries/bytestring/Data/ByteString/Lazy/Internal.hs
    libraries/containers/Data/IntMap.hs
    libraries/containers/Data/IntSet.hs
    libraries/containers/Data/Map.hs
    libraries/containers/Data/Set.hs
    libraries/containers/Data/Tree.hs
    libraries/haskell-src/Language/Haskell/Syntax.hs
    libraries/network/Network/URI.hs
    libraries/packedstring/Data/PackedString.hs
    libraries/template-haskell/Language/Haskell/TH/Quote.hs
    libraries/template-haskell/Language/Haskell/TH/Syntax.hs

    As far as I can see, none of these depends on the incomplete
    instances, so these instances get re-exported by accident. If
    Data.Generics.Instances were split into
    Data.Generics.Instances.Standard and
    Data.Generics.Instances.Dubious, and if Data.Generics.Alt
    only re-exported the former, those modules could be more
    selective in their imports and the leaking of instances could 
    be avoided.



