Syb Renovations? Issues with Data.Generics
Claus Reinke
claus.reinke at talk21.com
Wed Jul 30 19:11:28 EDT 2008
> On Tue, Jul 29, 2008 at 08:27:00PM +0100, Claus Reinke wrote:
>> >> My suggestion is to split this module into two, and stop the implicit
>> >> import/export of the incomplete instances from Data.Generics.
>> >I don't think that this is a good idea.
>> Could you please elaborate on your reasons?
> That's what the rest of that e-mail was supposed to be.
You explained why the change would not give as much flexibility as
one might think, or at least not as easily, but you didn't explain why
you think it is a bad idea to gain at least the flexibility to choose
between instance and no instance for the problematic cases.
>> True. But the current situation is even worse: we either get all
>> instances (good and bad) or none
> I think you should always get all:
That would be fine if all instances were completely defined. They
aren't. So the partial instances are imposed globally and irrevocably,
and instead of compile-time type errors, we get runtime errors.
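To make that failure mode concrete, here is a minimal sketch. It does not use the real instances from Data.Generics.Instances; Callback, Widget, fieldConstrs and probe are hypothetical names, and the partial instance is written by hand in the same style (some methods defined as calls to error). The point is only that the program type checks and the problem shows up when a method is actually called.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

import Control.Exception (ErrorCall, evaluate, try)
import Data.Data

-- Hypothetical stand-in for a type that Data cannot really describe.
newtype Callback = Callback (Int -> Int)

-- Partial instance (assumption: written by hand for illustration).
-- It satisfies the type checker, but two methods fail when called.
instance Data Callback where
  toConstr _    = error "toConstr: no constructor representation"
  gunfold _ _ _ = error "gunfold: cannot rebuild a function"
  dataTypeOf _  = mkNoRepType "Callback"

-- Deriving is accepted only because the instance above exists.
data Widget = Widget { label :: String, onClick :: Callback }
  deriving Data

-- Ask every immediate field of a Widget for its constructor name.
fieldConstrs :: Widget -> [String]
fieldConstrs = gmapQ (showConstr . toConstr)

-- Forces the constructor name of the second field. This compiles
-- without complaint; the error is raised only when it is evaluated.
probe :: IO (Either ErrorCall String)
probe = try (evaluate (fieldConstrs (Widget "ok" (Callback (+ 1))) !! 1))
```

Running probe yields a Left carrying the error from toConstr. Without the Data Callback instance, the deriving clause on Widget would be rejected by the type checker instead.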
> By the way, is there something somewhere describing the alternate
> instance that you want to define?
That is the whole point, isn't it? The Data framework isn't designed
to cope with things like (a->b) or (IO a), so there are no good instances
one could define for these types (if anyone can suggest better instances,
please do!-). Hence the incomplete instances mixed in with the standard
ones in Data.Generics.Instances.
Mostly, I don't want those instances at all (the incomplete ones),
so that the typechecker will complain if I try to use Data on something
it can't really handle.
Scenario 1:
We want to use deriving Data on types that have components
of types (a->b) or (IO a), but we don't care about those
components. This is the case for which the incomplete instances
are provided.
Scenario 2:
Any attempt to use Data (a->b) or Data (IO a) indicates an
error. If we want to derive Data for complex structures containing
those types, we need to define Data instances for the immediately
enclosing structures, or wrap those types in newtypes and define
Data instances for those. This is the case for which the incomplete
instances get in the way.
Scenario 3:
We want to use deriving Data on types that have components
of types (a->b) or (IO a), and we do care about what happens in
those components. It is surprisingly tricky to come up with
sensible Data instances for these types that do anything more
than the current dummies, so this scenario isn't as likely as I
thought at first.
Scenario 4:
We want to handle Data for (a->b) or (IO a) differently,
depending on context. Unless we can wrap those types in
newtypes, this is very nearly impossible, due to the way
instances propagate through projects.
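As a rough sketch of the Scenario 2 workaround (a Data instance for the immediately enclosing structure), the following uses only Data.Data from base. Handler, renameFields and example are hypothetical names. The handwritten instance exposes only the String field to traversals, so no Data (Int -> Int) instance is needed. The gunfold definition has to invent a function when rebuilding a value; using id there is purely an assumption for illustration.

```haskell
import Data.Char (toUpper)
import Data.Data
import Data.Maybe (fromMaybe)

-- Hypothetical record with a function field.
data Handler = Handler { handlerName :: String, handlerRun :: Int -> Int }

handlerConstr :: Constr
handlerConstr = mkConstr handlerType "Handler" ["handlerName"] Prefix

handlerType :: DataType
handlerType = mkDataType "Handler" [handlerConstr]

-- Handwritten instance: traversals see the String field only.
-- The function field is carried along untouched.
instance Data Handler where
  gfoldl k z (Handler n r) = z (\n' -> Handler n' r) `k` n
  -- Assumption: id is an arbitrary default for the function field.
  gunfold k z _ = k (z (\n -> Handler n id))
  toConstr _    = handlerConstr
  dataTypeOf _  = handlerType

-- Apply a String transformation to every immediate String field.
renameFields :: Data a => (String -> String) -> a -> a
renameFields f = gmapT step
  where
    step d = case cast d of
      Just s  -> fromMaybe d (cast (f s))
      Nothing -> d

example :: Handler
example = renameFields (map toUpper) (Handler "inc" (+ 1))
```

Here handlerName example is "INC" and handlerRun example 41 is 42: the traversal rewrites the name and leaves the function alone. Adding another function-typed field to Handler would require touching this instance by hand, which is the cost of not relying on a global instance for (a->b).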
The status quo supports only (1), and gives a mixture of runtime
errors and wrong results for (2). In particular, the type checker
does not help us to find the cases we need to cover to keep our
programs from "going wrong".
With selective import, we can support (1), or get compile-time
errors consistently for (2). We cannot usually support both (1)
and (2) in one program, but splitting the Instances module so
that we can be more selective in imports seems a worthwhile
improvement.
Claus
PS. The situation is not improved by the current re-exports of
Data.Generics.Instances from unexpected places. I have a
package splitting Data.Generics.Instances into Standard and
Dubious, and a Data.Generics.Alt that only re-exports Standard
instances. But as soon as I use this with, e.g., an IntMap, I get
duplicate instance errors.
A quick grep shows that the following re-export all instances
(sometimes deliberately, sometimes accidentally, by importing
Data.Generics for other reasons):
libraries/array/Data/Array.hs
-- libraries/base/Data/Generics/Instances.hs
libraries/base/Data/Generics.hs
libraries/bytestring/Data/ByteString/Internal.hs
libraries/bytestring/Data/ByteString/Lazy/Internal.hs
libraries/containers/Data/IntMap.hs
libraries/containers/Data/IntSet.hs
libraries/containers/Data/Map.hs
libraries/containers/Data/Set.hs
libraries/containers/Data/Tree.hs
libraries/haskell-src/Language/Haskell/Syntax.hs
libraries/network/Network/URI.hs
libraries/packedstring/Data/PackedString.hs
libraries/template-haskell/Language/Haskell/TH/Quote.hs
libraries/template-haskell/Language/Haskell/TH/Syntax.hs
As far as I can see, none of these depends on the incomplete
instances, so these instances get re-exported by accident. If
Data.Generics.Instances were split into
Data.Generics.Instances.Standard and
Data.Generics.Instances.Dubious, and if Data.Generics.Alt
only re-exported the former, those modules could be more
selective in their imports and the leaking of instances could
be avoided.