The base library and GHC 6.10

Mon Sep 1 08:13:35 EDT 2008

I thought class 'Data' was in 'Data.Generics.Basics' because it provides
generic access to 'data'-definitions. SYB's generic programming library
code (strategies for queries and transformations) builds on that, so does
(one version of) Uniplate.
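
To make "generic access to 'data'-definitions" concrete, here is a minimal sketch that uses only the 'Data' class from a current base ('Data.Data'), assuming a recent GHC where 'Typeable' is derived automatically. The names 'Expr', 'liftT', 'bottomUp', 'arity', 'simplify' and 'normalise' are made up for this illustration; 'liftT' and 'bottomUp' only imitate what SYB's 'mkT' and 'everywhere' do and are not the library's code.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data
import Data.Maybe (fromMaybe)

-- Hypothetical example type; 'deriving Data' provides the generic access.
data Expr = Lit Int | Neg Expr | Add Expr Expr
  deriving (Show, Eq, Data)

-- Lift a type-specific function to a generic one that is the identity
-- on every other type (the same idea as SYB's mkT).
liftT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
liftT f = fromMaybe id (cast f)

-- Apply a generic transformation to all subterms, children first
-- (the same idea as SYB's everywhere).
bottomUp :: Data a => (forall b. Data b => b -> b) -> a -> a
bottomUp f x = f (gmapT (bottomUp f) x)

-- A generic query: the number of immediate subterms of any 'Data' value.
arity :: Data a => a -> Int
arity = length . gmapQ (const ())

-- A type-specific rewrite: remove double negation at the root.
simplify :: Expr -> Expr
simplify (Neg (Neg e)) = e
simplify e             = e

-- The rewrite applied everywhere in an expression.
normalise :: Expr -> Expr
normalise = bottomUp (liftT simplify)
```

With these definitions, 'normalise (Add (Neg (Neg (Lit 1))) (Lit 2))' evaluates to 'Add (Lit 1) (Lit 2)', and 'arity (Add (Lit 1) (Lit 2))' is 2. Everything here goes through 'gmapT', 'gmapQ' and 'cast', so it depends on the 'Data' instances that happen to be in scope.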

Most other generic programming libraries are based on generic access
to _type_ representations, the basics of which would more accurately
appear somewhere in 'Types.Generics' - no conflict with SYB here.

One could move the actual generic libraries into 'Generics.*', but until
there is an actual need for that, I'd prefer things to stay stable, with
libraries building on generic data access in 'Data.Generics' and libraries
building on generic type access appearing in 'Types.Generics'.

One could rename some of the SYB modules, e.g.,
'Data.Generics.Schemes' -> 'Data.Generics.SybSchemes'
and so forth, but as long as other 'data'-based libraries are 
not deprived of namespace there, and other 'type'-based libraries
either don't provide general traversal schemes or live in 'Types.Generics',
there is no immediate need for such renaming, beyond putting the
modules in a 'syb' package, is there?

(note that 'Data.Typeable' is outside 'Data.Generics', though it is
part of the basics that SYB depends on)

> Claus says that "half the instances of Data are controversial".  
> Is that really right Claus?  Isn't it just functions and IO?

As an ideal, I'd like 'Data.Generics.Instances.Dubious' to be
empty - 'Data' instances should either be standard, or not exist
at all (at least not in library code). What I did was simply to take
anything that looked dubious and move it into a separate module,
to facilitate further discussion and more control over imports.

The discussion I had hoped for didn't happen, so that is still where
my code stands, but I do hope it isn't the final state. As for numbers,
I currently have 32 instances in 'Data.Generics.Instances.Standard'
and 11 instances in 'Data.Generics.Instances.Dubious' [1]. 

My initial split was mostly on the basis of 'gfoldl'/'gmapT' not traversing
substructures, so some more of the 'Standard' instances are actually
incomplete, and some of the 'Dubious' instances could possibly be 
declared "safe (with side conditions)", but then someone would still
have to look at making the instances more complete/less certain to 
generate runtime errors.
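
As an illustration of the two symptoms mentioned here (no traversal of substructures, and runtime errors), the following sketch defines a made-up type 'Counter' with a hand-written instance in the "dummy" style. It is not copied from any library; 'Counter', 'liftT', 'bumpInts' and 'toConstrBombs' are hypothetical names, and the code assumes a current base with 'Data.Data' and 'mkNoRepType'.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.Data
import Data.Maybe (fromMaybe)

-- Hypothetical abstract type that does contain an Int.
newtype Counter = Counter Int
  deriving (Eq, Show)

-- "Dummy"-style instance: gfoldl is left at its class default
-- (gfoldl _ z = z), so the Int inside is never visited, and the
-- operations that need a constructor representation raise errors.
instance Data Counter where
  toConstr _   = error "toConstr(Counter): no constructor representation"
  gunfold _ _  = error "gunfold(Counter): cannot build values generically"
  dataTypeOf _ = mkNoRepType "Example.Counter"

-- Lift a type-specific function to a generic one (as SYB's mkT does).
liftT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
liftT f = fromMaybe id (cast f)

-- Increment every immediate subterm of type Int.
bumpInts :: Data a => a -> a
bumpInts = gmapT (liftT ((+ 1) :: Int -> Int))

-- True if forcing 'toConstr' on a Counter raises an ErrorCall.
toConstrBombs :: IO Bool
toConstrBombs = do
  r <- try (evaluate (toConstr (Counter 0))) :: IO (Either ErrorCall Constr)
  return (either (const True) (const False) r)
```

Here 'bumpInts (Counter 5)' returns 'Counter 5' unchanged, whereas a derived instance would have produced 'Counter 6'; and 'toConstrBombs' returns 'True'. Both behaviours are silent at compile time, which is what makes such instances hard to spot.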

The current 'Dubious' list has things like

- 'Ratio a': while values of type 'a' actually exist here, they are not 
    meant to be visible in a concrete way, only via the abstract interface;
    and the abstract interface can support a 'data'-like view

- various 'Ptr a': here the 'a' is a phantom type, there are no objects
    of type 'a' to be traversed; but neither is there much 'data'-like
    about these pointers.

- 'b->a', 'IO a', 'ST s a', 'STM a': these are thoroughly un-'data'-like;
    though the instances could be improved to provide transformation
    access to the 'a' values, the same doesn't work for queries, and
    the '(->)b' context is completely out of range for 'Data'.

The current 'Standard' list has various instances that just bomb
on some operations, including the 'Array a b' instance, which 
otherwise nicely demonstrates how to handle abstract types.
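
The "abstract interface with a 'data'-like view" idea (as for 'Ratio a' or 'Array a b') can be sketched on a small made-up type. This is only an illustration under assumptions, not the code of any existing instance: 'Range', 'range', 'rangeConstr', 'rangeDataType' and 'negateInts' are hypothetical names, and in a real library the 'MkRange' constructor would not be exported.

```haskell
import Data.Data
import Data.Maybe (fromMaybe)

-- Hypothetical abstract type with the invariant lower <= upper.
data Range = MkRange Int Int
  deriving (Eq, Show)

-- Smart constructor: the only way clients are meant to build a Range.
range :: Int -> Int -> Range
range a b = MkRange (min a b) (max a b)

-- A pseudo-constructor named after the smart constructor.
rangeConstr :: Constr
rangeConstr = mkConstr rangeDataType "range" [] Prefix

rangeDataType :: DataType
rangeDataType = mkDataType "Example.Range" [rangeConstr]

-- The instance presents a Range as if it were built by 'range', so
-- generic code rebuilds values through the abstract interface.
instance Data Range where
  gfoldl k z (MkRange a b) = z range `k` a `k` b
  gunfold k z _            = k (k (z range))
  toConstr _               = rangeConstr
  dataTypeOf _             = rangeDataType

-- Negate every immediate subterm of type Int.
negateInts :: Data a => a -> a
negateInts = gmapT step
  where
    step :: Data b => b -> b
    step d = case cast d of
      Just n  -> fromMaybe d (cast (negate (n :: Int)))
      Nothing -> d
```

With this instance, 'negateInts (range 1 5)' yields 'MkRange (-5) (-1)', so the invariant survives the generic transformation. A derived instance would rebuild with 'MkRange' directly and produce 'MkRange (-1) (-5)'.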

Moving the more stable and standard 'Data' instances into base
might not hinder development/debugging of the remaining instances,
but right now, I don't think it will include sufficiently many instances 
to avoid dependencies on syb. As I explained in my previous email, 
the implicit presence of instances is itself a source of bugs, due to the
propagation of instances in Haskell, not to mention ghc bug #2182.

Since very few of the current 'Data' instance importers actually
need those imports (they just happen to be included if one imports
'Data.Generics'), I'd prefer to remove the implicit imports (and 
implied re-exports), making the remaining real dependencies 
explicit by depending on syb (again, see previous email).

I'd really like to see the real issues addressed before we start
worrying about names, as this has turned out to be a rat's nest
of bugs, including:

- incomplete 'Data' instances (operations that bomb now, but
    might be given better implementations)
- incompletable 'Data' instances (operations that cannot be
    implemented, suggesting that these instances shouldn't exist)
- 'deriving Data' depending on 'Data' instances for everything,
    instead of skipping substructure types that cannot be handled
    anyway (smarter deriving could avoid dumb instances, by
    annotating types that should not be traversed instead of 
    traversing these types via dummy instances that are then
    globally available/irreplaceable)
- unnecessary 'Data' instance import/export (Data.IntMap has
    absolutely no business bringing 'instance Data (IO a)' into scope)
- ghc sessions retaining instances (#2182), leading to build errors
    even in separate module hierarchies
- ghc listing "orphan instances" as a performance issue, 
    re-emphasized recently by warnings turned into errors, which
    has led some to believe they are a design fault, rather than a
    representation of a valid design decision
- it doesn't help that Haskell doesn't support instance import/
    export control (yes, the instances are unnamed, but naming
    class, type, and module would seem sufficient to block instance
    imports/exports where they are not wanted)
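
On the 'deriving Data' point in the list above: today's deriving has no such annotations, but the effect can be approximated by hand with a marker type. The sketch below is a workaround under assumptions, not the smarter deriving suggested above; 'Skip', 'Job', 'childCount' and 'shout' are hypothetical names. The non-traversing instance is attached to the marker type only, so no global instance for 'b->a' is needed.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Char (toUpper)
import Data.Data
import Data.Maybe (fromMaybe)

-- Hypothetical marker: "do not traverse what is inside".
newtype Skip a = Skip { unSkip :: a }

skipConstr :: Constr
skipConstr = mkConstr skipDataType "Skip" [] Prefix

skipDataType :: DataType
skipDataType = mkDataType "Example.Skip" [skipConstr]

-- Only 'Typeable a' is required, so 'a' may be a function type
-- without any 'Data' instance for functions being in scope.
instance Typeable a => Data (Skip a) where
  gfoldl _ z x  = z x   -- no children are exposed
  gunfold _ _ _ = error "gunfold(Skip): cannot build values generically"
  toConstr _    = skipConstr
  dataTypeOf _  = skipDataType

-- A record with a function field; deriving works because the
-- function is wrapped in the marker.
data Job = Job String (Skip (Int -> Int))
  deriving Data

-- Number of immediate subterms visible to generic code.
childCount :: Data a => a -> Int
childCount = length . gmapQ (const ())

-- Upper-case the immediate String field, leaving the rest alone.
shout :: Job -> Job
shout = gmapT (\d -> fromMaybe d (cast . map toUpper =<< cast d))
```

For 'Job "ab" (Skip (+ 1))', 'childCount' reports 2 children, 'childCount' on the 'Skip' field itself reports 0, and 'shout' changes the name to "AB" while the wrapped function still maps 1 to 2. The cost is the explicit wrapping and unwrapping at every use of the field.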

Claus

[1]  http://www.cs.kent.ac.uk/~cr3/toolbox/haskell/#syb-utils