Splitting SYB from the base package in GHC 6.10
claus.reinke at talk21.com
Mon Sep 1 11:04:53 EDT 2008
> The issue is: SYB is being moved out of base into its own package.
> However, the Data class is, in a way, tied to base since it depends on the
> deriving mechanism.
My understanding is that the deriving mechanism would still work if
class 'Data' was moved into 'syb', but changes in 'Data' would still
need to be matched in the deriving mechanism (which isn't auto-generated
from 'base', either). As long as 'syb' remains a core library, we can thus
focus on assigning modules to 'syb' or 'base' by functionality.
> Therefore, it was suggested that the entire Data.Generics.Basics module 
> should remain in base. This module defines the Data class and several
> associated functions and datatypes. I don't think anyone objected to this so
> far: please correct me if I'm wrong, or object now.
Assuming this is based on 'Data.Generics.Basics' and 'Data.Typeable'
being of more general use than the rest of 'syb' (justifying a preferred
dependency on 'base' rather than 'syb'), not any implementation constraints,
I don't object in general. It does suggest a separate 'data-reflect' package
for these two modules, but that could be left for later.
However, if 'Data' is in 'base', and the 'data' types are in 'base', then the
'Data' instances for those 'data' types should probably also be in base (*)
(the instance for 'Array a b' ought to move to 'array'). And the short-term
issue with this is that these instances, their location, and their importers,
need some revision, while 'base' wants to be stable.
The hope was that splitting off 'syb' from 'base' would contain the changes
in a package with named maintainer, outside 'base'. Wouldn't it be easier
to have all of 'Data' in 'syb', at least until 'Data' and 'Typeable' move into
their own package? But if you can find a way to make the 'Data'-in-'base'
route work, I'm not going to object.
> Then it was also suggested that Data.Generics.Instances could stay in
> base (perhaps inside Basics as well). This, however, would prevent dealing
> with the "dubious" Data instances, and this was one of the motivating
> factors to split SYB from base. This refers concretely to the instances:
Rearranging the list slightly, for easier reference:
-- these have (or produce) substructures of type 'a', which aren't
-- traversed by the current Data instances (contrary to what one
-- would expect, say, from a generic 'fmap' over these types)
> instance (Data a, Data b) => Data (b -> a)
> instance Typeable a => Data (IO a)
> instance (Typeable s, Typeable a) => Data (ST s a)
> instance Typeable a => Data (STM a)
> instance Typeable a => Data (IORef a)
> instance Typeable a => Data (TVar a)
> instance Typeable a => Data (MVar a)
-- here, the 'a' is a phantom type, without matching substructures
> instance Typeable a => Data (Ptr a)
> instance Typeable a => Data (StablePtr a)
> instance Typeable a => Data (ForeignPtr a)
-- here, the 'a' corresponds to substructures that should only
-- be visible through the abstract interface, on top of which a
-- 'data'-like view can be provided
> instance (Data a, Integral a) => Data (Ratio a)
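To illustrate the 'Ratio' case: the instance exposes numerator and
denominator as ordinary children, so a generic query looks straight
through the abstract interface. A minimal sketch, assuming a recent
'base' where 'Data.Data' provides this instance; 'childInts' is a
hypothetical helper, not part of any library:

```haskell
import Data.Data (Data, cast, gmapQ)
import Data.Maybe (maybeToList)
import Data.Ratio (Ratio, (%))

-- Hypothetical helper: collect the immediate children of a value
-- that happen to be of type Int.
childInts :: Data d => d -> [Int]
childInts = concat . gmapQ (maybeToList . cast)

main :: IO ()
main =
  -- The 'Data (Ratio a)' instance presents numerator and denominator
  -- as two substructures, so both show up in the query result.
  print (childInts (3 % 4 :: Ratio Int))
```

A generic transformation gets the same access, which is why a
'data'-like view on top of the abstract interface seems preferable.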
In addition, a longer list of instances offers only runtime errors
for some 'Data' operations (most notably for 'gunfold', though
abstract types in general have a problem with reflection support).
Are these necessary or would they profit from closer investigation?
If the latter, those instances should probably not be in 'base'.
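To make the "not traversed" point concrete, here is a minimal sketch of
an instance written in the same style as the ones listed above. 'Opaque',
'Pair' and 'bumpInts' are hypothetical names used only for illustration;
'Opaque' stands in for an abstract type such as 'IORef a':

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for an abstract type such as 'IORef a'.
newtype Opaque a = Opaque a deriving (Show, Eq, Typeable)

-- Written like the instances under discussion: only 'Typeable a' is
-- required, and nothing inside the value is exposed.
instance Typeable a => Data (Opaque a) where
  -- 'gfoldl' keeps its default ('gfoldl _ z = z'): no child is visited.
  toConstr _   = error "toConstr: Opaque"   -- reflection fails at runtime
  gunfold _ _  = error "gunfold: Opaque"    -- so does construction
  dataTypeOf _ = mkNoRepType "Opaque"

-- A type mixing a "regular" field with a non-traversed one.
data Pair = Pair Int (Opaque Int) deriving (Show, Eq, Data, Typeable)

-- Increment every Int that a generic traversal can reach.
bumpInts :: Data d => d -> d
bumpInts d = case (cast d :: Maybe Int) of
  Just n  -> fromMaybe d (cast (n + 1))
  Nothing -> gmapT bumpInts d

main :: IO ()
main = do
  -- The plain Int field is incremented; the Int inside 'Opaque' is not.
  print (bumpInts (Pair 1 (Opaque 1)))
  -- A generic query sees no children at all.
  print (length (gmapQ (const ()) (Opaque 'x')))
```

The traversal silently skips the 'Int' inside 'Opaque', and any client
that calls 'toConstr' or 'gunfold' on such a value gets a runtime error
rather than a type error.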
> These instances are defined in such a way that they do not traverse the
> datatype. In fact, there is no other possible implementation, and this
> implementation at least allows for datatypes which contain both "regular"
> and "dubious" elements to still have their "regular" elements traversed.
Well, there are alternative instances that would at least improve traversal
support, but that wouldn't work for queries, I think.
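One possible reading of such an alternative, sketched for a function-like
type: override 'gmapT' so that transformations are applied to results,
while queries still find nothing because no result exists until an
argument is supplied. 'Fun', 'applyFun' and 'bumpInts' are hypothetical
names for this sketch, and this is an assumption about what an improved
instance could look like, not an existing library instance:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data
import Data.Maybe (fromMaybe)

-- Hypothetical wrapper standing in for the function type 'b -> a'.
newtype Fun b a = Fun (b -> a) deriving Typeable

instance (Typeable b, Data a) => Data (Fun b a) where
  -- Transformations are post-composed, so they reach every result.
  gmapT f (Fun g) = Fun (f . g)
  -- 'gfoldl' keeps its default, so queries such as 'gmapQ' see nothing.
  toConstr _   = error "toConstr: Fun"
  gunfold _ _  = error "gunfold: Fun"
  dataTypeOf _ = mkNoRepType "Fun"

applyFun :: Fun b a -> b -> a
applyFun (Fun g) = g

-- Increment every Int that a generic traversal can reach.
bumpInts :: Data d => d -> d
bumpInts d = case (cast d :: Maybe Int) of
  Just n  -> fromMaybe d (cast (n + 1))
  Nothing -> gmapT bumpInts d

main :: IO ()
main = do
  let h = bumpInts (Fun (* 10) :: Fun Int Int)
  print (applyFun h 4)                 -- result 40 is incremented to 41
  print (length (gmapQ (const ()) h))  -- the query still finds 0 children
```

Note the asymmetry: 'gmapT' and 'gmapQ' now disagree about whether the
type has substructures, which is part of why this only helps traversals.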
> However, this implies that a user cannot redefine such instances even in the
> case where s/he knows extra information about these types that would allow
> for a more useful instance definition, for instance.
Indeed, the implicit presence of these instances is the main issue, and
reducing their presence and propagation affects 'base' and other core
and extra libraries, so it needs to happen soon.
> Claus, please correct me if I'm wrong, but if the 11 "dubious" instances (or
> perhaps less, given your message in ) go in the syb package and the
> remaining, "standard" ones stay in base, we:
> - Maintain backwards compatibility regarding SYB in 6.10, and
> - Can still deal with the issue by releasing a new version of the syb
> package later, independently of GHC.
Issues to consider, off the top of my head:
- to what extent can core libraries be updated independent of 'base'?
- unless 'base' can now be updated (there are two versions of 'base' in
ghc head), 'base' must not depend on 'syb'
- which other core libraries depend on 'syb'? are they updateable?
- the current importers of (parts of) 'Data.Generics' need to be revised 
- instances cannot be deprecated
- since all instances are in one module, one could deprecate the module,
but are module deprecations propagated to their importers automatically?
- would 'Data.Generics' need to be deprecated, in favour of a version that
does not implicitly re-export any/all instances? 
Maintaining strict backwards-compatibility in 6.10 while still allowing
for changes in 'syb' is going to be difficult, if only because clients might
depend on 'Data.IntSet' and the like to re-export all current 'Data'
instances, which we certainly want to stop.
My 'syb-utils' has alternatives to 'Data.Generics' that export either
only standard instances or no instances, which would allow deprecating
all 'Data.Generics*' modules that are less specific about their instance
exports, but would require the use of alternative module names..
> Since the deadline for 6.10 is approaching I'm assuming that we should try
> to minimize the changes there, while keeping future development in the syb
> package as open as possible.
Definitely. But some choices need to be made now. Mainly what goes
where, how to handle deprecation, and how to reduce implicit instance
(*) this isn't a firm rule, either: recently, it was decided to keep the
'Data' instances for 'ghc' types out of 'ghc'..