Splitting SYB from the base package in GHC 6.10

Mon Sep 1 20:00:26 EDT 2008

>>   gmapT f fun = f . fun
>>    -- instead of gmapT f fun = fun
> 
> But wouldn't these introduce additional inconsistencies? At least if
> introduced in the library itself. I am used to think that gmapT is
> implemented using gfoldl, and is only inside the Data class to allow for
> more efficient implementations, and not for alternative implementations...

Well, I'd like to define 'gmapT' in terms of 'gfoldl' (in a non-trivial, sensible
way). The default for gfoldl is 'gfoldl _ z = z', but that doesn't help much here
since 'z's type is rather too polymorphic to be of use: 'forall c g . g -> c g'.
I've wondered occasionally whether requiring 'Typeable g' there would help.
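
For reference, here is how 'gmapT' falls out of 'gfoldl' for ordinary
datatypes. This is a minimal sketch against today's Data.Data, not part of
the original proposal; the names 'gmapTViaGfoldl' and 'incInt' are made up
for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Data (Data, gfoldl)
import Data.Maybe (fromMaybe)
import Data.Typeable (cast)

-- Identity wrapper, used as the 'c' type constructor of gfoldl.
newtype Id a = Id { unId :: a }

-- gmapT recovered from gfoldl: apply 'f' to each immediate subterm.
-- (Hypothetical name; the library method is simply 'gmapT'.)
gmapTViaGfoldl :: Data a => (forall d. Data d => d -> d) -> a -> a
gmapTViaGfoldl f x = unId (gfoldl k Id x)
  where
    k :: Data d => Id (d -> b) -> d -> Id b
    k (Id c) y = Id (c (f y))

-- A generic transformation that increments Ints and leaves the rest alone.
incInt :: Data d => d -> d
incInt d = case cast d :: Maybe Int of
  Just n  -> fromMaybe d (cast (n + 1))
  Nothing -> d

main :: IO ()
main = do
  -- The tuple has three immediate subterms; the two Ints are incremented.
  print (gmapTViaGfoldl incInt (1 :: Int, 'x', 2 :: Int))  -- (2,'x',3)
  -- An Int has no subterms (gfoldl _ z = z), so nothing is transformed.
  print (gmapTViaGfoldl incInt (5 :: Int))                 -- 5
```

The second call shows the problem discussed here: when 'gfoldl' is just
'gfoldl _ z = z', the transformation never gets to see anything.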

The next try is to expand our function, so that we can pretend we have
some constructor to work on in 'gfoldl':

      -- fun ==> \x->fun x ==> (\fun x->fun x) fun

Then we can do (using scoped type variables to fix the 'a' and 'b'):

  gfoldl k z fun = z (\fun x->fun x) `k` fun
  -- gmapT f fun = f . fun
  gmapT f fun = unId $ gfoldl (k f) (Id) fun 
    where k f (Id c) x = Id (c (case (cast x :: Maybe (a -> b)) of
                                 Just x -> fromJust $ cast (f . x)
                                 Nothing -> x))

but whether that is very enlightening, I wouldn't want to say;-)

> Just for my understanding, can you give me an example of a datatype which
> currently has (b) but not (c) and vice-versa?

b ('toConstr'&co) usually comes with c ('gunfold'). I've defined some 'Data'
instances which implemented b without c, but I don't think that is typical.

My reason for splitting the functionality in three ('gfoldl', 'toConstr', 'gunfold')
was just to be systematic, hoping in particular for implementations of
'gunfold' (or, more generally, constructing 'data' from parts) that do not 
depend on reflection.
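
To make the three-way split concrete, here is a rough sketch of what it
might look like. Everything in it is an assumption for illustration: the
class names 'GFoldl', 'ToConstr', 'GUnfold', the 'S'-suffixed methods, and
the example types 'Wrap' and 'Handler' do not exist in any library. Only
'Constr', 'DataType', 'mkConstr', 'mkDataType' and 'showConstr' are taken
from Data.Data.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Data (Constr, DataType, Fixity (Prefix),
                  mkConstr, mkDataType, showConstr)

-- (a) traversal only
class GFoldl a where
  gfoldlS :: (forall d b. GFoldl d => c (d -> b) -> d -> c b)
          -> (forall g. g -> c g) -> a -> c a

-- (b) reflection only
class ToConstr a where
  toConstrS   :: a -> Constr
  dataTypeOfS :: a -> DataType

-- (c) construction only (declared for completeness; no instances here)
class GUnfold a where
  gunfoldS :: (forall b r. GUnfold b => c (b -> r) -> c r)
           -> (forall r. r -> c r) -> Constr -> c a

instance GFoldl Int where
  gfoldlS _ z n = z n

-- An ordinary type: supports (a) and (b).
newtype Wrap = Wrap Int

instance GFoldl Wrap where
  gfoldlS k z (Wrap n) = z Wrap `k` n

wrapConstr :: Constr
wrapConstr = mkConstr wrapType "Wrap" [] Prefix

wrapType :: DataType
wrapType = mkDataType "Main.Wrap" [wrapConstr]

instance ToConstr Wrap where
  toConstrS _   = wrapConstr
  dataTypeOfS _ = wrapType

-- A type holding a function: supports (a) trivially, and nothing else.
newtype Handler = Handler (Int -> Int)

instance GFoldl Handler where
  gfoldlS _ z h = z h

-- A client that needs only (a): count the immediate subterms.
newtype Count a = Count { getCount :: Int }

arity :: GFoldl a => a -> Int
arity x = getCount (gfoldlS (\(Count n) _ -> Count (n + 1)) (\_ -> Count 0) x)

main :: IO ()
main = do
  print (arity (Wrap 3))                       -- 1
  print (arity (Handler (+ 1)))                -- 0
  putStrLn (showConstr (toConstrS (Wrap 3)))   -- Wrap
```

With such a split, 'toConstrS (Handler (+ 1))' is rejected by the type
checker, rather than compiling and then failing at runtime the way an
incomplete 'Data' instance would.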

> Anyway, I guess keeping Data inside base does not preclude such splitting of
> Data: for backward compatibility the original Data would have to remain
> available, right?

It used to be the case that 'base' could not be updated, so anything
in it would be fixed until the next ghc release. Preserving the original
'Data' would also preserve the original clients and incomplete instances,
which is not what one would want (instead, one would want to instantiate
just those component classes whose methods can be implemented and 
used without runtime errors, preserving compatibility of non-failing code).

But that is all far in the future, 6.12 or so, not urgent now. I just mentioned it
because there is very little about SYB that I'm sure about, and this is
another example of something that might be worth looking into. And
the more you keep in 'base', the less you can improve.

>> More reason for moving everything to 'syb', keeping it flexible
>> for a while.
> 
> By "everything" do you mean all instances or all the "dubious" ones? IIRC,
> the argument for having the "standard" instances in base is that leaving
> Data alone without any instances would mean that in most cases you would
> have to import SYB anyway to get any functionality. Or are there other
> reasons?

Note the "for a while" there. If you are at liberty to change 'base'
and users can update 'base' without waiting for the next ghc release,
then you can do the changes in 'base'. Otherwise, everything that
might change should be in a package you can change and users
can update. Making that package 'syb' keeps things simple - later,
after things have settled down again, one could spin off 'Data' and 
'Typeable' into their own package ('data-reflection', 'introspection', ..). 
Or one could re-integrate 'Data' into 'base' to get smaller 
'build-depends' (and less accurate Cabal dependencies..).

But while you're looking into improving things, they need to be
changeable, and 'base' usually isn't.

Claus