The base library and GHC 6.10

Simon Peyton-Jones simonpj at microsoft.com
Thu Aug 28 09:02:29 EDT 2008


| We're trying to decide what to do with the base library for GHC 6.10, in
| terms of how much of it should be broken up into separate packages.
| Since the recent proposal about this, we may be rethinking what we want
| to do, and we would welcome your opinions.

Thanks Ian. I found it helpful to number off the advantages and disadvantages so they are easy to refer to, so I enclose a slightly text-processed version of your message below.

My thoughts

* I find (D2), (D3), and (D4) -- see below -- quite strong reasons for maintaining the status quo.

* While (A1)-(A3) are advantages, I'm not sure they are powerful enough to want to disturb the status quo in the *short term* (i.e. before 6.10).

* The exception is SYB, for which we have a willing and active maintainer, so (A1) is very strong.  That isn't the case for any other package.

So my suggestion would be:
  * for 6.10: split out SYB and nothing else
  * later: maybe more, let's see

Simon


=================== Text-processed version of Ian's message ===================

We're trying to decide what to do with the base library for GHC 6.10.
Specifically we want to work out

        how much of the current package "base" should
        be split into separate packages.

Since the recent proposal about this (http://hackage.haskell.org/trac/ghc/ticket/1338),
we may be rethinking what we want to do, and we would welcome your opinions.




Motivation: why split up "base"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A1. It becomes possible to separately upgrade the parts, and makes it
   easier for different people to maintain different parts.

A2. It makes it easier to see what the hierarchy is, and to restructure
   the hierarchy, and work towards more of the code being shared
   between different Haskell implementations. Plus it means that
   people can't re-tangle the logically separate components, which is
   all too easy to do when you just have one huge package.

A3. It also means that packages are clearer about what they depend
   on. One possibility, which would be really cool, is to separate all
   the IO modules from the non-IO modules; between that and looking at
   the extensions used (e.g. TH and FFI) it would then be clear
   whether or not a library could do any IO. Of course, the Prelude is
   a hurdle for this goal.

Also, GHC's (still in flux) plan for the base library:
http://hackage.haskell.org/trac/ghc/wiki/DarcsConversion#Planforlibraries
essentially means forking base (as nhc98 would continue to use base in a
darcs repo, while GHC would use it from a git repo, and there are no
plans for any merging between these repos). Therefore any code that is
to be shared between the implementations needs to not be in base, so
from that point of view it would be good to pull out as much as
possible.

Why *not* split up "base"?
~~~~~~~~~~~~~~~~~~~~~~~~~~

D1. Splitting up base imposes costs on others.  Specifically, the
   dependencies of packages need to be updated to reflect the
   changes. However, GHC 6.10 will come with a base version 3, as well
   as the new base version 4, so the transition should be much
   smoother than the base 2 -> base 3 transition.

D2. It would be bad to make a change, and then make *another* change to
   the same thing.  So anywhere there is doubt we should leave things
   unchanged.

D3. Several people expressed reservations about a proliferation of
   packages containing only one module, or only a little code
   (less than 500 lines, say).

D4. Splitting out a package whose *implementation* depends in an
    intimate way on "base" is a bit of a false separation.  At one
    extreme a new package could simply re-export a bunch of
    types and functions from "base".  If this is the case, none
    of A1-A3 hold.


What I propose
~~~~~~~~~~~~~~
(In the below, LoC stands for "Lines of Code".)

----- SYB: generic programming -------
First the easy bit: The Data.Generics hierarchy is going to have a
separate maintainer, and I think that everyone is agreed that it should
be pulled out into a "syb" package. I'll treat this as not part of base
from here on.

The only thing still being debated here is whether the Data class itself
should remain in base or not. Some people believe that it should remain
in base, as it is desirable to have Data instances for as many types as
possible, and because there is a resistance among library writers
against adding dependencies. The counter argument is that there are many
other classes that the same is true of (e.g. uniplate, syb-with-class,
binary), and it does not scale to put all of these classes into base.
Also, by requiring a dep to be added even for the classes that have
historically been included in base, adding dependencies for the sake of
providing instances may become more socially acceptable.

----- GetOpt ------------
    System.Console.GetOpt
    (129 LoC, 1 module)
This doesn't really fit in with anything else in base, so I propose
to split it off into its own getopt package. I don't think there is
much objection to this one.  [SLPJ: I am unconvinced.]

----- ST ----------------
    Control.Monad.ST
    Data.STRef
    (120 LoC, 6 modules)
These are the Control.Monad.ST and Data.STRef hierarchies. I propose
that we put them into an "st" package. The low-level implementation is
still in base (69 LoC in the GHC.ST and GHC.STRef modules), so to some
extent this is a false separation (D4). On the other hand, nhc98
doesn't support ST, so splitting this package off gets us closer to all
implementations exposing the same modules from base.
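
To make the "false separation" point (D4) concrete, here is a minimal
sketch of what the thin layer might look like. It is an illustration
only: a real "st" package would expose Control.Monad.ST and Data.STRef
rather than a Main module, and the helper sumTo is made up for the
example. The only assumption about base is that GHC.ST and GHC.STRef
export the names imported below.

```haskell
-- Sketch only: stands in for the public modules of a hypothetical "st"
-- package. Every exported name is simply re-exported from base.
module Main
  ( main
  , ST, runST
  , STRef, newSTRef, readSTRef, writeSTRef
  ) where

-- The actual implementation stays behind in base.
import GHC.ST (ST, runST)
import GHC.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- Hypothetical client code: sum 1..n using a mutable reference.
sumTo :: Int -> Int
sumTo n = runST (do
  ref <- newSTRef 0
  mapM_ (\i -> readSTRef ref >>= \acc -> writeSTRef ref (acc + i)) [1 .. n]
  readSTRef ref)

main :: IO ()
main = print (sumTo 10)
```

Nothing in this layer can be upgraded or maintained independently of
base, which is why A1-A3 gain little from a split of this kind.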

------ Concurrent --------
    Control.Concurrent hierarchy
    (490 LoC, 6 modules)
and
    System.Timeout (39 LoC)
    Data.Unique (32 LoC)
(those latter modules depend on Control.Concurrent.*). I propose that
we put these into "concurrent", "timeout" and "unique" packages
respectively. Again, this is a false separation, with 698 LoC left
behind in GHC.Conc; at some point we'd hope that this could either be
moved down to ghc-prim or go into a new ghc-concurrent package,
depending on how the dependencies work out. Again, nhc98 doesn't support
concurrent or its dependencies, so this gets us closer to a consistent
base interface.

[SLPJ: I don't think we should split out concurrent yet.  I'm pretty
certain that we should not generate tiny new packages for "timeout"
and "unique".]

------ Summary -------

Splitting off the above 5 packages would leave 106 modules and 16,621 LoC
in base. About 5% of the LoC, and 12.5% of the modules, would be in the
new packages.
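
For anyone who wants to check the percentages, the arithmetic can be
sketched as below. The figures are the ones quoted above; the only
assumption is that System.Timeout and Data.Unique count as one module
each, since the message gives no module count for them.

```haskell
module Main (main) where

-- (package, lines of code, modules), as quoted in the message.
-- Assumption: "timeout" and "unique" are one module each.
splitOff :: [(String, Int, Int)]
splitOff =
  [ ("getopt",     129, 1)
  , ("st",         120, 6)
  , ("concurrent", 490, 6)
  , ("timeout",     39, 1)
  , ("unique",      32, 1)
  ]

-- What would remain in base after the split.
baseLoC, baseModules :: Int
baseLoC     = 16621
baseModules = 106

splitLoC, splitModules :: Int
splitLoC     = sum [loc | (_, loc, _) <- splitOff]
splitModules = sum [mods | (_, _, mods) <- splitOff]

-- Share of the pre-split total that moves out, as a percentage.
percent :: Int -> Int -> Double
percent moved remaining =
  100 * fromIntegral moved / fromIntegral (moved + remaining)

main :: IO ()
main = do
  putStrLn ("LoC moved:     " ++ show splitLoC
            ++ " (" ++ show (percent splitLoC baseLoC) ++ "%)")
  putStrLn ("Modules moved: " ++ show splitModules
            ++ " (" ++ show (percent splitModules baseModules) ++ "%)")
```

This gives 810 LoC and 15 modules moved, roughly 4.6% and 12.4% of the
pre-split totals, which is in line with the rounded figures above.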

[SLPJ: the fact that the change is so small makes me think that
A2, A3 are not being helpful.  I think there is only a strong case for
SYB, because of A1.]


