Libraries in the repo
dons at galois.com
Wed Aug 26 19:55:01 EDT 2009
> Simon and I have been chatting about how we accommodate libraries in the
> GHC repository. After previous discussion on this list, GHC has been
> gradually migrating towards having snapshots of libraries kept as
> tarballs in the repo (currently only "time" falls into this category),
> but I don't think we really evaluated the alternatives properly. Here's
> an attempt to do that, and to my mind the outcome is different: we
> really want to stick to having all libraries as separate repositories.
> * Scope: libraries that are needed to build GHC itself (aka "boot libraries").
> * Boot libraries are of several kinds:
> - INDEPENDENT: Independently maintained (e.g. time, haskeline)
> - COUPLED: Tightly coupled to GHC, but used by others (base)
> - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
> * Most boot libraries are INDEPENDENT. INDEPENDENT libraries have a
> master repository somewhere separate from the GHC repositories.
> * We need a branch of INDEPENDENT libraries, so that GHC builds don't
> break when the upstream package is modified.
> * Sometimes we want to make local modifications to INDEPENDENT libraries:
> - when GHC adds a new warning, we need to fix instances of the
> warning in the library to keep the GHC build warning-free.
> - to check that the changes work, before pushing upstream
> Choices for how we deal with libraries in the GHC repository: (+) is a
> pro, (-) is a con.
> (1) Check out the library from a separate repo, using the darcs-all
> script. The repo may either be a GHC-specific branch
> [INDEPENDENT], or the master copy of the package
> (+) we can treat every library this way, which gives a
> consistent story. Consistency is good for developers.
> (+) [INDEPENDENT] makes it easy to push changes upstream and sync
> with the upstream repo (unless upstream is using a different VCS).
> (-) [INDEPENDENT] we have to be careful not to let our branches
> get too far out of sync with upstream, and we must
> sync before releasing GHC.
> (2) Put a snapshot tarball of the library in libraries/tarballs,
> but allow you to checkout the darcs repo instead.
> (-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
> because we expect to be modifying the library often.
> (-) updating the snapshot is awkward
> (-) workflow for making a change to the library is awkward:
> - checkout the darcs repo
> - make the change, validate it
> - push the change upstream (bump version?)
> - make a new snapshot tarball
> - commit the new snapshot to the GHC repo.
> (-) having tarballs in the repository is ugly
> (-) we have no revision history of the library
> (3) The GHC repo *itself* contains every library unpacked in the
> tree. You are allowed to check out the darcs repo instead.
> (+) atomic commits to both the library and GHC.
> (+) doing this consistently would allow us to remove darcs-all,
> giving a nice easy development workflow
> (-) [INDEPENDENT/COUPLED] we still need a separate darcs repo.
> (-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
> (-) [INDEPENDENT/COUPLED] manual syncing with upstream
> (-) [COUPLED] (particularly base) syncing with
> upstream would be painful.
> (3) works best for SPECIFIC libraries, whereas (1) works best for
> INDEPENDENT/COUPLED libraries. If we want to treat all libraries the
> same, then the only real option is (1).
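One way to see why the comparison lands on (1) is to record, for each kind, which options it can live with and then intersect those sets. Below is a minimal Python sketch of that reasoning. It is one hypothetical reading of the pros and cons above, not anything GHC's build actually uses, and `Kind`, `WORKABLE` and `uniform_options` are illustrative names.

```python
from enum import Enum


class Kind(Enum):
    """The three kinds of boot library from the classification above."""
    INDEPENDENT = "independently maintained (time, haskeline)"
    COUPLED = "tightly coupled to GHC but used by others (base)"
    SPECIFIC = "totally specific to GHC (template-haskell, DPH)"


# Hypothetical reading of the pros/cons list:
#   (1) separate repo via darcs-all: usable for every kind
#   (2) snapshot tarball: "doesn't really make sense" for SPECIFIC/COUPLED
#   (3) unpacked in the GHC tree: upstream syncing is hard for
#       INDEPENDENT/COUPLED, so only SPECIFIC takes it comfortably
WORKABLE = {
    Kind.INDEPENDENT: {1, 2},
    Kind.COUPLED: {1},
    Kind.SPECIFIC: {1, 3},
}


def uniform_options(kinds):
    """Options that could be applied to every library kind in `kinds`."""
    option_sets = [WORKABLE[k] for k in kinds]
    if not option_sets:
        return set()
    return set.intersection(*option_sets)


print(uniform_options(list(Kind)))
print(uniform_options([Kind.SPECIFIC]))
```

Adding a kind can only shrink the intersection. That is the consistency argument in miniature: the more kinds you want handled the same way, the fewer options survive.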
> Experience with Cabal and bytestring has shown that (1) can work for
> INDEPENDENT libraries, but only if we're careful not to get too
> out-of-sync (as we did with bytestring). In the case of Cabal, we never
> have local changes in our branch that aren't in Cabal HEAD, and that
> works well.
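Two policies appear above. The Cabal rule says the branch has no local changes that aren't in upstream HEAD. The bytestring lesson says the branch must not fall too far behind. Both can be expressed as set differences over patches. darcs treats a repository roughly as a set of patches, so this is a reasonable first approximation. The sketch below is hypothetical: it ignores patch dependencies, the patch names are made up, and `max_behind` is an arbitrary threshold, because the thread gives no number.

```python
def local_only(branch, upstream):
    """Patches in GHC's branch that upstream HEAD does not have."""
    return set(branch) - set(upstream)


def behind_by(branch, upstream):
    """Upstream patches that GHC's branch has not pulled yet."""
    return set(upstream) - set(branch)


def branch_report(branch, upstream, max_behind=10):
    """Check a branch against both policies.

    `max_behind` is a hypothetical cut-off for "too far out of sync".
    """
    extra = local_only(branch, upstream)
    missing = behind_by(branch, upstream)
    return {
        "cabal_style": not extra,          # True when nothing is local-only
        "in_sync": len(missing) <= max_behind,
        "local_only": sorted(extra),
        "behind_by": len(missing),
    }


# Made-up patch names for illustration.
upstream = ["p1", "p2", "p3", "p4"]
ghc_branch = ["p1", "p2", "fix-new-warning"]
print(branch_report(ghc_branch, upstream))
```

In this example, a local warning fix fails the Cabal-style check until it is pushed upstream. That matches the "check that the changes work, before pushing upstream" step listed earlier.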
As author of bytestring, I'd prefer it if GHC used a released version
direct from Hackage. I.e. GHC could snapshot a Hackage release, and get
out of the business of cloning repos. Same for other INDEPENDENT libraries.
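Taking a snapshot straight from a Hackage release needs little more than the release tarball and a pinned checksum. The minimal sketch below makes several assumptions. It uses the current Hackage download layout (`https://hackage.haskell.org/package/<name>-<version>/<name>-<version>.tar.gz`). It expects a SHA-256 digest to be recorded in the GHC repo when the snapshot is chosen. The function names and the destination directory are hypothetical.

```python
import hashlib
import io
import os
import tarfile
import urllib.request

HACKAGE = "https://hackage.haskell.org/package"


def hackage_url(name, version):
    """Download URL for a released package tarball (assumed layout)."""
    pkg = f"{name}-{version}"
    return f"{HACKAGE}/{pkg}/{pkg}.tar.gz"


def verify_sha256(data, expected):
    """Refuse a tarball whose digest differs from the pinned one."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected.lower():
        raise ValueError(f"checksum mismatch: got {actual}, want {expected}")


def snapshot(name, version, expected_sha256, dest):
    """Fetch a Hackage release, check it, and unpack it under `dest`."""
    with urllib.request.urlopen(hackage_url(name, version)) as resp:
        data = resp.read()
    verify_sha256(data, expected_sha256)
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        # Use the "data" extraction filter (rejects absolute paths and
        # ".." entries) when this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return os.path.join(dest, f"{name}-{version}")
```

A call such as `snapshot("bytestring", "<version>", "<pinned sha256>", "libraries")` would then unpack the release next to the other boot libraries, assuming the tarball has the usual `<name>-<version>/` top-level directory. Upgrading mostly means changing the pinned version and digest rather than maintaining a cloned branch.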