Libraries in the repo

Simon Marlow marlowsd at gmail.com
Thu Aug 27 09:10:41 EDT 2009


Incidentally, the reason I'd like us to make a decision on this now is
because I'm about to add two new boot libraries:

   - binary, to support a binary cache of GHC's package database
     (INDEPENDENT)

   - bin-package-db, the code to read and write the binary package
     database (SPECIFIC, shared by ghc and ghc-pkg).

I don't much like bin-package-db being a separate package, given that 
it's only 100 lines or so in one module, but I don't see a good alternative.
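
For illustration only, here is a minimal sketch of what a binary cache of
package records could look like using the binary library's Data.Binary
class. This is not the actual bin-package-db code: the PkgInfo record, its
fields, and the sample entries are hypothetical and far simpler than a real
package database.

```haskell
module Main (main) where

import Data.Binary (Binary (..), decode, encode)

-- Hypothetical, heavily simplified package record (an assumption for
-- this sketch); a real package database stores many more fields.
data PkgInfo = PkgInfo
  { pkgName    :: String
  , pkgVersion :: [Int]
  , pkgDepends :: [String]
  } deriving (Eq, Show)

-- Serialise the fields in a fixed order and read them back in the
-- same order.
instance Binary PkgInfo where
  put (PkgInfo n v d) = put n >> put v >> put d
  get = PkgInfo <$> get <*> get <*> get

-- Made-up sample entries, used only to exercise the round trip.
sampleDb :: [PkgInfo]
sampleDb =
  [ PkgInfo "base" [4, 2] []
  , PkgInfo "binary" [0, 5] ["base"]
  ]

main :: IO ()
main = do
  let bytes   = encode sampleDb            -- lazy ByteString
      decoded = decode bytes :: [PkgInfo]
  -- Check that decoding the cache yields the original records.
  print (decoded == sampleDb)
```

Because both ghc and ghc-pkg would need the same instance to agree on the
on-disk format, code like this has to live somewhere both can import it.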

Cheers,
	Simon

On 26/08/2009 17:15, Simon Marlow wrote:
> Simon and I have been chatting about how we accommodate libraries in the
> GHC repository. After previous discussion on this list, GHC has been
> gradually migrating towards having snapshots of libraries kept as
> tarballs in the repo (currently only "time" falls into this category),
> but I don't think we really evaluated the alternatives properly. Here's
> an attempt to do that, and to my mind the outcome is different: we
> really want to stick to having all libraries as separate repositories.
>
> Background:
> * Scope: libraries that are needed to build GHC itself (aka "boot
> libraries")
>
> * Boot libraries are of several kinds:
>   - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>   - COUPLED: Tightly coupled to GHC, but used by others (base)
>   - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>
> * Most boot libraries are INDEPENDENT. INDEPENDENT libraries have a
> master repository somewhere separate from the GHC repositories.
>
> * We need a branch of INDEPENDENT libraries, so that GHC builds don't
> break when the upstream package is modified.
>
> * Sometimes we want to make local modifications to INDEPENDENT
>   libraries:
>   - when GHC adds a new warning, we need to fix instances of the
>     warning in the library to keep the GHC build warning-free.
>   - to check that the changes work, before pushing upstream
>
>
> Choices for how we deal with libraries in the GHC repository: (+) is a
> pro, (-) is a con.
>
> (1) Check out the library from a separate repo, using the darcs-all
> script. The repo may either be a GHC-specific branch
> [INDEPENDENT], or the master copy of the package
> [SPECIFIC/COUPLED].
>
> (+) we can treat every library this way, which gives a
> consistent story. Consistency is good for developers.
> (+) [INDEPENDENT] makes it easy to push changes upstream and sync
> with the upstream repo (unless upstream is using a different
> VCS).
>
> (-) [INDEPENDENT] we have to be careful not to let our branches
> get too far out of sync with upstream, and we must
> sync before releasing GHC.
>
> (2) Put a snapshot tarball of the library in libraries/tarballs,
> but allow you to check out the darcs repo instead.
>
> (-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
> because we expect to be modifying the library often.
> (-) updating the snapshot is awkward
> (-) workflow for making a change to the library is awkward:
>     - check out the darcs repo
>     - make the change, validate it
>     - push the change upstream (bump version?)
>     - make a new snapshot tarball
>     - commit the new snapshot to the GHC repo.
> (-) having tarballs in the repository is ugly
> (-) we have no revision history of the library
>
> (3) The GHC repo *itself* contains every library unpacked in the
> tree. You are allowed to check out the darcs repo instead.
>
> (+) atomic commits to both the library and GHC.
> (+) doing this consistently would allow us to remove darcs-all,
> giving a nice easy development workflow
>
> (-) [INDEPENDENT/COUPLED] still need a separate darcs repo.
> (-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
> (-) [INDEPENDENT/COUPLED] manual syncing with upstream
> (-) [COUPLED] (particularly base) syncing with
> upstream would be painful.
>
>
> (3) works best for SPECIFIC libraries, whereas (1) works best for
> INDEPENDENT/COUPLED libraries. If we want to treat all libraries the
> same, then the only real option is (1).
>
> Experience with Cabal and bytestring has shown that (1) can work for
> INDEPENDENT libraries, but only if we're careful not to get too
> out-of-sync (as we did with bytestring). In the case of Cabal, we never
> have local changes in our branch that aren't in Cabal HEAD, and that
> works well.
>
> Comments/thoughts?
>
> Cheers,
> Simon



More information about the Glasgow-haskell-users mailing list