Libraries in the repo

Wed Aug 26 19:55:01 EDT 2009

marlowsd:
> Simon and I have been chatting about how we accommodate libraries in the  
> GHC repository.  After previous discussion on this list, GHC has been  
> gradually migrating towards having snapshots of libraries kept as  
> tarballs in the repo (currently only "time" falls into this category),  
> but I don't think we really evaluated the alternatives properly.  Here's  
> an attempt to do that, and to my mind the outcome is different: we  
> really want to stick to having all libraries as separate repositories.
>
> Background:
>  * Scope: libraries that are needed to build GHC itself (aka "boot
>    libraries")
>
>  * Boot libraries are of several kinds:
>    - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>    - COUPLED: Tightly coupled to GHC, but used by others (base)
>    - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>
>  * Most boot libraries are INDEPENDENT.  INDEPENDENT libraries have a
>    master repository somewhere separate from the GHC repositories.
>
>  * We need a branch of INDEPENDENT libraries, so that GHC builds don't
>    break when the upstream package is modified.
>
>  * Sometimes we want to make local modifications to INDEPENDENT
>    libraries:
>      - when GHC adds a new warning, we need to fix instances of the
>        warning in the library to keep the GHC build warning-free.
>      - to check that the changes work, before pushing upstream
>
>
> Choices for how we deal with libraries in the GHC repository: (+) is a
> pro, (-) is a con.
>
>   (1) Check out the library from a separate repo, using the darcs-all
>       script.  The repo may either be a GHC-specific branch
>       [INDEPENDENT], or the master copy of the package
>       [SPECIFIC/COUPLED].
>
>       (+) we can treat every library this way, which gives a
>           consistent story.  Consistency is good for developers.
>       (+) [INDEPENDENT] makes it easy to push changes upstream and sync
>           with the upstream repo (unless upstream is using a different
>           VCS).
>
>       (-) [INDEPENDENT] we have to be careful not to let our branches
>           get too far out of sync with upstream, and we must
>           sync before releasing GHC.
>
>   (2) Put a snapshot tarball of the library in libraries/tarballs,
>       but allow you to check out the darcs repo instead.
>
>       (-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
>           because we expect to be modifying the library often.
>       (-) updating the snapshot is awkward
>       (-) workflow for making a change to the library is awkward:
>           - check out the darcs repo
>           - make the change, validate it
>           - push the change upstream (bump version?)
>           - make a new snapshot tarball
>           - commit the new snapshot to the GHC repo.
>       (-) having tarballs in the repository is ugly
>       (-) we have no revision history of the library
>
>   (3) The GHC repo *itself* contains every library unpacked in the
>       tree.  You are allowed to check out the darcs repo instead.
>
>       (+) atomic commits to both the library and GHC.
>       (+) doing this consistently would allow us to remove darcs-all,
>           giving a nice easy development workflow
>
>       (-) [INDEPENDENT/COUPLED] still need a separate darcs repo.
>       (-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
>       (-) [INDEPENDENT/COUPLED] manual syncing with upstream
>       (-) [COUPLED] (particularly base) syncing with
>           upstream would be painful.
>
>
> (3) works best for SPECIFIC libraries, whereas (1) works best for
> INDEPENDENT/COUPLED libraries.  If we want to treat all libraries the
> same, then the only real option is (1).
>
> Experience with Cabal and bytestring has shown that (1) can work for
> INDEPENDENT libraries, but only if we're careful not to get too
> out-of-sync (as we did with bytestring).  In the case of Cabal, we never  
> have local changes in our branch that aren't in Cabal HEAD, and that  
> works well.
>
> Comments/thoughts?

As author of bytestring, I'd prefer it if GHC used a released version
direct from Hackage. I.e. GHC could snapshot a Hackage release, and get
out of the business of cloning repos. Same for other INDEPENDENTs.

-- Don