More on version management...

Fri Mar 19 13:21:48 EST 2004

Graham Klyne <gk at ninebynine.org> writes:

> I took a look at the Subversion site [1], and see three features which
> appear to be quite compelling.

Subversion fixes several of CVS's major problems, that's for sure.  It
doesn't really add any new abilities, however, for a distributed
community like the Haskell community.  (At this point, I have spent a
few years using CVS, a few months with SVN, a couple of months with arch,
and a few days with darcs, along with a summer where we used Clearcase
at Cisco Systems, so I've had a decent survey of what's out there.)

When I was using svn, I was very pleased that I could run diff on my
laptop without having access to the Internet, but that's about the
only thing that svn does that helps it to be useful for a distributed
community.

The Haskell community has been talking a lot lately about how we want
to be less centralized, especially with regard to library development.
There are others, but let me explain my most compelling use case for a
distributed system like darcs or arch.

1) Joe Hacker discovers Haskell, becomes a 1337 developer, and is
   ready to try extending a large project over a longish period of
   time.

2) But no one knows Joe Hacker, so there's no way we're going to give
   him write access to a CVS (or SVN) repository for a mission
   critical tool and invite him to create his own branch.  Maybe he's
   too shy to ask, because he's not convinced that he can really
   implement the feature he wants to add.

3) So Joe Hacker grabs the latest CVS branch and starts hacking away.
   Pretty soon he needs version control; since he's a 1337 hacker, he
   knows how important it is.

4) So he imports his branch into his own local version of CVS.  Now we
   have two completely different repositories that will never share
   log information.

5) Meanwhile, the mission critical tool he's hacking on is diverging
   from his code.  He has to merge every so often.  How does he do
   this?  He remembers the date he last synched, gets a diff between
   the synch date and the current version, patches his new version,
   resolves conflicts, and he's all set.

6) Oh, and if he's backporting his features to the stable version,
   he'd better get a diff of his own work and patch the old version as
   well.  If upstream has added patches to the old version too, well
   he has to get a diff with that version and apply that patch.

7) Next time he wants to backport his patches, he's in big trouble.
   His "unstable" branch is polluted by upstream's changes, and
   there's no way to extract his own changes without browsing through
   diff hunks.  Oops, looks like he should have used 3 branches or
   something.  There are probably tools to make this easier with diff
   & patch.

8) If his work is accepted in the end, he merges with mainline.
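
The bookkeeping problem in steps 5 through 7 can be pictured with a
toy model.  This is only a sketch, in Python, that treats a tree as
the set of changes applied to it so far.  Every name in it (base,
upstream_changes, joes_changes, mine_only) is made up for
illustration, and it says nothing about how CVS actually stores
revisions.

```python
# Toy model: a snapshot-style tree is just the set of changes applied so far.
# All names and change labels are hypothetical.
base = frozenset({"initial import"})

upstream_changes = {"upstream: fix parser", "upstream: new flag"}
joes_changes = {"joe: add feature", "joe: tidy docs"}

# Step 5: Joe merges upstream's work into his private copy with diff and patch.
joes_tree = base | joes_changes | upstream_changes

# Step 7: later, the only diff Joe can take is against a snapshot,
# so upstream's work comes out mixed in with his own.
diff_against_base = joes_tree - base


def mine_only(diff, remembered_upstream):
    """Recover Joe's own changes from a mixed diff.

    This only works if Joe kept track of exactly what he merged
    from upstream, i.e. his sync points.
    """
    return diff - remembered_upstream


print(sorted(diff_against_base))
print(sorted(mine_only(diff_against_base, upstream_changes)))
```

If Joe has lost track of what he merged (an empty remembered_upstream
in this model), the mixed diff is all he has, which is the "browsing
through diff hunks" situation in step 7.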

This is the best he can do; if he forgets to record his sync points
(because he wasn't planning ahead), it's going to be harder.  This
demonstrates a very important point.  CVS has been around for so long,
and hasn't been replaced by something better, because all of its
problems can be worked around (and also because it's a great tool).
But that doesn't mean that it's the perfect tool for every product.

But with a distributed version control system like darcs or arch, this
becomes much easier.  Now, Jane Hacker can follow this procedure
(the step numbers do not correspond to those above):

1) She discovers Haskell.

2) She uses "darcs get" to grab the latest tree.  No extra import step
   is required; she just uses "darcs record" to record her changes.

3) As the mission critical tool diverges, she uses "darcs pull" to get
   the latest changes.  Darcs prompts her for each patch, or she can
   use the --all flag.

4) To backport to the stable version, she uses "darcs pull" against
   her own version.  She can do this before or after syncing with
   upstream.  Darcs will prompt her with each patch and ask her if she
   wants to apply them.  She says "no" to the upstream patches.  She
   also can use three branches to make this easier.

(All this is true of arch as well, but darcs has the great advantage
of being written in Haskell, and is actually a bit easier to use.  I'm
sure there are other advantages too; I just haven't used it that
long.)
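
Step 4 of Jane's procedure can be pictured with a similar toy model.
Again this is only a sketch in Python: the Patch type, the pull
function and the patch names are hypothetical, and real darcs does
much more (it tracks dependencies between patches, for one thing).
The point is just that when a repository is a collection of named
patches, "say no to the upstream patches" is a simple selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    name: str
    author: str


def pull(dest, source, accept):
    """Offer each patch in `source` that `dest` lacks, in order,
    and append the ones for which `accept(patch)` is true."""
    have = set(dest)
    picked = [p for p in source if p not in have and accept(p)]
    return dest + picked


stable = [Patch("initial import", "upstream")]

janes_repo = stable + [
    Patch("add feature", "jane"),
    Patch("fix parser", "upstream"),  # arrived via an earlier pull from upstream
    Patch("tidy docs", "jane"),
]

# Backport: pull from her own repository, saying "no" to upstream's patches.
backport = pull(stable, janes_repo, lambda p: p.author == "jane")
print([p.name for p in backport])
# prints ['initial import', 'add feature', 'tidy docs']
```

Because each patch keeps its identity, it makes no difference in this
model whether she syncs with upstream before or after the backport.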

I can tell you that this has happened to me before.
I'm sure it happens all the time when a hacker joins a new community.
When Colin Walters and I were hacking on Debian's APT tool, we
couldn't get access to their repository.  In my hmake hacking, I
didn't ask for access to the repository because I'm not 100% convinced
that hmake is the right choice for what I'm trying to do.  I
nevertheless need to use version control in the meantime.

There are other compelling use-cases as well.  For instance, in
packaging darcs for Debian, it's very convenient to be able to
cherry-pick changes from the main branch.

By the way, I feel that the cost of learning a new way of doing
version control is over-hyped.  For one thing, you can use arch just
like CVS and still give your users the distributed advantages.  For
another thing, it's just not that hard to learn.  My company hired a
new guy the other day, and he read the arch tutorial one morning and
that was all he needed.

So anyway, as others have pointed out, no one is suggesting that we
move everything over to an experimental VC system, even in the medium
term, especially something as large and mission-critical as fptools.
Having a project like the Library Infrastructure Project in a
distributed VC system will be a good experiment, though.  Someone is
working on a CVS <--> darcs bridge for this, so perhaps we can make
everyone happy.

peace,

isaac