[Haskell-cafe] fptools in darcs now available

Thu Apr 28 09:25:50 EDT 2005

On Thu, Apr 28, 2005 at 10:01:05AM +0100, Simon Marlow wrote:
> On 28 April 2005 04:52, John Goerzen wrote:
> 
> Is it possible to set up a two-way synch so we can move over to darcs
> gradually?  It's not really practical for us to move over in one go,
> we've simply accumulated too many dependencies on CVS, and there are
> lots of people using the repo with CVS.  If we had a two-way synch, we
> can experiment with darcs non-destructively.

I believe so.  I haven't tried that myself yet, but I think tailor.py
supports it.

To do that though, we should really identify a permanent home for the
canonical fptools darcs repo.  I'm not really set up to provide accounts
for those that would need write access, and I don't want to be the
gatekeeper (I suspect nobody else wants that either <g>).

If cvs.haskell.org is up to the task, that would be an ideal location
IMHO.  I haven't yet convinced tailor.py to work with the pserver for
fptools, so if it can access the repo on the local filesystem, that
would be ideal.  Plus, one could then cron it to run frequently.

I'll volunteer to do the work to figure out how to do this and get it
installed if someone wants to install darcs 1.0.2 on that box and give
me a spot to plonk down the darcs repo.  It uses about 355M, including
the pristine and working trees.  _darcs itself is 240M.  And, urm,
20,000+ inodes will be needed to be safe :-)  (df -i will show those)
My brief look at cvs.h.o shows that /home has plenty of space free.
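
For anyone who wants to check a candidate host before installing, here is a
minimal sketch of the same test that df and df -i give by eye.  It is written
in Python purely for illustration; the function names are hypothetical, and
the thresholds are just the rough figures quoted above, not measured limits.

```python
import os

# Rough figures from the message: about 355M for the repo and 20,000+ inodes.
NEED_BYTES = 355 * 1024 * 1024
NEED_INODES = 20_000


def enough(free_bytes, free_inodes,
           need_bytes=NEED_BYTES, need_inodes=NEED_INODES):
    """Return True when both the byte budget and the inode budget are met."""
    return free_bytes >= need_bytes and free_inodes >= need_inodes


def has_room(path):
    """Check the filesystem that holds `path` (Unix only, via os.statvfs)."""
    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize  # space available to non-root users
    free_inodes = st.f_favail               # inodes available to non-root users
    return enough(free_bytes, free_inodes)
```

Calling has_room("/home") on the target box would answer the question for
/home.  Some filesystems do not report a meaningful inode count, so a False
result there is worth confirming with df -i.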

cvs.h.o has only 256M of RAM.  On a repo this size, darcs sometimes uses
more than that.  However, with the exception of periodic checkpointing,
I think we could avoid those RAM-intensive operations on the server
side.

The other benefit of having it on cvs.haskell.org is that it can be
cronned to run fairly frequently (say, every 15 minutes) to help
minimize the possibility of conflicts.

> Off the top of my head, a few other things we need before we can even
> think of switching over:
> 
>   - split up the full fptools repo into pieces (as we discussed on
>     #haskell).

I gave that some thought before I started.  Several things occurred to
me:

 * There are quite a few commits in fptools that modify multiple
   projects (more than anyone estimated at first, I think)

 * The conversion process took a long time, so it may be best to convert
   it all at once and then split it up (~1 week * n projects == more
   time than I have to invest)

 * There were several "great renames" in the tree.  Tracking the entire
   history of an individual project across those would be difficult
   at best.

Now, having said that, I did keep the request in mind.  I figure that
this big repo can be split up into smaller ones at any time after the
CVS mirroring is stopped.  For each smaller one, the process would be:

1. Branch (darcs get) the master repo

2. Delete all the files that don't apply to the smaller project

3. Rename the smaller project's files as appropriate

4. Checkpoint here

Because darcs get hardlinks patches, this wouldn't be as costly to disk
as it might seem, and still preserves the history of the smaller
project.
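
The four steps above can be sketched as a small plan generator.  This is only
an illustration in Python, nothing here is executed, and the repo and
directory names in the usage note are made up.  It assumes that recording the
deletions and renames is done with darcs record --all, that renames go through
darcs mv, and that the checkpoint is made with darcs tag --checkpoint; check
those against the darcs version actually installed.

```python
def split_plan(master, dest, project_dir, new_name):
    """Return (working_dir, argv) pairs for carving one project out of master.

    The plan is only built, never run.  Deleting the unrelated files in
    step 2 is left to the operator; the plan just records the result.
    """
    return [
        # 1. Branch (darcs get) the master repo.
        (None, ["darcs", "get", master, dest]),
        # 2. After deleting the files that don't apply, record the removal.
        (dest, ["darcs", "record", "--all", "-m",
                "Remove files not part of " + new_name]),
        # 3. Rename the smaller project's files as appropriate.
        (dest, ["darcs", "mv", project_dir, new_name]),
        (dest, ["darcs", "record", "--all", "-m",
                "Rename " + project_dir + " to " + new_name]),
        # 4. Checkpoint here (assumed flag, see the note above).
        (dest, ["darcs", "tag", "--checkpoint", new_name + "-split"]),
    ]


def show(plan):
    """Format the plan as one shell-like line per step."""
    return ["(cd %s) %s" % (cwd or ".", " ".join(argv)) for cwd, argv in plan]
```

For example, show(split_plan("fptools", "ghc-only", "ghc", "compiler")) gives
five lines that can be reviewed before anyone runs them by hand.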

I'm about to write a new mail to darcs-users about my observations of
darcs' performance on this repo.  The summary is that the day-to-day
operations are still fairly fast and my ext2 server is holding up far
better than I expected.

>   - a web interface to the repo

If you mean darcs.cgi, that should be trivial enough to set up on
whatever the permanent host is.  I don't run it on my server because I
am very resource-limited there.

>   - commit mails to the cvs-<blah>@haskell.org lists

I figure a cron job can do this.  Every x minutes, run darcs changes -s,
and send copies of never-before-seen logs to the list.  Should be fairly
trivial.  I can do this too.  (But only after the CVS gateway is
disabled; otherwise, you'd get two copies for every commit)
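
A rough sketch of the "never-before-seen" part of that cron job, in Python for
illustration only.  It assumes the text of darcs changes -s has already been
captured, and it assumes each entry begins on a non-indented line with its
details indented underneath; that layout should be checked against real
output.  The function names are hypothetical, and persisting the seen set
between runs and sending the mail itself are left out.

```python
import hashlib


def split_entries(changes_text):
    """Split captured changes output into one string per entry.

    Assumption: an entry starts on a non-empty, non-indented line and runs
    through the indented or blank lines that follow it.
    """
    entries, current = [], []
    for line in changes_text.splitlines():
        starts_entry = bool(line) and not line[0].isspace()
        if starts_entry and current:
            entries.append("\n".join(current).strip())
            current = []
        current.append(line)
    entries.append("\n".join(current).strip())
    return [entry for entry in entries if entry]


def unseen(entries, seen):
    """Return the entries not yet in `seen`, and remember them in `seen`."""
    fresh = []
    for entry in entries:
        digest = hashlib.sha1(entry.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(entry)
    return fresh
```

Each run would pass the current output through split_entries, mail whatever
unseen returns, and save the updated set of digests for the next run.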

Let me know your thoughts.

-- John