[Haskell-cafe] cheap in-repo local branches (just needs implementation)

Tue Jul 21 17:23:12 EDT 2009

Hi everyone,

Max Battcher had an idea that I thought I should post on the mailing list.

The idea is about making branches in darcs.  Right now, we take the view that a
darcs branch is a darcs repository plain and simple.  If you want to create a
branch, all you have to do is darcs get (darcs get --lazy to be faster).  While
this is very simple, a lot of us think that it's inconvenient (first because
it's slow, and second because you have to decide where to put the branch).

So darcs users have been asking about in-repo branches for a while.  And now,
Max has come up with a way to implement them.  What's nice about his approach
is that it lets us keep the simplicity of darcs, while giving more demanding
users a chance to work with branches.  It also takes advantage of Petr
Ročkai's Summer of Code project to make darcs faster in our daily lives and, for
that matter, paves the way for a possible darcs plugin system in the future.

On Max's advice, I'm cross-posting to Haskell Cafe.  Haskellers: here's a nice
chance for you to get a cool Darcs feature without very much effort or Darcs
hacking experience :-)

More info on: http://bugs.darcs.net/issue555

------------------------------------------------------------
Max's write-up
------------------------------------------------------------

Here's a quick primer: Basically, darcs >= 2.0 uses a hashed pristine 
store that acts as a file object cache. An interesting artifact of the 
pristine.hashed store (which is being pushed into a useful, third-party 
accessible library named hashed-storage) is that it does, for many 
reasons, most of them co-evolutionary, resemble the git object store. There 
are several differences, but one of the key differences that applies to 
the topic at hand is that darcs generally garbage collects 
pristine.hashed objects much faster than git.

Darcs is very quick to garbage collect old objects partly because many 
aren't all that useful, but mostly because the primary representation 
for a repository state is the patch store (and inventory), so there is 
only one root pointer in the pristine store. Petr, the author of the 
hashed-storage library, briefly discusses this in his most recent design 
post about the future of hashed-storage:

http://mornfall.net/blog/designing_storage_for_darcs.html

Here's where the primer meets the topic at hand: A darcs branch consists 
of three major components: an inventory store, a patch store, and a 
pristine store. To store multiple branches "in the same place" you need 
to take care of: 1) storing the alternate inventories, and 2) if you 
want it to be relatively fast, storing additional objects in the 
pristine store. (The patch store will already happily hold more patches 
than are referenced in the current inventory.) (1) is mostly a matter of 
naming alternate inventories and swapping between them. As for (2), thanks 
to the *ahem* git-like nature of pristine.hashed/hashed-storage, darcs could 
easily keep (many) more pristine objects in pristine.hashed than it does 
during normal operation. It may be as simple as storing additional, useful 
"root pointers" visible to hashed-storage so that it knows not to garbage 
collect the objects from other branches.
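
To make the "root pointers" idea concrete, here is a minimal sketch of 
reachability-based garbage collection over a toy content-addressed store. 
It is not hashed-storage's API or on-disk format: the store is a plain 
dict, and put_blob, put_tree, reachable and collect_garbage are 
hypothetical names invented for this illustration. The point is only that 
an object survives collection as long as some root still reaches it.

```python
import hashlib


def put_blob(store, data):
    """Store file contents under the hash of those contents."""
    key = hashlib.sha256(b"blob:" + data).hexdigest()
    store[key] = ("blob", data)
    return key


def put_tree(store, entries):
    """Store a directory listing (name -> child hash) under its own hash."""
    listing = "\n".join(f"{name} {child}" for name, child in sorted(entries.items()))
    key = hashlib.sha256(b"tree:" + listing.encode()).hexdigest()
    store[key] = ("tree", dict(entries))
    return key


def reachable(store, roots):
    """Return the set of hashes reachable from any of the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in seen or key not in store:
            continue
        seen.add(key)
        kind, body = store[key]
        if kind == "tree":
            stack.extend(body.values())
    return seen


def collect_garbage(store, roots):
    """Delete every object no root reaches; return how many were deleted."""
    live = reachable(store, roots)
    dead = [key for key in store if key not in live]
    for key in dead:
        del store[key]
    return len(dead)


if __name__ == "__main__":
    store = {}
    readme = put_blob(store, b"hello\n")
    old_main = put_blob(store, b"main = return ()\n")
    new_main = put_blob(store, b'main = putStrLn "hi"\n')
    root_a = put_tree(store, {"README": readme, "Main.hs": old_main})
    root_b = put_tree(store, {"README": readme, "Main.hs": new_main})
    # With both roots registered nothing is collected; the shared README
    # blob is stored only once.
    print(collect_garbage(store, [root_a, root_b]), len(store))
    # With a single root, the other branch's unique objects are collected.
    print(collect_garbage(store, [root_a]), len(store))
```

With one root per branch, objects shared between branches are stored once 
and only objects unique to a forgotten branch are collected.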

Here's where the fun happens: It seems to me that a branch switching 
tool, utilizing darcs' existing repository data stores, could be built 
almost "purely" on top of mostly just the hashed-storage library (which 
has been designed for reuse), as it exists today or hopefully with only 
minor tweaking, and with only minimal interaction with darcs itself. 
That is, in-repo branching could be provided entirely, today or soon, as 
a second/third-party tool to darcs. (!)

I think this is great from a darcs perspective: darcs itself remains 
conceptually simple (1 repository == 1 branch), which is something that 
I for one love about darcs, and doesn't need additional commands in 
darcs itself. Yet power users (and git escapees) would have easy 
access to a ``darcs-branch`` tool that provides simple and powerful 
in-repo switching. Potentially, such a tool is also a great candidate to 
be an early adopter of the darcs library support and can help better 
define and enhance darcs' public API. (It's also interesting in that it 
mirrors hg, where support for branches is an addon, and that both hg and 
git have darcs-like patch queues as addons.)

I think this is even better from a hashed-storage perspective: 
``darcs-branch`` would be a strong (new) use case for hashed-storage as 
a public API. The tool would provide good incentive to keep 
hashed-storage's API clean, and better incentive (than darcs' normal 
operation) to keep hashed-storage's garbage collection and object 
compaction strong. (The 'cheap' cost of in-repo branches is primarily 
a consequence of how well hashed-storage stores the additional objects 
of multiple branches. As a bonus, normal darcs operations should benefit 
as well from the gc/compaction optimizations that darcs-branch 
operations may make more obvious.)

At a high-level, a ``darcs-branch`` tool would provide core commands to:

1) Store the current repository state as a new branch by copying the 
current inventory and inserting a new pristine root for the branch. 
(``darcs-branch new`` or ``darcs-branch freeze``, perhaps)

2) Switch to a previously stored branch, by making the branch's 
inventory the new current inventory and the branch's pristine root the 
new current pristine root; updating the working directory as necessary. 
(``darcs-branch switch``)

Additionally, there would be other useful management tools 
(``darcs-branch list``, ``darcs-branch remove`` (or unfreeze)). I think 
that these four commands could be done with no darcs interaction at all 
(unless the branch being switched to has an incomplete/lazy pristine).
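
The four commands above can be sketched against an in-memory model. This 
is a rough illustration under stated assumptions, not a design for the 
real tool: the repository is a plain dict, a branch is just a saved 
(inventory, pristine root) pair, and make_repo, branch_new, 
branch_switch, branch_list and branch_remove are hypothetical names. 
Updating the working directory and handling lazy pristine stores are left 
out entirely.

```python
import copy


def make_repo(inventory, pristine_root):
    """Toy repository: one current state plus a table of named branches."""
    return {
        "current": {"inventory": list(inventory), "pristine_root": pristine_root},
        "branches": {},
    }


def branch_new(repo, name):
    """Save the current inventory and pristine root under a new name."""
    if name in repo["branches"]:
        raise ValueError(f"branch already exists: {name}")
    repo["branches"][name] = copy.deepcopy(repo["current"])


def branch_switch(repo, name):
    """Make a saved branch the current state (raises KeyError if unknown).

    The previous current state is replaced, so save it with branch_new
    first if it should be kept.
    """
    repo["current"] = copy.deepcopy(repo["branches"][name])


def branch_list(repo):
    """Return the saved branch names in sorted order."""
    return sorted(repo["branches"])


def branch_remove(repo, name):
    """Forget a saved branch; its pristine root stops protecting objects."""
    del repo["branches"][name]
```

In this model none of the four operations touches patch contents, which 
matches the claim that they need no interaction with darcs itself.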

Useful commands that would need darcs interaction for patch management 
would be things like ``darcs-branch push`` to push patches between named 
branches (equivalent at a high level to ``darcs send -o new.dpatch 
--context branchB.context; darcs-branch switch branchB; darcs apply 
new.dpatch``), and ``darcs-branch obtain`` to obtain new in-repo local 
branches from an existing context file, remote/external-local 
repository, tag, or other matcher (that is, darcs get from one in-repo 
branch to a new one).
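
As a small illustration of the ``darcs-branch push`` equivalence above, 
the sketch below only builds the command sequence quoted in the text and 
runs nothing. ``darcs-branch`` is the proposed tool and does not exist 
yet; push_commands and render are hypothetical helper names, and the 
``<branch>.context`` file name simply follows the example above.

```python
import shlex


def push_commands(target_branch, bundle="new.dpatch"):
    """Return the push sequence from the text as a list of argv lists.

    Nothing is executed. The context file is assumed to be named
    <target_branch>.context, as in the example in the text.
    """
    context_file = f"{target_branch}.context"
    return [
        ["darcs", "send", "-o", bundle, "--context", context_file],
        ["darcs-branch", "switch", target_branch],
        ["darcs", "apply", bundle],
    ]


def render(commands):
    """Join argv lists into a single shell-style line for display."""
    return "; ".join(shlex.join(argv) for argv in commands)


if __name__ == "__main__":
    print(render(push_commands("branchB")))
```

Keeping the steps as argv lists rather than one shell string avoids 
quoting problems if a branch name ever contains spaces.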

I doubt that a ``darcs-branch get`` to download all of the branches 
other than "current" (or HEAD, if you prefer, or "main" as I prefer) of 
a remote repository would need any darcs interaction (downloading the 
inventories and then many/most/all of the pristine objects). We can bet 
that darcs' usual lazy patch-getting behavior should work out of the box 
even for multiple branches.

Well, that's the general idea, at least. I believe that a willing 
volunteer, with a bit of help from Petr, could build such a tool 
"relatively quickly", and that it might even work with today's darcs 
as it is.

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9