[Haskell-cafe] cheap in-repo local branches (just needs
implementation)
Eric Kow
kowey at darcs.net
Tue Jul 21 17:23:12 EDT 2009
Hi everyone,
Max Battcher had an idea that I thought I should post on the mailing list.
The idea is about making branches in darcs. Right now, we take the view that a
darcs branch is a darcs repository plain and simple. If you want to create a
branch, all you have to do is darcs get (darcs get --lazy to be faster). While
this is very simple, a lot of us think that it's inconvenient (one because it's
slow, and two because you have to think of where to put the branch).
So darcs users have been asking about in-repo branches for a while. And now,
Max has come up with a way to implement them. What's nice about his approach
is that it lets us keep the simplicity of darcs, while giving more demanding
users a chance to work with branches. It also takes advantage of Petr
Ročkai's Summer of Code project to make darcs faster in our daily lives and, for
that matter, paves the way for a possible darcs plugin system in the future.
On Max's advice, I'm cross-posting to Haskell Cafe. Haskellers: here's a nice
chance for you to get a cool Darcs feature without very much effort or Darcs
hacking experience :-)
More info on: http://bugs.darcs.net/issue555
------------------------------------------------------------
Max's write-up
------------------------------------------------------------
Here's a quick primer: Basically, darcs >= 2.0 uses a hashed pristine
store that acts as a file object cache. An interesting artifact of the
pristine.hashed store, which is being pushed into a useful,
third-party-accessible library named hashed-storage, is that it does (for
many reasons, most of them co-evolutionary) resemble the git object store.
There are several differences, but the key one for the topic at hand is
that darcs generally garbage collects pristine.hashed objects much more
eagerly than git does.
Darcs is very quick to garbage collect old objects partly because many
aren't all that useful, but mostly because the primary representation
for a repository state is the patch store (and inventory), so there is
only one root pointer in the pristine store. Petr, the author of the
hashed-storage library, briefly discusses this in his most recent design
post about the future of hashed-storage:
http://mornfall.net/blog/designing_storage_for_darcs.html
Here's where the primer meets the topic at hand: A darcs branch consists
of three major components: an inventory store, a patch store, and a
pristine store. To store multiple branches "in the same place" you need
to take care of: 1) storing the alternate inventories, and 2) if you
want it to be relatively fast, storing additional objects in the
pristine store. (The patch store will already happily hold more patches
than are referenced in the current inventory.) (1) is mostly a matter of
naming alternate inventories and swapping between them. As for (2),
thanks to the *ahem* git-like nature of pristine.hashed/hashed-storage,
darcs could easily archive (many) more pristine objects in
pristine.hashed than it does during normal operation, and it may be as
simple as storing additional, useful "root pointers" visible to
hashed-storage so that it knows not to garbage collect the objects from
other branches.
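
To make the "root pointers" idea concrete, here is a minimal sketch in
Python of a content-addressed store with mark-and-sweep collection from
several roots. It is a toy model under stated assumptions, not
hashed-storage's actual API or on-disk format: the ObjectStore class, its
method names, and the JSON tree encoding are all hypothetical.

```python
import hashlib
import json


class ObjectStore:
    """Toy content-addressed store (hypothetical; not the hashed-storage API)."""

    def __init__(self):
        self.objects = {}  # hash -> (kind, bytes), kind is "blob" or "tree"

    def _put(self, kind, data):
        key = hashlib.sha256(kind.encode() + b":" + data).hexdigest()
        self.objects[key] = (kind, data)
        return key

    def put_blob(self, data):
        return self._put("blob", data)

    def put_tree(self, entries):
        # A tree maps names to the hashes of blobs or other trees.
        return self._put("tree", json.dumps(entries, sort_keys=True).encode())

    def reachable(self, roots):
        """Mark phase: every object reachable from any of the given roots."""
        seen, stack = set(), list(roots)
        while stack:
            key = stack.pop()
            if key in seen or key not in self.objects:
                continue
            seen.add(key)
            kind, data = self.objects[key]
            if kind == "tree":
                stack.extend(json.loads(data).values())
        return seen

    def collect_garbage(self, roots):
        """Sweep phase: delete unreachable objects, return how many were removed."""
        live = self.reachable(roots)
        dead = [key for key in self.objects if key not in live]
        for key in dead:
            del self.objects[key]
        return len(dead)


store = ObjectStore()
readme = store.put_blob(b"hello")
main_src = store.put_blob(b"main code")
feature_src = store.put_blob(b"feature code")
main_root = store.put_tree({"README": readme, "src": main_src})
feature_root = store.put_tree({"README": readme, "src": feature_src})

# With one root per branch registered, nothing is collected.
collected_with_both = store.collect_garbage([main_root, feature_root])
# With a single root, the other branch's private objects are swept away.
collected_with_one = store.collect_garbage([main_root])
```

The shared README object survives both passes because it is reachable
from the remaining root; only the objects private to the unregistered
branch are lost.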
Here's where the fun happens: It seems to me that a branch switching
tool, utilizing darcs' existing repository data stores, could be built
almost "purely" on top of mostly just the hashed-storage library (which
has been designed for reuse), as it exists today or hopefully with only
minor tweaking, and with only minimal interaction with darcs itself.
That is, in-repo branching could be provided entirely, today or soon, as
a second/third-party tool to darcs. (!)
I think this is great from a darcs perspective: darcs itself remains
conceptually simple (1 repository == 1 branch), which is something that
I for one love about darcs, and doesn't need additional commands in
darcs itself. Yet power users (and git escapees) would have easy
access to a ``darcs-branch`` tool that provides simple and powerful
in-repo switching. Potentially, such a tool is also a great candidate to
be an early adopter of the darcs library support and can help better
define and enhance darcs' public API. (It's also interesting in that it
mirrors how hg's support for branches is an addon, and how both hg and
git have darcs-like patch queues as addons.)
I think this is even better from a hashed-storage perspective:
``darcs-branch`` would be a strong (new) use case for hashed-storage as
a public API. The tool would provide good incentive to keep
hashed-storage's API clean, and better incentive (than darcs' normal
operation) to keep hashed-storage's garbage collection and object
compaction strong. (The 'cheap' cost of in-repo branches is primarily
a consequence of how well hashed-storage stores the additional objects
of multiple branches. As a bonus, normal darcs operations should benefit
as well from the gc/compaction optimizations that darcs-branch
operations may make more obvious.)
At a high-level, a ``darcs-branch`` tool would provide core commands to:
1) Store the current repository state as a new branch by copying the
current inventory and inserting a new pristine root for the branch.
(``darcs-branch new`` or ``darcs-branch freeze``, perhaps)
2) Switch to a previously stored branch, by making the branch's
inventory the new current inventory and the branch's pristine root the
new current pristine root; updating the working directory as necessary.
(``darcs-branch switch``)
Additionally, there would be other useful management tools
(``darcs-branch list``, ``darcs-branch remove`` (or unfreeze)). I think
that these four commands could be done with no darcs interaction at all
(unless the branch being switched to has an incomplete/lazy pristine).
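
As a rough illustration of those four commands, here is a minimal
in-memory sketch in Python. Everything in it is an assumption made for
illustration: the Repo class, the branch_* function names, and the idea
of representing an inventory as a list of patch identifiers and a
pristine root as a hash string. It says nothing about darcs' real
on-disk layout or about hashed-storage's API.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository state (hypothetical; not darcs' on-disk layout)."""
    inventory: list      # ordered patch identifiers of the current branch
    pristine_root: str   # hash of the current pristine root
    branches: dict = field(default_factory=dict)  # name -> (inventory, root)


def branch_new(repo, name):
    """Freeze the current state: copy the inventory, record the pristine root."""
    if name in repo.branches:
        raise ValueError(f"branch {name!r} already exists")
    repo.branches[name] = (list(repo.inventory), repo.pristine_root)


def branch_switch(repo, name):
    """Make a stored branch current. Unfrozen current state is simply replaced."""
    inventory, root = repo.branches[name]
    repo.inventory = list(inventory)
    repo.pristine_root = root
    # A real tool would also update the working directory here.


def branch_list(repo):
    return sorted(repo.branches)


def branch_remove(repo, name):
    """Unfreeze: drop the stored inventory and root pointer for this branch."""
    del repo.branches[name]


repo = Repo(inventory=["p1", "p2"], pristine_root="root-a")
branch_new(repo, "main")

# Record another patch, then freeze the result under a second name.
repo.inventory.append("p3")
repo.pristine_root = "root-b"
branch_new(repo, "feature")

branch_switch(repo, "main")
names_before_remove = branch_list(repo)
branch_remove(repo, "feature")
names_after_remove = branch_list(repo)
```

Note that removing a branch in this model only drops its root pointer;
reclaiming its pristine objects would be left to the store's garbage
collector.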
Useful commands that would need darcs interaction for patch management
would be things like ``darcs-branch push`` to push patches between named
branches (equivalent at a high level to ``darcs send -o new.dpatch
--context branchB.context; darcs-branch switch branchB; darcs apply
new.dpatch``), and ``darcs-branch obtain`` to obtain new in-repo local
branches from an existing context file, remote/external-local
repository, tag, or other matcher (that is, darcs get from one in-repo
branch to a new one).
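
At the level of inventories, the effect of such a push can be sketched as
a set difference, as in the Python below. This is a deliberate
simplification with hypothetical function names: it treats patches as
opaque identifiers and ignores commutation, dependencies, and conflicts,
all of which the real ``darcs send``/``darcs apply`` pair handles.

```python
def patches_to_push(source_inventory, target_inventory):
    """Patches the source branch has and the target lacks, in source order."""
    have = set(target_inventory)
    return [patch for patch in source_inventory if patch not in have]


def push(source_inventory, target_inventory):
    """Target inventory after applying the missing patches (toy model)."""
    missing = patches_to_push(source_inventory, target_inventory)
    return list(target_inventory) + missing


branch_a = ["p1", "p2", "p4"]
branch_b = ["p1", "p3"]

bundle = patches_to_push(branch_a, branch_b)
branch_b_after = push(branch_a, branch_b)
```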
I doubt that a ``darcs-branch get`` to download all of the branches
other than "current" (or HEAD, if you prefer, or "main" as I prefer) of
a remote repository would need any darcs interaction (downloading the
inventories and then many/most/all of the pristine objects). We can bet
that darcs' usual lazy patch-getting behavior should work out of the box
even for multiple branches.
Well, that's the general idea, at least. I believe that a willing
volunteer with a bit of help from Petr could build such a tool
"relatively quickly", and that it might even work with today's
darcs as it is.
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9