hackage-server: index format

Duncan Coutts duncan.coutts at googlemail.com
Sun May 29 18:13:49 CEST 2011


On Fri, 2010-11-19 at 11:16 -0600, Antoine Latter wrote:
> On Fri, Nov 19, 2010 at 7:01 AM, Duncan Coutts
> <duncan.coutts at googlemail.com> wrote:
> > On Fri, 2010-11-19 at 12:27 +0000, Duncan Coutts wrote:
> >
> >> Matt and I also discussed making the 00-index.tar.gz into a RESTful
> >> format by adding proper URLs for package tarballs.
> >
> > Indeed we could go further and use a single general format for
> > describing or distributing bundles of packages.

[..]

> > Opinions?
> >

I'd like to restart discussion on this topic. I think it'd be really
useful to have a single format worked out that covers all these cases.
Otherwise we'll end up with multiple special-case formats that are less
flexible overall.

> It feels like an abuse of tar-files to me - if we want to have a set
> of meta-data about the location of resources in a package repository,
> I think it would be better to come up with a file format that has the
> information we want directly and then serve it up.

Putting URLs in tar symlink entries is a bit of an abuse, but using tar
as a container format is perfectly reasonable (people do the same with
zip all the time). We already use tar, it is extensible, and it is a
standard format, so there are tools to help inspect or debug it.

> This hypothetical cabal-repository.description file would be pointed
> at by a user's .cabal/conf, and the config file would describe either
> what resources the repo makes available or how to discover what
> resources it makes available.

You mean the description file (not the ~/.cabal/config file) would
include or link to the resources that the repo makes available.

In that case we're talking about the same thing; the only issue is the
format of this package collection resource and what info it contains.

> So for a small repo, this file could contain a listing of package ids
> and where the tar-ball/package descriptions are.

I think that's also what I suggested (but using the tar format).

> We could even have a special case for local or file-share hosted
> repositories - the presence of an empty repo description file would
> imply that the contents of the repo is every tar, tar.gz or directory
> containing a .cabal file in the top level.

I'd rather not have a special case like that. We can make that use case
convenient with tools that add a package to a collection.

> A larger repository would point to another file which contains a
> collection of packages and their meta-data. One of the resources could
> be "here's where to find a tarball containing the package descriptions
> of every package I know how to serve" to support the current model of
> solving dependencies based. In this scenario the 'repo description'
> files would exactly be a REST description of the contents of Hackage
> Server.

Why the indirection via another file? I don't see why small vs large is
important here. We just point to the package collection / index either
as a local file or a URL.

> It's the same information as what you'd wanted to put in the index
> tarball, and we might even want to make it so that the repo config
> file can live in the tarball and address resources in the tarball it
> is hosted in (so I can deploy a local cabal repo by dropping a tarball
> into a fileshare).

I'm not quite sure I follow. Are you talking about a repo being a
fileshare with multiple files in a dir, or a single tarball with
everything in it?

Using a tarball format would indeed allow either, since the index can
link to package tarballs by reference (relative or absolute URL) or
include them by value.
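
As a rough sketch of the by-reference versus by-value distinction, the
Python snippet below builds an index tarball in memory where each entry
is either stored as an ordinary file (by value) or as a symlink entry
whose link target is a URL (by reference). The helper names
(build_index, read_index), the entry paths and the example.org URL are
illustrative assumptions, not part of any agreed format.

```python
import io
import tarfile


def build_index(entries):
    """Build an index tarball in memory.

    entries is a list of (name, payload) pairs. A bytes payload is stored
    by value as a regular file; a str payload is treated as a URL and
    stored by reference as a symlink entry (hypothetical convention).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in entries:
            info = tarfile.TarInfo(name)
            if isinstance(payload, bytes):
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
            else:
                info.type = tarfile.SYMTYPE
                info.linkname = payload  # relative or absolute URL
                tar.addfile(info)
    return buf.getvalue()


def read_index(data):
    """Map each entry name to ("value", bytes) or ("reference", url)."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        for member in tar.getmembers():
            if member.issym():
                result[member.name] = ("reference", member.linkname)
            elif member.isfile():
                result[member.name] = ("value", tar.extractfile(member).read())
    return result


# Illustrative data only: one package description included by value,
# one package tarball linked by reference.
index = read_index(build_index([
    ("foo/1.0/foo.cabal", b"name: foo\nversion: 1.0\n"),
    ("foo/1.0/foo-1.0.tar.gz", "https://example.org/packages/foo-1.0.tar.gz"),
]))
for name, (kind, payload) in index.items():
    print(name, kind)
```

A client reading such an index would fetch "reference" entries on demand
and use "value" entries directly, so the same reader covers both a
fileshare-style repo and a single self-contained tarball.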

> But slipstreaming metadata into soft-links in a tarball feels weird,
> and since we need client changes to make it work we may as well do it
> right.

If you don't like the symlink idea, just use blah.url files in the
tarball instead. They would contain the url as a single line of text.
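
A minimal sketch of the .url file variant, again in Python and again
under assumptions: the convention that "NAME.url" stands in for "NAME",
the helper names, and the example.org URL are all made up for
illustration.

```python
import io
import tarfile

URL_SUFFIX = ".url"  # hypothetical naming convention


def add_url_entry(tar, name, url):
    """Store `url` as a one-line text file called `name + ".url"`."""
    data = (url + "\n").encode("utf-8")
    info = tarfile.TarInfo(name + URL_SUFFIX)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def read_url_entries(tar):
    """Return {name: url} for every *.url entry in an open tarball."""
    urls = {}
    for member in tar.getmembers():
        if member.isfile() and member.name.endswith(URL_SUFFIX):
            text = tar.extractfile(member).read().decode("utf-8")
            urls[member.name[: -len(URL_SUFFIX)]] = text.strip()
    return urls


# Round trip through an in-memory tarball.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_url_entry(tar, "foo/1.0/foo-1.0.tar.gz",
                  "https://example.org/packages/foo-1.0.tar.gz")
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    urls = read_url_entries(tar)
print(urls)
```

Because the entries are ordinary files, any standard tar tool can list
and extract them without special handling.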

Or instead of a symlink or an ordinary file, a special file entry (the
tar format has some file types reserved for user rather than system
purposes).
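
The special-entry variant could look roughly like the sketch below. It
relies on the POSIX ustar/pax convention that typeflag letters 'A' to
'Z' are reserved for custom implementations; the choice of 'U' here is
arbitrary and purely hypothetical (some letters are already taken by
GNU tar extensions). It also assumes that Python's tarfile module hands
back members of unknown type as if they were regular files, which is my
understanding of its behaviour.

```python
import io
import tarfile

URL_TYPE = b"U"  # hypothetical custom typeflag for "URL reference" entries


def add_url_member(tar, name, url):
    """Add an entry whose typeflag marks it as a URL and whose body is the URL."""
    data = url.encode("utf-8")
    info = tarfile.TarInfo(name)
    info.type = URL_TYPE
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def read_url_members(tar):
    """Return {name: url} for entries carrying the custom typeflag."""
    return {
        member.name: tar.extractfile(member).read().decode("utf-8")
        for member in tar.getmembers()
        if member.type == URL_TYPE
    }


# Round trip through an in-memory tarball.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_url_member(tar, "foo/1.0/foo-1.0.tar.gz",
                   "https://example.org/packages/foo-1.0.tar.gz")
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    special = read_url_members(tar)
print(special)
```

The trade-off against the .url approach is that generic tar tools may
warn about or mishandle an entry type they do not recognise.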

> Does this sort of approach sound sensible? I don't mind fleshing it
> out more as a start.

I'm not sure I really understand the difference: is it a difference in
content/meaning, or just a difference in the format?

Duncan



