hackage-server: index format

Duncan Coutts duncan.coutts at googlemail.com
Fri Nov 19 07:27:35 EST 2010


On Thu, 2010-11-18 at 19:46 -0600, Antoine Latter wrote:
> Hi folks,
> 
> The index tar-ball on Hackage has an odd naming convention. Package
> descriptions are given paths of the form:
> 
> ./$pkg/$version/$pkg.cabal
> 
> including the leading "./".
> I'm guessing that this is done as a method of distinguishing
> non-package meta-data.
> 
> Is this a convention we need to preserve?

The .cabal extension is essential. Tools are required to ignore file
extensions they do not understand. This provides a bit of forwards
compatibility.

In theory the file path should not be significant. However the current
cabal-install code does rely on the name and version directories. It
uses this to find the package id without having to parse the .cabal
file. This is bad and fragile.

But basically you cannot change that layout for the moment.
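
To make the fragility concrete, here is a minimal sketch of a path-based lookup. It only illustrates the idea and is not the actual cabal-install code; the function name package_id_from_path is hypothetical.

```python
import posixpath


def package_id_from_path(entry_path):
    """Recover (name, version) from an index path like ./foo/1.0/foo.cabal.

    Returns None for entries that do not follow that layout.
    Hypothetical helper, for illustration only.
    """
    # Normalise away the leading "./" so both spellings are accepted.
    parts = posixpath.normpath(entry_path).split("/")
    if len(parts) != 3:
        return None
    name, version, filename = parts
    # The file name has to repeat the package name, with a .cabal extension.
    if filename != name + ".cabal":
        return None
    return name, version
```

Any change to the directory layout silently breaks a lookup like this, which is why the layout has to stay for now.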

I would like to move to a model where the file name may be meaningful
but the path is not significant. I would also like to make the proper
way to find the package id be to parse the file. I'd like to change
cabal-install so that it generates its own fast cache on each "cabal
update", rather than reading the index.tar every time. This would mean
we could pay the expense of parsing all the .cabal files and thus could
do it properly.
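
A rough sketch of what such a cache-building pass could look like, in Python rather than Haskell to keep it short. It assumes a deliberately simplified reading of .cabal files (only top-level "name:" and "version:" lines); a real implementation would use the proper .cabal parser. The names read_field and build_cache are hypothetical.

```python
import tarfile


def read_field(cabal_text, field):
    """Return the value of a top-level 'field: value' line, or None.

    Simplified on purpose: this is not a real .cabal parser.
    """
    prefix = field.lower() + ":"
    for line in cabal_text.splitlines():
        # Top-level fields start in column 0; indented lines belong to sections.
        if line[:1].isspace():
            continue
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return None


def build_cache(index_file):
    """Scan every .cabal entry in the index, ignoring where it is stored.

    Returns a sorted list of [name, version] pairs taken from file contents.
    """
    cache = []
    with tarfile.open(fileobj=index_file, mode="r:*") as tar:
        for member in tar:
            # Skip entries whose extension we do not understand.
            if not (member.isfile() and member.name.endswith(".cabal")):
                continue
            text = tar.extractfile(member).read().decode("utf-8", "replace")
            name = read_field(text, "name")
            version = read_field(text, "version")
            if name and version:
                cache.append([name, version])
    return sorted(cache)
```

Because the package id comes from the file contents, the path inside the tarball no longer matters. The cost of reading every file is paid once per update instead of on every command.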

Matt and I also discussed making the 00-index.tar.gz into a RESTful
format by adding proper URLs for package tarballs. Currently clients
have to know the URL structure of the server: given a package Id taken
from the index they construct a URL $root/pkg-ver/pkg-ver.tar.gz. As we
all know, forcing clients to construct URLs is bad (inflexible etc etc).
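
For reference, the client-side construction described above amounts to something like the following sketch. The function name legacy_tarball_url is hypothetical, and the value of $root is whatever the client has been configured with.

```python
def legacy_tarball_url(root, name, version):
    """Build $root/pkg-ver/pkg-ver.tar.gz from a package id.

    The server's URL structure is hard-coded into the client here,
    which is exactly the inflexibility being discussed.
    """
    pkg_id = f"{name}-{version}"
    return f"{root.rstrip('/')}/{pkg_id}/{pkg_id}.tar.gz"
```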

To extend the format to contain URLs we were thinking of making use of
the tar format's support for symlinks. The symlink content can be
interpreted as a URL, either relative or absolute, e.g.:

foo-1.0.tar.gz  ->  /package/foo-1.0/foo-1.0.tar.gz
or
foo-1.0.tar.gz  ->  http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz

That is, the index contains a bunch of cabal files, and also a bunch
of .tar.gz symlinks. Like URLs in HTML, these are interpreted relative to
the URL of the index.tar.gz itself. So if we got the index.tar.gz from
say:

http://hackage.haskell.org/index.tar.gz

then a relative URL like /package/foo-1.0/foo-1.0.tar.gz is interpreted
as

http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz

This is totally standard URL convention; the only odd thing is using
tarball symlinks as URLs, though it seems like a pretty natural
generalisation. It works fine if you unpack the tarball with ordinary tar
programs; it just makes broken symlinks.
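
The resolution rule is the ordinary one for relative URL references, so a client could lean on an existing library for it. A minimal sketch using Python's urllib.parse.urljoin; the wrapper name resolve_tarball_url is hypothetical.

```python
from urllib.parse import urljoin


def resolve_tarball_url(index_url, link_target):
    """Interpret a symlink target as a URL relative to the index's own URL.

    Absolute targets are returned unchanged; relative ones are resolved
    against index_url using the standard rules.
    """
    return urljoin(index_url, link_target)


index_url = "http://hackage.haskell.org/index.tar.gz"
relative = resolve_tarball_url(index_url, "/package/foo-1.0/foo-1.0.tar.gz")
absolute = resolve_tarball_url(
    index_url, "http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz"
)
# Both forms from the examples above end up at the same place.
assert relative == absolute
```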

So note that the name of the tarball entry "foo-1.0.tar.gz" is
significant, beyond the fact of the extension. The name "foo-1.0" is
significant as it is the key in the package Id -> url mapping.
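
Putting the pieces together, a client could build the package Id -> url mapping in one pass over the index. This is only a sketch of the proposal, not an implemented format: it assumes the symlink entries are named pkg-ver.tar.gz as in the examples above, and the function name tarball_urls is hypothetical.

```python
import posixpath
import tarfile
from urllib.parse import urljoin

SUFFIX = ".tar.gz"


def tarball_urls(index_file, index_url):
    """Map package ids such as 'foo-1.0' to resolved tarball URLs.

    Only symlink entries named <package id>.tar.gz are considered;
    everything else in the index is ignored.
    """
    urls = {}
    with tarfile.open(fileobj=index_file, mode="r:*") as tar:
        for member in tar:
            entry = posixpath.basename(member.name)
            if member.issym() and entry.endswith(SUFFIX):
                # The entry name minus the extension is the key.
                package_id = entry[:-len(SUFFIX)]
                # The symlink target is the (relative or absolute) URL.
                urls[package_id] = urljoin(index_url, member.linkname)
    return urls
```

Here index_url is the URL the index itself was fetched from, so relative targets keep working if the whole repository is mirrored elsewhere.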

Duncan


