hackage-server: index format

Jason Dusek jason.dusek at gmail.com
Thu Dec 20 17:54:13 CET 2012


Hello,

I have recently been reading the source code of Cabal. I came across
the index command there and then found this thread. However, when I run
a recent cabal-install, the index command does not seem to be available:

 :; /Library/Haskell/ghc-7.4.2/lib/cabal-install-1.16.0.2/bin/cabal index
  cabal: unrecognised command: index (try --help)

I am curious what the current level of support is for local
repositories in the form of a directory full of sdist tarballs.

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B


2011/5/29 Duncan Coutts <duncan.coutts at googlemail.com>:
> On 29 May 2011 19:46, Antoine Latter <aslatter at gmail.com> wrote:
>> On Sun, May 29, 2011 at 11:13 AM, Duncan Coutts
>> <duncan.coutts at googlemail.com> wrote:
>>> On Fri, 2010-11-19 at 11:16 -0600, Antoine Latter wrote:
>>
>>>
>>> I'm not sure I really understand the difference. Whether there is a
>>> difference in content/meaning or just a difference in the format.
>>>
>>
>> Oh my, what an old thread. I'll try and resurrect my state of mind at the time.
>
> Sorry :-)
>
>> I think my main concern was, as you said, a difference in format, not a
>> difference in substance. I may also have thrown in a good amount of
>> over-engineering.
>>
>> What it comes down to is that embedding relative URLs (or even
>> absolute URLs) in a tar-file feels like an odd thing to do - I don't
>> see what advantage it has over a flat text file, and I can no longer
>> create/consume the tar-file with standard tools.
>>
>> But maybe this doesn't matter - can we re-state what goals we're
>> trying to get to, and what problems we're trying to solve? Going back
>> into this thread I'm not even sure what I was talking about.
>
> Hah! :-)
>
> I'll restate my thoughts.
>
>> Are we trying to come up with a master plan of allowing cabal-install
>> to interact with diverse sources of packages-which-may-be-installed
>> data?
>
> Yes.
>
>> I'm imagining the following use cases:
>>
>> 1. hackage.haskell.org
>> 2. a network share/file system path with a collection of packages
>> 3. an internet url with a collection of packages
>
> Yes.
>
>> 4. an internet url for a single package
>
> That we can do now, because it's a single package rather than a
> collection.
>
> cabal install http://example.com/~me/foo-1.0.tar.gz
>
>> 5. a tarball with a collection of packages
>
> Yes, distributing a whole bunch of packages in a single file.
>
>> 6. a tarball with a single package
>
> We can also do that now:
>
> cabal install ./foo-1.0.tar.gz
>
>> 7. an untarred folder containing a package (as in 'cabal install' in
>> my dev directory)
>
> Yes.
>
>> With the ability to specify some of these in the .cabal/config or at
>> the command line as appropriate. There's going to be some overlap
>> between these cases, almost certainly.
>
> Yes. The policy is up for grabs; the important point here is mechanism
> and format.
>
>> Am I missing any important cases? Are any of these cases unimportant?
>
> Another important use case: the "cabal-dev" use case, a local unpacked
> package together with a bunch of other local source packages, either
> local dirs or local/remote tarballs. This is basically when you want a
> special local package environment for this specific package.
>
> A closely related and overlapping use case is having a project that
> consists of multiple packages, e.g. gtk2hs consists of
> gtk2hs-buildtools, glib, cairo, pango and gtk. Devs hacking on this
> want to build them all in one batch. Technically you can do this now,
> but it's not convenient. I'd have to say:
>
> gtk2hs$ cabal install gtk2hs-buildtools/ glib/ cairo/ pango/ gtk/
>
> What we want there is a simple index that contains them all and that
> cabal-install then uses by default when we build/install in this
> directory. Or something like that.
>
>> The next question would be how much effort do we require of the
>> provider of a specific case? So for numbers 4 & 5, is the output of
>> 'cabal sdist' good enough? For numbers 2 & 3, will I be able to just
>> place package tgz files into a particular folder structure, or will I
>> need to produce an index file?
>
> For the single-package cases, yes, we don't need an index and we can
> already handle them.
>
> My thought about the UI is that we always have an index, so no pure
> directory collections. I'd add a "cabal index" command with
> subcommands for adding, removing and listing the collection. There
> would be some options when you add to choose the kind of entry.
>
>> What are other folks doing? I don't know much about ruby gems.
>> Microsoft's new 'NuGet' package system supports tossing packages in a
>> directory and then telling Visual Studio to look there (they also
>> support pointing the tools at an ATOM feed, which was interesting).
>
> Ah, that's interesting. I've also been thinking about incremental
> updates of the hackage index. I think we can do this with a tar-based
> format.
>
> We're not precluding cabal-install supporting a pure directory style,
> but having a specific collection resource is necessary in most use
> cases, particularly the http remote cases. If we get the UI right then
> we probably don't need the pure directory style since it'd just be a
> matter of "cp" vs "cabal index add".
>
> Ok, you've mostly covered it, but to try and present it all in one go,
> here's what I think we need:
>
> We need a way to describe collections of Cabal packages. These
> collections should either link to packages or include them by value.
> Optionally, for improved performance, the .cabal file for each package
> can be included. The format should be usable in a REST context, that
> is, it should support locating packages via a URL.
>
> For each package in the index we need:
>  * A link to the package (either tarball or local directory)
>    OR: the package tarball by value (rather than a link)
>  * optionally a .cabal file for the package
>
> We need a format with forwards compatibility so that in future we can
> allow other optional attributes/metadata for each package, e.g. digital
> signatures, or other collection-global information.
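>
> To make those requirements concrete, here is a rough Haskell sketch of
> what each entry would carry (the type and field names are purely
> illustrative, not part of any existing Cabal API):
>
> import Data.ByteString.Lazy (ByteString)
>
> -- Illustrative sketch only: one entry in the collection.
> data IndexEntry = IndexEntry
>   { entryPackageId :: String              -- e.g. "foo-1.0"
>   , entrySource    :: PackageSource       -- link or inline tarball
>   , entryCabalFile :: Maybe ByteString    -- optional cached .cabal file
>   }
>
> -- Packages are either linked to or included by value.
> data PackageSource
>   = LinkedTarball String        -- URL, absolute or relative to the collection
>   | LinkedDir     FilePath      -- local unpacked package directory
>   | InlineTarball ByteString    -- package tarball included by value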
>
> Using proper URLs (absolute or relative to the location of the
> collection itself) gives a good deal of flexibility. The current
> hackage archive format has implicit links which means the layout of
> the archive is fixed and it requires that all the packages are
> provided directly on the same http server. Using URLs allows a
> flexible archive layout and allows "shallow" or "mirror" archives that
> redirect to other servers for all or some packages.
>
> In addition to hackage/archive-style use cases, the other major use
> case is on local machines to create special source package
> environments. This is just a mapping of source package id to its
> implementation as a source package. This is useful for multi-package
> projects, or building some package with special local versions of
> dependencies. The key distinguishing feature of these package
> environments is that they are local to some project directory rather
> than registered globally in the ~/.cabal/config.
>
> The motivation for including package tarballs by value is that it
> allows distributing multi-package systems/projects as a single file,
> or as a convenient way of making snapshots of packages without having
> to stash them specially in some local directory.
>
> My suggestion to get this kind of flexible format is to reuse and
> abuse the tar format. A tar file is a collection of files. We can
> encode the different kinds of entries using file extensions.
>
> To encode URL links, my idea was to abuse tar's symlink support and
> say that symlinks are really just URLs. Relative links are already
> URLs; the abuse is to suggest allowing absolute URLs too, like
> http://example.com/~me/foo-1.0.tar.gz. The advantage of this approach
> is that each kind of entry (tarball, .cabal file, etc.) can be either
> included by value as a file or included as a link. If we had to
> encode links as .url files then we would lose that ability.
>
> Instead of using symlinks, it is also possible to add new tar entry
> types. Standard tools will either ignore custom types on unpacking or
> treat them as ordinary files, but they will obviously not create
> custom tar entries, whereas they will happily add symlinks.
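>
> On the consumer side an ordinary tar-reading library already hands us
> both kinds of entry. A rough sketch, assuming the tar package's
> Codec.Archive.Tar / Codec.Archive.Tar.Entry API (the IndexRef type and
> the function names are just illustrative):
>
> import qualified Codec.Archive.Tar       as Tar
> import qualified Codec.Archive.Tar.Entry as Tar
> import qualified Data.ByteString.Lazy    as BL
>
> -- Sketch only: an entry is either a file included by value or a
> -- symlink that we reinterpret as a (possibly relative) URL.
> data IndexRef
>   = InlineFile FilePath BL.ByteString
>   | LinkedUrl  FilePath String
>   deriving Show
>
> readIndex :: FilePath -> IO [IndexRef]
> readIndex file = do
>   contents <- BL.readFile file
>   return $ Tar.foldEntries classify [] (error . show) (Tar.read contents)
>   where
>     classify entry refs = case Tar.entryContent entry of
>       Tar.NormalFile bytes _size -> InlineFile (Tar.entryPath entry) bytes : refs
>       Tar.SymbolicLink target    -> LinkedUrl  (Tar.entryPath entry)
>                                                (Tar.fromLinkTarget target) : refs
>       _                          -> refs  -- ignore directories and unknown types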
>
> Here is an example convention for the names and meanings of tar entries:
>
>  1. foo-1.0.tar.gz
>  2. foo-1.0.cabal
>  3. foo-1.0
>
> 1 & 2 can each be a file entry or a symlink/URL, while 3 can
> only be a symlink. For example:
>
>  * foo-1.0.tar.gz -> packages/foo/1.0/foo-1.0.tar.gz
>  * foo-1.0.tar.gz -> http://code.haskell.org/~me/foo-1.0.tar.gz
>  * foo-1.0 -> foo-1.0/
>  * foo-1.0 -> ../deps/foo-1.0/
>
> The links are interpreted as ordinary URLs, possibly relative to the
> location of the collection itself. For example, if we got this
> index.tar.gz from http://hackage.haskell.org/index.tar.gz then the
> link packages/foo-1.0.tar.gz gives us
> http://hackage.haskell.org/packages/foo-1.0.tar.gz
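>
> A minimal sketch of that resolution step, assuming Network.URI (recent
> versions have relativeTo :: URI -> URI -> URI rather than returning a
> Maybe):
>
> import Network.URI (parseURI, parseURIReference, relativeTo, uriToString)
>
> -- Sketch only: resolve a link found in the index against the URL the
> -- index itself was fetched from.
> resolveLink :: String -> String -> Maybe String
> resolveLink indexUrl link = do
>   base <- parseURI indexUrl       -- e.g. http://hackage.haskell.org/index.tar.gz
>   ref  <- parseURIReference link  -- e.g. packages/foo-1.0.tar.gz
>   return (uriToString id (ref `relativeTo` base) "")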
>
> Links to directories are only valid for local cases because we do not
> support remote unpacked packages (because there's no reasonable way to
> enumerate the contents).
>
> For these relative URLs one can use standard tar tools to construct
> the index. For absolute URLs this is in fact still possible, by making
> broken symlinks that point to non-existent files, like:
>
> $ ln -s http://code.haskell.org/~me/foo-1.0.tar.gz foo-1.0.tar.gz
>
> and the tar tool will happily include such broken symlinks into the tar file.
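>
> For illustration, the packing step is then just the usual one (tar
> stores symlinks without dereferencing them unless asked to):
>
> $ tar -czf index.tar.gz foo-1.0.tar.gz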
>
> We could instead use a custom tar entry type for URLs but we would
> lose this ability.
>
> For a user interface I was thinking of something along the lines of:
>
> cabal index init [indexfile]
> cabal index add [indexfile] [--copy] [--link] [targets]
> cabal index list [indexfile]
> cabal index remove [indexfile] [pkgname]
>
> The --copy and --link flags for index add are to distinguish between
> adding a snapshot copy of a tarball to the index and linking to the
> local tarball, which may be updated later. We may also want to
> distinguish between a volatile local tarball and a stable one; in the
> latter case we can include a cached copy of the .cabal file. I'm not
> sure if there's a sensible default for --copy vs --link or whether we
> should force people to choose explicitly, e.g. "cabal index add-copy"
> vs "cabal index add-link".
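>
> For illustration, a session for the gtk2hs-style case above might then
> look like this (entirely hypothetical, just instantiating those
> commands; the index file name is arbitrary):
>
> gtk2hs$ cabal index init sources.tar
> gtk2hs$ cabal index add sources.tar --link gtk2hs-buildtools/ glib/ cairo/ pango/ gtk/
> gtk2hs$ cabal index list sources.tar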
>
> Duncan
>
> _______________________________________________
> cabal-devel mailing list
> cabal-devel at haskell.org
> http://www.haskell.org/mailman/listinfo/cabal-devel


