Request for feedback on spec/proposal for distributing package collections via hackage

Tue Jul 14 12:52:46 UTC 2015

Hi folks,

I'd like to get feedback on a spec/proposal for distributing package
collections via hackage. This is currently somewhere beyond vapourware
but certainly not a fait accompli, and hopefully it is at an appropriate
point for feedback.

The basic idea is that:
      * package collections are useful (IMHO, one of the top two
        solutions to dependency hell, alongside nix-style package
        management); and
      * just as we distribute packages via hackage, we should also be
        able to easily distribute package collections.

One would then use them with tools like cabal and stack. Distributing
via hackage (both in the sense of the format/protocol and in the sense
of the central community hackage instance) seems natural, and lets us
take advantage of much of the infrastructure we already have for
packages, such as:
      * existing user accounts and management infrastructure on the
        hackage website
      * allowing anyone to host collections on their own servers, just
        as they can host their own package archives currently (either as
        static file sets or with smart servers)
      * a low barrier to distribution, which may encourage more
        collections to be created, covering more use cases
      * security infrastructure (currently in alpha)
      * automatic mirroring (currently in alpha)

Two obvious examples are stackage-lts and stackage-nightly, but if we
lower the barrier to distribution then there may well be many more. For
example, the existing Linux distros put a lot of effort into selecting
and maintaining package collections, and some of these collections could
be distributed via hackage. In fact, several Linux distributions already
use Hackage's "distro" feature to advertise which versions of packages
are provided by that distro. One can also imagine special-purpose
collections, and there are probably cases we've not thought of yet.

Package collections are distinct from packages; they are not like the
"meta packages" that one gets in some package systems. At its simplest,
a package collection is just a set of source package identifiers (ie
name-version pairs). Like packages, package collections have names and
versions and are immutable once distributed.

The intention is that users can configure their tool to use one or more
collections, either by nailing down a specific collection version or, by
not specifying a version, defaulting to the latest version of the named
collection. (But the specific behaviour is up to the tool.)
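As a rough illustration of "latest unless pinned", here is a minimal Python sketch of how a client might choose a collection version. The function names and the policy itself are hypothetical; as said above, the real behaviour is up to each tool. Versions are compared numerically, component by component, so that 2.10 counts as newer than 2.9.

```python
from typing import Optional

def parse_version(text: str) -> tuple[int, ...]:
    """Split a Cabal-style version such as "2.9" into (2, 9) for comparison."""
    return tuple(int(part) for part in text.split("."))

def select_collection_version(available: list[str],
                              pinned: Optional[str] = None) -> str:
    """Choose which version of a named collection to use.

    Hypothetical client policy: honour an explicit pin if one is configured,
    otherwise default to the newest version the repository offers.
    """
    if pinned is not None:
        if pinned not in available:
            raise LookupError(f"collection version {pinned} is not available")
        return pinned
    if not available:
        raise LookupError("no versions of this collection are available")
    # Compare numerically, so that 2.10 is newer than 2.9.
    return max(available, key=parse_version)

print(select_collection_version(["2.8", "2.9", "2.10"]))         # 2.10
print(select_collection_version(["2.8", "2.9", "2.10"], "2.9"))  # 2.9
```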

Use cases:

      * versioned collections. For some collections, the policy that
        defines them naturally lends itself to meaningful versions.
      * daily collections. These can have a date-form version number
        imposed on them.
      * "live" "rolling" collections. These could have a simple
        monotonically increasing version with no particular meaning
        attached. For such collections, clients might be configured to
        use the latest (by not specifying a version), but it's always
        possible to pick a specific revision.
      * special-purpose collections. Not necessarily collections aiming
        to cover a large number of common packages, but aiming to cover
        some application area, or related stack of packages (e.g. some
        of the web frameworks).
      * negative collections. Collections of packages you may
        specifically want to avoid (e.g. deprecated by their authors, or
        known-broken). Using such collections would rely on clients that
        can be configured to treat them negatively.

Specifics:

A package collection specifies a set of source package ids (id being
name-version pair). It also optionally specifies a (partial) flag
assignment for any package name.
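For concreteness, a collection could be modelled along these lines. This is only a sketch under assumed names (`PackageId`, `Collection`, `flag`), not a proposed API. The frozen dataclasses mirror the fact that a distributed collection version is immutable, and because the flag assignment is partial, a lookup can legitimately come back with no answer.

```python
from dataclasses import dataclass, field
from typing import Optional

Version = tuple[int, ...]

@dataclass(frozen=True)
class PackageId:
    name: str
    version: Version

@dataclass(frozen=True)
class Collection:
    """Hypothetical in-memory form of one immutable collection version."""
    name: str
    version: Version
    packages: frozenset[PackageId] = frozenset()
    # Partial flag assignment: package name -> {flag name: on/off}.
    # Packages and flags that are not mentioned are simply left unassigned.
    flags: dict[str, dict[str, bool]] = field(default_factory=dict)

    def contains(self, pkg: PackageId) -> bool:
        return pkg in self.packages

    def flag(self, package: str, flag: str) -> Optional[bool]:
        """Return the assigned flag value, or None if the collection is silent."""
        return self.flags.get(package, {}).get(flag)

lts = Collection(
    name="stackage-lts",
    version=(2, 9),
    packages=frozenset({PackageId("foo", (1, 0)), PackageId("bar", (3, 1))}),
    flags={"bar": {"this": True}},
)
print(lts.flag("bar", "this"), lts.flag("bar", "that"))  # True None
```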

A collection does not specify how tools should treat it. That is, a
collection does not say whether it should be treated as a strong or a
soft constraint, inclusive or exclusive, positive or negative. Such
things are entirely up to the client's policy and configuration.
Similarly, collections do not specify whether tools should interpret
their flag assignments as strong or soft constraints.
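To show how one and the same collection can be read in different ways, here is a hypothetical client-side sketch with three modes. The `Mode` names, and the rule that a hard constraint leaves packages the collection never mentions unconstrained, are assumptions made for illustration; none of this would be part of the collection format itself.

```python
from enum import Enum

Version = tuple[int, ...]

class Mode(Enum):
    HARD = "hard"          # names the collection mentions are limited to its versions
    SOFT = "soft"          # collection versions are tried first; others remain allowed
    NEGATIVE = "negative"  # versions in the collection are excluded

def candidate_order(name: str, available: list[Version],
                    collection: set[tuple[str, Version]], mode: Mode) -> list[Version]:
    """Order the available versions of `name` for a solver to try, newest first."""
    mentioned = any(pkg == name for pkg, _ in collection)
    listed = sorted((v for v in available if (name, v) in collection), reverse=True)
    unlisted = sorted((v for v in available if (name, v) not in collection), reverse=True)
    if mode is Mode.HARD:
        # Packages the collection says nothing about stay unconstrained.
        return listed if mentioned else unlisted
    if mode is Mode.SOFT:
        return listed + unlisted
    return unlisted  # Mode.NEGATIVE

coll = {("foo", (1, 0)), ("foo", (1, 1))}
versions = [(1, 0), (1, 1), (1, 2)]
for mode in Mode:
    print(mode.value, candidate_order("foo", versions, coll, mode))
```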

Syntax:

Package collection names and versions follow exactly the same syntax as
package names and versions (but they live in a different namespace). For
example, "stackage-lts-2.9", or "deprecated-343" (the latter being a
"rolling" collection with a meaningless, monotonically increasing
version).
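Since the name and version share one hyphen-separated string, a client has to split a collection id so that the version is the trailing component made only of digits and dots. A minimal sketch, assuming an ASCII approximation of Cabal's package-name rules (components are alphanumeric and each contains at least one letter); `parse_collection_id` is a hypothetical helper:

```python
import re

# ASCII approximation of Cabal package-name syntax: hyphen-separated
# alphanumeric components, each containing at least one letter.
NAME = r"[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*)*"
VERSION = r"[0-9]+(?:\.[0-9]+)*"
COLLECTION_ID = re.compile(rf"({NAME})-({VERSION})")

def parse_collection_id(ident: str) -> tuple[str, tuple[int, ...]]:
    """Split e.g. "stackage-lts-2.9" into ("stackage-lts", (2, 9))."""
    m = COLLECTION_ID.fullmatch(ident)
    if m is None:
        raise ValueError(f"not a valid collection id: {ident!r}")
    return m.group(1), tuple(int(part) for part in m.group(2).split("."))

print(parse_collection_id("stackage-lts-2.9"))  # ('stackage-lts', (2, 9))
print(parse_collection_id("deprecated-343"))    # ('deprecated', (343,))
```

Because a name component must contain a letter, "2.9" can never be mistaken for part of the name.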

A collection distributed in the archive format is just a text file with
one entry per line, such as:

        foo-1.0
        foo-1.1
        bar >= 3 && < 4
        bar +this -that

So each line can be one of:
      * a simple package id
      * a package version range, using Cabal version range syntax
      * a package name with a flag assignment, + for on, - for off
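A minimal parser for these three line forms might look like the following. It is a sketch rather than a spec: the helper names and the `(kind, name, payload)` result shape are hypothetical, and it only accepts version ranges built from simple bounds joined with `&&`, not Cabal's full range grammar.

```python
import re

# ASCII approximation of Cabal package names: hyphen-separated alphanumeric
# components, each containing at least one letter.
NAME = r"[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*)*"
VERSION = r"[0-9]+(?:\.[0-9]+)*"

PACKAGE_ID = re.compile(rf"({NAME})-({VERSION})")
FLAG_LINE = re.compile(rf"({NAME})((?:\s+[+-][A-Za-z0-9_-]+)+)")
RANGE_LINE = re.compile(rf"({NAME})\s+(.+)")
BOUND = re.compile(rf"\s*(>=|<=|==|>|<)\s*({VERSION})\s*")

def parse_version(text: str) -> tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))

def parse_line(line: str):
    """Parse one collection line into a (kind, package name, payload) tuple."""
    line = line.strip()
    if m := PACKAGE_ID.fullmatch(line):
        return ("id", m.group(1), parse_version(m.group(2)))
    if m := FLAG_LINE.fullmatch(line):
        # "+flag" means on, "-flag" means off.
        flags = {tok[1:]: tok[0] == "+" for tok in m.group(2).split()}
        return ("flags", m.group(1), flags)
    if m := RANGE_LINE.fullmatch(line):
        # Only conjunctions of simple bounds; "||", wildcards such as
        # "== 1.*", and parentheses are not handled here.
        bounds = []
        for clause in m.group(2).split("&&"):
            b = BOUND.fullmatch(clause)
            if b is None:
                raise ValueError(f"unsupported version range clause: {clause!r}")
            bounds.append((b.group(1), parse_version(b.group(2))))
        return ("range", m.group(1), bounds)
    raise ValueError(f"unrecognised collection line: {line!r}")

for text in ["foo-1.0", "foo-1.1", "bar >= 3 && < 4", "bar +this -that"]:
    print(parse_line(text))
```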

The interpretation of the above is that:
      * both foo-1.0 and foo-1.1 are in the collection (ie union not
        intersection)
      * all versions of bar that are at least 3 and below 4 are in the
        collection
      * the package bar has flag 'this' as True, and flag 'that' as
        False
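Working from lines already parsed into `(kind, name, payload)` entries (a hypothetical representation), the union reading can be sketched as below. Note that a flag line says nothing about membership; it only contributes to the flag assignment. Python tuple comparison matches Cabal's component-wise version ordering here.

```python
import operator

Version = tuple[int, ...]

# Comparison operators allowed in this simplified range syntax.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}

# The example file above, already parsed into (kind, name, payload) entries.
ENTRIES = [
    ("id", "foo", (1, 0)),
    ("id", "foo", (1, 1)),
    ("range", "bar", [(">=", (3,)), ("<", (4,))]),
    ("flags", "bar", {"this": True, "that": False}),
]

def in_collection(entries, name: str, version: Version) -> bool:
    """True if any entry admits name-version: entries combine by union."""
    for kind, entry_name, payload in entries:
        if entry_name != name:
            continue
        if kind == "id" and payload == version:
            return True
        if kind == "range" and all(OPS[op](version, bound) for op, bound in payload):
            return True
    return False

def flag_assignment(entries, name: str) -> dict[str, bool]:
    """Merge every flag line for `name` into one partial assignment."""
    flags: dict[str, bool] = {}
    for kind, entry_name, payload in entries:
        if kind == "flags" and entry_name == name:
            flags.update(payload)
    return flags

print(in_collection(ENTRIES, "foo", (1, 1)))  # True
print(in_collection(ENTRIES, "bar", (4,)))    # False
```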

Of course for some collections the policy is that only one version of
any package is included, but this is a policy question and the format
itself does not impose this constraint.
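A collection maintainer, or a server enforcing such a policy, could check it with something like this sketch. It only looks at exact package ids; deciding whether a range line violates a one-version policy would depend on which versions actually exist, which this hypothetical helper does not consider.

```python
from collections import defaultdict

Version = tuple[int, ...]

def multi_version_packages(package_ids: list[tuple[str, Version]]) -> dict[str, list[Version]]:
    """Report package names that appear with more than one exact version."""
    versions: dict[str, set[Version]] = defaultdict(set)
    for name, version in package_ids:
        versions[name].add(version)
    return {name: sorted(vs) for name, vs in versions.items() if len(vs) > 1}

ids = [("foo", (1, 0)), ("foo", (1, 1)), ("baz", (2, 0))]
print(multi_version_packages(ids))  # {'foo': [(1, 0), (1, 1)]}
```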

Hackage archive format:

Collection files live under a different prefix from package tarballs
(but are still considered part of the archive) and are named after the
collection id. The collection files are not compressed (but of course
HTTP clients and servers can negotiate transport compression). The
collection files are neither included in nor listed in the existing
00-index.tar.gz; instead, there is separate JSON-format metadata that
clients can use to enumerate the available collections and versions.
And, as with package tarballs, a client that wants a specific collection
version can construct the URL and fetch it directly.
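The exact prefix and the shape of the JSON metadata are not pinned down here, so the sketch below simply assumes a `collections/` prefix and a `{"collections": {name: [versions]}}` index; both are placeholders, as is the example.org repo URL. It shows the two access paths: building the URL for a known collection id directly, and consulting the index first to find the newest version.

```python
import json
from urllib.parse import urljoin

# Placeholder prefix: collections live under their own prefix, but its
# name is an assumption here.
COLLECTIONS_PREFIX = "collections/"

def version_key(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))

def collection_url(repo_base: str, name: str, version: str) -> str:
    """URL of one collection file, named after its collection id."""
    base = repo_base if repo_base.endswith("/") else repo_base + "/"
    return urljoin(base, f"{COLLECTIONS_PREFIX}{name}-{version}")

def latest_collection_url(repo_base: str, index_json: str, name: str) -> str:
    """Pick the newest version listed in a (hypothetical) JSON index."""
    index = json.loads(index_json)
    latest = max(index["collections"][name], key=version_key)
    return collection_url(repo_base, name, latest)

index = '{"collections": {"stackage-lts": ["2.8", "2.9", "2.10"]}}'
print(collection_url("https://example.org/repo", "stackage-lts", "2.9"))
print(latest_collection_url("https://example.org/repo", index, "stackage-lts"))
```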

Security:

The hackage security system that's currently in alpha testing can easily
be extended to cover collections, similarly to how it covers package
tarballs.
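Assuming collection files would be described in the signed metadata the same way package tarballs are, by length and SHA-256 hash, client-side verification reduces to a comparison like the one below. The function names and the dictionary layout are illustrative only, not hackage-security's actual types. Since collection files are stored uncompressed, the client would hash the body after undoing any HTTP transport compression.

```python
import hashlib

def file_info(data: bytes) -> dict:
    """Length and SHA-256 hash of a file, as a signed metadata entry might record them."""
    return {"length": len(data), "hashes": {"sha256": hashlib.sha256(data).hexdigest()}}

def verify_download(data: bytes, expected: dict) -> bool:
    """Accept a downloaded collection only if it matches the signed entry exactly."""
    return file_info(data) == expected

collection_text = b"foo-1.0\nfoo-1.1\nbar >= 3 && < 4\nbar +this -that\n"
signed_entry = file_info(collection_text)  # in practice, read from signed metadata
print(verify_download(collection_text, signed_entry))                  # True
print(verify_download(collection_text + b"evil-6.6\n", signed_entry))  # False
```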

Misc notes:

There is no requirement that a hackage-format repo containing
collections be closed. That is, the collections may refer to packages
not in that archive. This could be useful for private hackage repos that
host a small number of private packages, but also host collections that
refer both to the private packages and public ones from the community
central hackage. The resolution of package names is done by the clients,
and some clients may be configured to union/overlay multiple repos.
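When several repos are overlaid, a client needs some rule for where each referenced package comes from. One plausible rule, sketched here with hypothetical names and made-up repo contents, is to try the repos in a configured priority order and take the first one that has the package:

```python
from typing import Optional

Version = tuple[int, ...]
PackageId = tuple[str, Version]

def resolve_overlay(collection: list[PackageId],
                    repos: list[tuple[str, set[PackageId]]]) -> dict[PackageId, Optional[str]]:
    """Map each package id to the first repo, in priority order, that provides it.

    Ids that no configured repo provides map to None.
    """
    return {
        pid: next((repo for repo, provided in repos if pid in provided), None)
        for pid in collection
    }

private = ("private", {("acme-internal", (0, 1))})
central = ("hackage", {("text", (1, 2, 1))})
wanted = [("acme-internal", (0, 1)), ("text", (1, 2, 1)), ("lens", (4, 11))]
print(resolve_overlay(wanted, [private, central]))
```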

On the other hand, for the central community hackage it may be sensible
to enforce a policy that the collections it distributes be closed (ie
refer only to packages distributed via hackage).
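Enforcing that policy at upload time could be as simple as the following sketch, which reports any package ids a collection refers to that the archive does not host. It is hypothetical and only handles exact ids; a range line would need a separate rule (for example, that at least one hosted version falls within the range).

```python
Version = tuple[int, ...]
PackageId = tuple[str, Version]

def open_references(collection: set[PackageId], archive: set[PackageId]) -> list[PackageId]:
    """Package ids the collection names that the archive itself does not host."""
    return sorted(collection - archive)

def is_closed(collection: set[PackageId], archive: set[PackageId]) -> bool:
    return not open_references(collection, archive)

archive = {("foo", (1, 0)), ("foo", (1, 1))}
print(is_closed({("foo", (1, 0))}, archive))                        # True
print(open_references({("foo", (1, 0)), ("bar", (3, 1))}, archive))  # [('bar', (3, 1))]
```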

Questions:

Is this sufficiently flexible to fully cover the obvious use cases? Are
there any interesting use cases that are excluded?

Anything else?

Duncan