Mirroring package uploader/upload date

Duncan Coutts duncan.coutts at googlemail.com
Tue Oct 18 21:17:47 CEST 2011

On Mon, 2011-10-17 at 10:09 +0100, Max Bolingbroke wrote:
> Fellow members of the shadowy Cabal,
> I've been looking at making the mirror client supply the original
> uploading user/upload time when it mirrors a package.
> I've developed an approach (attached) that, rather than PUTting a
> simple tarball to mirror, PUTs a combination of the tarball, user name
> and upload date in multipart/form-data format. This works (though the
> code is a bit grungy still). The only thing I'm worrying about before
> I tidy it up and commit is whether it is sufficiently RESTful: it
> feels a bit weird that the thing that you PUT is not what you get back
> from making a GET request to that URL.

Yeah, I had the same reaction when considering this previously.

> An alternative approach would be to expose resources for the
> uploader/upload time of a package, which the mirror client could then
> simply PUT to.

Yes. I think that is the right thing to do.

In principle it's hardly any more expensive (given pipelined http
requests) and it's a nicer design.
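
As a rough sketch of what that could look like from the mirror client's
side: one PUT per resource instead of a single multipart body. The URL
layout, the `uploader` and `upload-time` resource names and the plain-text
bodies are all assumptions for illustration, not what the server actually
exposes. The requests are only built here, not sent.

```python
from urllib.request import Request


def mirror_requests(base_url, pkgid, tarball, uploader, upload_time):
    """Build (but do not send) one PUT request per mirrored resource.

    `uploader` may be None, in which case no uploader resource is PUT.
    The resource paths below are hypothetical.
    """
    pkg_url = f"{base_url}/package/{pkgid}"
    text = {"Content-Type": "text/plain; charset=utf-8"}

    # The tarball itself: what you PUT is what a later GET returns.
    reqs = [Request(f"{pkg_url}/{pkgid}.tar.gz", data=tarball,
                    headers={"Content-Type": "application/x-gzip"},
                    method="PUT")]

    # Uploader and upload time are separate resources with their own URLs.
    if uploader is not None:
        reqs.append(Request(f"{pkg_url}/uploader",
                            data=uploader.encode("utf-8"),
                            headers=text, method="PUT"))
    reqs.append(Request(f"{pkg_url}/upload-time",
                        data=upload_time.encode("utf-8"),
                        headers=text, method="PUT"))
    return reqs
```

Each request could then be sent over one pipelined connection, so the extra
round trips should cost very little.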

There are some other issues to consider. How does the mirror client pick
the user account and make sure it exists? This turns into the more
general problem of mapping user accounts between domains.

Sadly I don't think there is one policy that fits all circumstances. It
depends on what the user knows about the relationship between the
servers they're mirroring between. If no policy is given, probably a
reasonable default is to not set any uploader account and to just set
the upload time (probably that means we should have the uploader account
be Nothing rather than set to the mirroring client).
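
To make that default concrete, here is a minimal sketch of an upload record
whose uploader is optional. The `UploadInfo` type and `null_policy` function
are made-up names for illustration only; `None` plays the role of Nothing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class UploadInfo:
    """Hypothetical record of what the mirror sets for a package."""
    upload_time: datetime
    uploader: Optional[str] = None  # None means "no uploader recorded"


def null_policy(original_time, original_uploader):
    """Keep the original upload time and deliberately drop the uploader.

    The mirroring client's own account is not substituted in its place.
    """
    return UploadInfo(upload_time=original_time, uploader=None)
```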

The new server identifies user accounts by id. User names are permitted
to change, but the userid remains the same (like unix accounts). It
exposes both the user id and name in the package index. I'm not sure if
we currently expose the set of users and names in some other useful way
(i.e. a single resource providing a name <-> uid mapping).

The old server identifies users only by name. For mirroring from the old
server I think the sensible policy is:
      * take the username from the old server
      * look it up in the user db on the new server
      * if it exists, assume these are corresponding accounts
      * if it does not exist, create a new disabled user account with
        that name
      * set the chosen account as the package uploader
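
The steps above could be sketched roughly as follows. The in-memory
`UserDb` and its methods are stand-ins invented for this example; the real
user db on the new server will look different.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Account:
    uid: int
    name: str
    enabled: bool


@dataclass
class UserDb:
    """Toy stand-in for the new server's user db (hypothetical)."""
    by_name: Dict[str, Account] = field(default_factory=dict)
    next_uid: int = 0

    def lookup(self, name: str) -> Optional[Account]:
        return self.by_name.get(name)

    def create(self, name: str, enabled: bool) -> Account:
        acct = Account(self.next_uid, name, enabled)
        self.next_uid += 1
        self.by_name[name] = acct
        return acct


def uploader_from_old_server(db: UserDb, old_username: str) -> Account:
    """Pick the account to record as uploader for a mirrored package."""
    acct = db.lookup(old_username)
    if acct is None:
        # No such name on the new server: create a disabled placeholder.
        acct = db.create(old_username, enabled=False)
    # Otherwise assume the same name means the corresponding account.
    return acct
```

Running it twice for the same name gives back the same account, so
re-mirroring should not create duplicates.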

Another policy that would work for mirroring between new server
implementations is to assume the user ids match, or to make use of a
supplied uid mapping table.
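
A minimal sketch of that policy, with `map_uid` as a made-up name: with no
table the uid is taken over unchanged, and with a table a missing entry
yields None, meaning there is no corresponding account.

```python
from typing import Mapping, Optional


def map_uid(source_uid: int,
            uid_table: Optional[Mapping[int, int]] = None) -> Optional[int]:
    """Map a uid on the source server to a uid on the target server."""
    if uid_table is None:
        # No table supplied: assume the user ids match on both servers.
        return source_uid
    # Table supplied: only uids listed in it have a corresponding account.
    return uid_table.get(source_uid)
```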

If we don't find a corresponding account and don't want to use a policy
of creating disabled accounts then it's probably best to use no uploader
name, and just set the upload time.

Initially I think we only need the "null" policy of setting upload time
only, and the policy useful for a live public mirror of the central
hackage server.

