[Haskell-community] Coordinating Hackage mirroring procedures

Michael Snoyman michael at snoyman.com
Fri Sep 16 07:50:43 UTC 2016

Hi all,

Duncan, Herbert and I have been having a conversation over the past few
days about Hackage mirrors, kicked off by [1]. I requested we move the
discussion to a public list in case others have some thoughts on what we're
discussing. I'll do my best to summarize where we're at and what we're
thinking of doing. Duncan and Herbert: please jump in and correct any
mistakes :)

* Upstream Hackage is hosted on a private IP address which is pointed at by
a CDN. Currently, the connection between the CDN and Hackage is over
plaintext HTTP, though that might switch to HTTPS soon.
* Content on Hackage is secured via hackage-security, so users of
cabal-install 1.24 are downloading data securely from the CDN.
* There are three different Hackage mirroring tools available:
    * The original hackage-mirror tool in the Hackage repo itself, used
during the migration from Hackage 1 to Hackage 2. It is not
hackage-security aware.
    * The hackage-mirror package on Hackage, which uploads to AWS S3. This
is what hackage.fpcomplete.com uses, and feeds into Stack. It is also not
hackage-security aware.
    * Herbert's hackage-mirror-tool, which uploads to Dreamhost S3, and
ostensibly supports AWS S3 as well (though this has not yet been tested).
It _is_ hackage-security aware.
* We don't really want three different tools, and we _do_ want the mirror
tools to be hackage-security aware, especially given the lack of SSL
between the CDN and Hackage itself.
* The CDN has the potential to serve out-of-sync files. In particular, it
can deliver a 00-index.tar.gz file that refers to a certain
package/version, yet return a 404 for that package's tarball. This can
cause confusion for downstream tools. The Dreamhost and AWS mirrors do not
suffer from this.
* There are three additional Git repositories providing Hackage metadata in
different formats: all-cabal-files, all-cabal-hashes, and
all-cabal-metadata. These are used by Stack, Stackage, Nix, and stackage.org.
They feed off of the AWS S3 mirror.
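
To make the out-of-sync problem above concrete, here is a minimal sketch of
the kind of consistency check a downstream tool might run. It is not taken
from any of the mirroring tools mentioned here. It assumes index entries
follow the `package/version/package.cabal` path layout, and the function
names and sample data are hypothetical.

```python
def index_versions(paths):
    """Collect (package, version) pairs from index entry paths.

    Assumes entries look like 'foo/1.0/foo.cabal'; anything else
    (for example 'foo/preferred-versions') is ignored.
    """
    pairs = set()
    for path in paths:
        parts = path.split("/")
        if len(parts) == 3 and parts[2] == parts[0] + ".cabal":
            pairs.add((parts[0], parts[1]))
    return pairs


def missing_tarballs(paths, available):
    """Return the indexed (package, version) pairs that have no tarball."""
    return sorted(index_versions(paths) - set(available))


# Hypothetical data: the index mentions foo-1.1, but no tarball exists yet.
index_paths = ["foo/1.0/foo.cabal", "foo/1.1/foo.cabal", "foo/preferred-versions"]
tarballs = {("foo", "1.0")}
print(missing_tarballs(index_paths, tarballs))
```

A non-empty result corresponds to the situation described above: the index
refers to a package/version whose tarball request would return a 404.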

I believe the course of action we have planned is:

1. Ensure that Herbert's hackage-mirror-tool works correctly with AWS S3.
2. Switch over the hackage.fpcomplete.com AWS S3 mirror to use Herbert's
hackage-mirror-tool, increasing its security, and allowing it to be listed
as an official Hackage mirror.
    * This mirror will still be operated by FP Complete. Distributing the
administration of systems across parties makes it less likely that a
technical screw-up by one party will take down multiple mirrors.
3. Continue pointing all-cabal-files, -hashes, and -metadata at the AWS S3
mirror, which provides high reliability and will now be more secure based
on the hackage-mirror-tool usage.
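
As a rough illustration of what "more secure" can mean for a mirror: one
ingredient of a security-aware mirroring step is presumably checking each
downloaded file against a hash taken from trusted metadata before
publishing it. The sketch below shows only that hash comparison. It is an
assumption for illustration, not hackage-mirror-tool's actual code, it
omits signature verification entirely, and the function name is
hypothetical.

```python
import hashlib


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of data equals expected_hex."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


# Hypothetical usage: the expected digest would come from trusted metadata;
# here it is computed locally just to exercise the function.
payload = b"example tarball bytes"
expected = hashlib.sha256(payload).hexdigest()
print(sha256_matches(payload, expected))
print(sha256_matches(payload + b"tampered", expected))
```

A mirror that skips files failing such a check would avoid republishing
content that was altered in transit over a plaintext connection.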

This will scratch a few different itches:

* A new mirror will be available on Hackage for cabal-install users
* The AWS S3 mirror will be more secure
* I can simplify some of my mirroring logic by not having to do workarounds
for out-of-sync CDNs

I think this is all doable, and pretty much within reach. I just wanted to
get this out in the public first in case others have input, perhaps based
on other use cases of these mirrors or repos that I'm not aware of.


[1] https://github.com/haskell/hackage-server/issues/537