[Haskell-cafe] Improvements to package hosting and security

Mathieu Boespflug mboes at tweag.net
Mon May 4 08:42:05 UTC 2015


It takes apples, oranges, pears and artichokes and then some to stay
healthy and keep the doctor away. That's why Git, in any capacity,
isn't a silver bullet to solve all problems. However, let's break down
how the envisioned setup helps with the problems I mentioned:

- cabal-install mysteriously dropping HTTP connections and corrupting
.cabal files: the particular firewall I've seen is used by hundreds of
developers in the company, and it does not silently truncate requests
for anything but Cabal updates. Investigations so far point to a bad
interaction between Network.HTTP and lazy bytestrings,
see http://www.hawaga.org.uk/tmp/bug-cabal-zlib-http-lazy.html (no bug
report just yet). Reusing the same download mechanism that hundreds of
others are already using in the company means we are not at risk of a
firewall triggering an obscure latent race condition in the way
cabal-install retrieves HTTP responses. It means if there is a real
problem with the firewall, it won't just be for the local Haskellian
outpost who are trying to sell Haskell to their boss, but for
everyone, and therefore fixed.

- the reversing revisions issue was NOT just a display issue: it
completely broke that day's Stackage Nightly build, which just calls
`cabal update` under the hood:
https://github.com/haskell/hackage-server/issues/305. Other users of
Hackage in that time window also experienced the issue. It's an issue
that caused massive breakage in a lot of places. Notice how PkgInfo_v2
is a data structure that is entirely redundant with what Git would
provide already, so need not be serialized to disk, have migrations
written for it, etc, nor perhaps exist at all. Further, Git would have
made it quite impossible to distribute what amounts to a rewritten and
inadvertently tampered with history (because the clients would have
noticed and refused to fast forward). Fewer pieces of state managed
independently + less code = more reliable service.

- low availability of hackage: indeed, hosting issues have been a
major culprit here. But the point is, with the package database
maintained as a Git repo separate from hackage-server, the repo can be
served from any (highly available) Git provider, such as GitHub or
Bitbucket, and continue to be served to clients even if the Hackage
front page is down for whatever reason. Hosting we don't have to
manage ourselves is hosting we don't have to keep humming. Of course
no service guarantees 100% uptime, so mirrors are a key additional (or
alternative) ingredient here. Efficient, low-latency and reliable
mirroring is certainly possible by other means, but mirroring a
history of changes is exactly what Git was designed for, and what it
does well. Why reinvent that?
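
To make the fast-forward argument in the second point concrete, here
is a minimal sketch in Python. It assumes a toy model in which history
is a map from commit id to parent ids; the names HISTORY, is_ancestor
and can_fast_forward are hypothetical and are not Git's or
hackage-server's API. The idea is only that a client accepts a new
remote head when its current head is an ancestor of it, so a rewritten
history gets rejected rather than silently applied.

```python
# Toy commit graph: commit id -> list of parent ids (hypothetical data).
# c1 <- c2 <- c3 is the honest history; r2 <- r3 is a rewritten branch
# that shares only the root c1 with it.
HISTORY = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "r2": ["c1"],
    "r3": ["r2"],
}


def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant`
    by following parent links (a commit counts as its own ancestor)."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def can_fast_forward(parents, local_head, remote_head):
    """A fast-forward is possible only if the remote head extends
    the history the client already has."""
    return is_ancestor(parents, local_head, remote_head)


if __name__ == "__main__":
    # Client is at c2; the server offers c3, which extends c2.
    print(can_fast_forward(HISTORY, "c2", "c3"))
    # Client is at c2; the server offers r3, which drops c2.
    print(can_fast_forward(HISTORY, "c2", "r3"))
```

In this model a client sitting at c2 would accept c3 and refuse r3,
which is the behaviour I am relying on when I say clients would have
noticed a rewritten history.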

> However, I don’t think that migrating to git will solve any of the problems you mentioned above in your parenthetical. It _does_ help with the incremental fetch issue (though there are other ways to do that), and it _is_ a way to tackle the index signing issue, though I’m not sure that it is the best way (in particular, given the difficulty of configuring git _with keys_ on windows).

That's an interesting concern, though without knowing more, this is
not an actionable issue. What difficulties? If MinGHC packaged
Git+gpg4win, what would the issue be?

Best,

Mathieu

On 4 May 2015 at 03:12, Gershom B <gershomb at gmail.com> wrote:
> On May 3, 2015 at 5:14:58 AM, Mathieu Boespflug (mboes at tweag.net) wrote:
>
>> The more general point here is whether leveraging (arguably standard)
>> third-party commands and/or C code in order to keep our maintenance
>> burden low, and pick up many robust features for free to boot, is a
>> good approach. I believe that it is. Our infrastructure and tooling is
>> cracking at the seams as it is (cabal-install mysteriously dropping
>> HTTP connections and corrupting .cabal files when behind a corporate
>> firewall, updates to hackage-server inadvertently reversing the order
>> of revisions, low availability of Hackage, ...). Leveraging Git would
>> solve all mentioned problems, plus give us incremental updates for
>> free, plus give us package index signing for little effort.
>
> This seems to me to be mixing apples and oranges and pears and artichokes. The primary reason for hackage downtime in the past was instability of our Hetzner box. Migrating to Rackspace did wonders for us. Regardless, choice of hardware/webhost is orthogonal to git. You then list a logic bug in a version of hackage server. Logic bugs, as I’m sure you’re aware, can be introduced anywhere that code is written. No proposal put forward involves not writing and running code, so while we can work on better regression test suites, code-review procedures, etc., this has very little to do with adoption of git (especially as I understand the reverse in revision order was a bug on _display_ which this proposal doesn’t address at all). Finally, you discuss cabal-install having trouble behind firewalls. I agree with this being a problem, and I want us to work on this. However, git is again not a magic bullet. I’ve had firewalls where I run into trouble with git too, or with mercurial, or where certain website/firewall combinations meant, mysteriously, that curl would work but not wget, or vice versa. I think the plans to expand the choice of transports for cabal-install will improve things in this regard, and could in fact lay the basis for adding git as an additional transport as well.
>
> In summary: You point to real problems that have occurred (of them, only the first [firewalls] is an ongoing issue). There are many other problems you did not point to, but that are also problems, and remain problems. Moving to git as a transport could potentially address some problems, with certain other tradeoffs in terms of other tooling choices we would have to make. However, I don’t think that migrating to git will solve any of the problems you mentioned above in your parenthetical. It _does_ help with the incremental fetch issue (though there are other ways to do that), and it _is_ a way to tackle the index signing issue, though I’m not sure that it is the best way (in particular, given the difficulty of configuring git _with keys_ on windows).
>
> Cheers,
> Gershom

