[Haskell-cafe] [haskell-infrastructure] Improvements to package hosting and security

Wed Apr 15 05:43:40 UTC 2015

On Wed, Apr 15, 2015 at 8:19 AM Gershom B <gershomb at gmail.com> wrote:

> Ok, to narrow it down, you are concerned about the ability to
>
> > * Properly authenticate users
> > * Keep authorization lists of who can make uploads/revisions (and who
> > can grant those rights)
>
> and more specifically:
>
> > * Currently, authorized uploaders are identified by a user name and a
> > password on Hackage. How do we correlate that to a GPG key? Ideally, the
> > central upload authority would be collecting GPG public keys for all
> > uploaders so that signature verification can happen correctly.
> > * There's no way for an outside authority to vet the 00-index.tar.gz file
> > downloaded from Hackage; it's a completely opaque, black box. Having the
> > set of authorization rules be publicly viewable, auditable, and
> > verifiable overcomes that.
>
> On 1) now you have the problem “what if the central upload authority’s
> store of GPG keys is violated”. You’ve just kicked the can. “Web of Trust”
> is not a tractable answer. My answer is simpler: I can verify that the
> signer of version 1 of a package is the same as the signer of version 0.1.
> This is no small trick. And I can do so orthogonal to hackage. Now, if I
> really want to verify that the signer of version 1 is the person who is
> “Michael Snoyman” and is in fact the exact Michael Snoyman I intend, then I
> need to get your key by some entirely other mechanism. And that is my
> problem, and, definitionally, no centralized store can help me in that
> regard unless I trust it absolutely — which is precisely what I don’t want
> to do.
>
>
You've ruled out all known solutions to the problem, therefore no solution
exists ;)

To elaborate slightly: the issue of obtaining people's keys is a problem
that exists in general, and has two main resolutions: a central authority,
and a web of trust. You've somehow written off the web of trust completely
(I'm not sure *why* you think that's a good idea, you haven't explained
it), and then stated that, since the only remaining option is a central
authority, it's no better than Hackage. I disagree:

1. Maintaining security of a single GPG key is much simpler than
maintaining the security of an entire web application, as is currently
needed by Hackage.
2. There's no reason we need an either/or setup: we can have a central
authority sign keys. If users wish to trust that authority, they may do
so, and thereby get access to other keys. If that central authority is
compromised, we revoke that authority and move on to another one.
Importantly: we haven't put all our eggs in one basket, as is done today.
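
To make point 2 concrete, here is a toy model of it in Python. It is only a
sketch: no actual GPG operations happen, and the class, the method names and
the key IDs are all made up for illustration. Each user keeps their own set
of trusted authorities, and an uploader key is accepted if any still-trusted
authority has signed it, so revoking one authority doesn't strand keys that
another authority also vouches for.

```python
from dataclasses import dataclass, field


@dataclass
class TrustStore:
    """Toy model of one user's trust decisions; no real cryptography is done."""

    # Key IDs of the authorities this user has chosen to trust (hypothetical).
    trusted_authorities: set = field(default_factory=set)
    # Maps an authority's key ID to the uploader key IDs it has signed.
    endorsements: dict = field(default_factory=dict)

    def endorse(self, authority, uploader_key):
        """Record that `authority` has signed `uploader_key`."""
        self.endorsements.setdefault(authority, set()).add(uploader_key)

    def revoke_authority(self, authority):
        """Stop trusting an authority, e.g. after it has been compromised."""
        self.trusted_authorities.discard(authority)

    def is_trusted(self, uploader_key):
        """A key is trusted if any still-trusted authority has signed it."""
        return any(
            uploader_key in self.endorsements.get(authority, set())
            for authority in self.trusted_authorities
        )


store = TrustStore(trusted_authorities={"authority-A", "authority-B"})
store.endorse("authority-A", "key-alice")
store.endorse("authority-B", "key-alice")
store.endorse("authority-A", "key-bob")

# authority-A is compromised: drop it and carry on with authority-B.
store.revoke_authority("authority-A")
print(store.is_trusted("key-alice"))  # True: still signed by authority-B
print(store.is_trusted("key-bob"))    # False: only authority-A signed it
```

The point of the design is that the list of trusted authorities lives with
the user, not with the server, so a compromise is handled by editing that
list.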

> On 2) I would like to understand more of what your concern with regards to
> “auditing” is. What specific information would you like to know that you do
> not? Improved audit logs seem again orthogonal to any of these other
> security concerns, unless you are simply worried about a “metadata only”
> attack vector. In any case, we can incorporate the same signing practices
> for metadata as for packages — orthogonal to hackage or any other
> particular storage mechanism. It is simply an unrelated question. And,
> honestly, compared to all the other issues we face I feel it is relatively
> minor (the signing component, not a better audit trail).
>
>
There's a lot of stuff going on inside of Hackage which we have no insight
into or control over. The simplest is that we can't review a log of
revisions. Improving that is a good thing, and I hope Hackage does so.
Nonetheless, I'd still prefer a fully open, auditable system, which isn't
possible with "just tack it on to Hackage."

> In any case, your account of the first two points reveals some of the
> confusion I think that remains:
>
> > * Allow safe uploads of packages and metadata
> > * Distribute packages and metadata to users safely
>
> What is the definition of “safe” here? My understanding is that in the
> field of security one doesn’t talk about “safe” in general, but with
> regards to a particular profile of a sort of attacker, and always only as a
> difference of degree, not kind.
>
>
I didn't think this needed diving into, because the problems seemed so
fundamental that they weren't worth explaining. Examples of safety issues are:

* An attacker sitting between an uploader and Hackage can replace the
package contents with something nefarious, corrupting the package for all
downloaders
* An attacker sitting between a downloader and Hackage can replace the
package contents with something nefarious, corrupting the package for that
downloader
* This doesn't even have to be a conscious attack; I saw someone on Reddit
report that they tried to download a package over airport WiFi, and
instead ended up downloading the HTML "please log in" page
* Eavesdropping attacks on uploaders: it's possible to capture packets
containing the headers of an upload to Hackage, such as when using open WiFi
(think the airport example again). Those headers include authorization headers.
Thanks to Hackage now using digest authentication, this doesn't lead to an
immediate attack, but digest authentication is based on MD5, which is not
the most robust hash function
* Normal issues with password-based authentication: insecure passwords,
keyloggers, etc.
* Vulnerabilities in the Hackage codebase or its hosting that expose
passwords and/or allow arbitrary uploads
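
On the digest authentication point, the Python sketch below shows how an
RFC 2617 digest response (with qop=auth) is derived from the password, and
why captured headers still matter. Everything except the password travels in
the clear, so someone who records a single upload can test password guesses
offline. The user name, realm, URI, nonce values and word list here are
hypothetical and are not taken from Hackage.

```python
import hashlib


def md5_hex(text):
    """Hex-encoded MD5 of a string, as used throughout RFC 2617."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' field of a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def guess_password(captured, candidates):
    """Return the first candidate that reproduces the captured response."""
    for candidate in candidates:
        attempt = digest_response(
            captured["username"], captured["realm"], candidate,
            captured["method"], captured["uri"], captured["nonce"],
            captured["nc"], captured["cnonce"], captured["qop"],
        )
        if attempt == captured["response"]:
            return candidate
    return None


# What an eavesdropper would see on the wire (all values hypothetical).
captured = {
    "username": "someuploader",
    "realm": "example-realm",
    "method": "POST",
    "uri": "/packages/",
    "nonce": "abc123",
    "nc": "00000001",
    "cnonce": "0a4f113b",
    "qop": "auth",
}
# The client computed this from its (weak) password; it is sent in the clear.
captured["response"] = digest_response(
    captured["username"], captured["realm"], "hunter2",
    captured["method"], captured["uri"], captured["nonce"],
    captured["nc"], captured["cnonce"], captured["qop"],
)

print(guess_password(captured, ["password", "123456", "hunter2"]))  # hunter2
```

So the password isn't sent directly, but a weak one is still recoverable
from one sniffed request, and each guess costs only three MD5 computations.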

> So who do we want to prevent from doing what? How “safe” is “safe”? Safe
> from what? From a malicious script-kid, from a malicious collective “in it
> for the lulz,” from a targeted attack against a particular end-client, from
> just poorly/incompetently written code? What are we “trusting”? What
> concrete guarantees would we like to make about user interactions with
> packages and package repositories?
>
> While I’m interrogating language, let me pick out one other thing I don’t
> understand: "creating a coherent set of packages” — what do you mean by
> “coherent”? Is this something we can specify? Hackage isn’t supposed to be
> coherent — it is supposed to be everything. Within that “everything” we are
> now attempting to manage metadata to provide accurate dependency
> information, at a local level. But we have no claims about any global
> coherence conditions on the resultant graphs. Certainly we intend to be
> coherent in the sense that the combination of a name/version/revision
> should indicate one and only one thing (and that all revisions of a version
> should differ at most in dependency constraints in their cabal file) — but
> this is a fairly minimal criteria. And in fact, it is one that is nearly
> orthogonal to security concerns altogether.
>
>
All I meant is a set of packages uploaded by an approved set of uploaders,
as opposed to one that lets in arbitrary modifications made by others.
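
In code terms, that notion of "coherent" is just a filter over uploads
against an authorization list. A minimal Python sketch, where the package
names, uploader names and the shape of the list are all assumptions made up
for illustration:

```python
# Hypothetical authorization list: package name -> approved uploader names.
ACL = {
    "pkg-a": frozenset({"alice"}),
    "pkg-b": frozenset({"bob", "carol"}),
}


def is_approved(acl, package, uploader):
    """True if `uploader` is on the approved list for `package`."""
    return uploader in acl.get(package, frozenset())


def coherent_set(acl, uploads):
    """Keep only the (package, version, uploader) entries from approved uploaders."""
    return [u for u in uploads if is_approved(acl, u[0], u[2])]


uploads = [
    ("pkg-a", "1.0", "alice"),
    ("pkg-a", "1.1", "mallory"),  # not on the list for pkg-a
    ("pkg-b", "0.3", "carol"),
]
print(coherent_set(ACL, uploads))
```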

> What I’m driving at is — it sounds like we _mainly_ want new decentralized
> security mechanisms, at the cabal level, but we also want, potentially, a
> few centralized mechanisms. However, centralization is weakness from a
> security standpoint. So, ideally, we want as few centralized mechanisms as
> possible, and we want the consequences of those mechanisms being broken to
> be “recoverable” at the point of local verification.
>
>
Yes, that's exactly the kind of goal I'm aiming towards.

> Let me spell out a threat model where that makes sense. An adversary takes
> control of the entire hackage server through some zero day linux exploit we
> have no control over — or perhaps they are an employee at the datacenter
> where we host hackage and secure control via more direct means, etc. They
> have total and complete control over the box. They can accept anything they
> want, and they can serve anything they want. And they are sophisticated
> enough to be undetected for say a week.
>
> Now, we want it to be the case that _whatever_ this adversary does, they
> cannot “trick” someone who types “cabal install warp” into instead cabal
> installing something malicious. How do we do so? _Now_ we have a security
> problem that is concrete enough to discuss. And furthermore, I would claim
> that if we don’t have at least some story for this threat model, then we
> haven’t established anything much “safer” at all.
>
> This points towards a large design space, and a lot of potential ideas,
> all of which feel entirely different than the “strawman” proposal, since
> the emphasis there is towards the changes to a centralized mechanism (even
> if in turn, the product of that mechanism itself is then distributed and
> git cloneable or whatever).
>
>
If we have agreement that the problem exists, I'm quite happy to flesh out
other kinds of attack vectors and then discuss solutions. Again, my
proposal is purely meant to be a starting point for discussion, not an
answer to the problems.

Michael