[Haskell-cafe] Improvements to package hosting and security

Thu Apr 16 12:18:38 UTC 2015

On Thu, Apr 16, 2015 at 2:58 PM Duncan Coutts <duncan at well-typed.com> wrote:

> On Thu, 2015-04-16 at 11:18 +0000, Michael Snoyman wrote:
> > On Thu, Apr 16, 2015 at 1:57 PM Duncan Coutts <duncan at well-typed.com> wrote:
>
> > > I was not proposing to change the repository format significantly (and
> > > only in a backwards compatible way). The existing format is pretty
> > > simple, using standard old well understood formats and protocols with
> > > wide tool support.
> > >
> > > The incremental update is fairly unobtrusive. Passive http servers
> > > don't need to know about it, and clients that don't know about it
> > > can just download the whole index as they do now.
> > >
> > > The security extensions for TUF are also compatible with the existing
> > > format and clients.
> > >
> > The theme you seem to be creating here is "compatible with current
> > format." You didn't say it directly, but you've strongly implied that,
> > somehow, Git isn't compatible with existing tooling. Let me make clear
> > that that is, in fact, false[1]:
>
> Sure, one can use git or rsync or other methods to transfer the set of
> files that makes up a repository or repository index. The point is,
> existing clients expect both this format and this (http) protocol.
>
> There's a number of other minor arguments to be made here about what's
> simpler and more backwards compatible, but here are two more significant
> and positive arguments:
>
>      1. This incremental update approach works well with the TUF
>         security design
>      2. This approach to transferring the repository index and files has
>         a much lower security attack surface
>
> For 1, the basic TUF approach is based on a simple http server serving a
> set of files. Because we are implementing TUF for Hackage we picked this
> update method to go with it. It's really not exotic, the HTTP spec says
> about byte range requests: "Range supports efficient recovery from
> partially failed transfers, and supports efficient partial retrieval of
> large entities." We're doing an efficient partial retrieval of a large
> entity.
>
> For 2, Mathieu elsewhere in this thread pointed to an academic paper
> about attacks on package repositories and update systems. A surprising
> number of these are attacks on the download mechanism itself, before you
> even get to trying to verify individual package signatures. If you read
> the TUF papers you see that they also list these attacks and address
> them in various ways. One of them is that the download mechanism needs
> to know in advance the size (and content hash) of entities it is going
> to download. Also, we should strive to minimise the amount of complex
> unaudited code that has to run before we get to checking the signature
> of the package index (or individual package tarballs). In the TUF
> design, the only code that runs before verification is downloading two
> files over HTTP (one that's known to be very small, and the other we
> already know the length and signed content hash). If we're being
> paranoid we shouldn't even run any decompression before signature
> verification. With our implementation the C code that runs before
> signature verification is either none, or just zlib decompression if we
> want to do on-the-fly http transport compression, but that's optional if
> we don't want to trust zlib's security record (though it's extremely
> widely used). By contrast, if we use rsync or git then there's a massive
> amount of unaudited C code that is running with your user credentials
> prior to signature verification. In addition it is likely vulnerable to
> endless data and slow download attacks (see the papers).
>
>
>
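To make the quoted point about knowing the size and content hash in advance concrete, here is a minimal Python sketch of a bounded read. It is only an illustration under stated assumptions, not code from Hackage, cabal-install, or the TUF reference implementation: the name `read_exact` is hypothetical, SHA-256 is an assumed choice of hash, and the expected size and digest are assumed to come from already-verified signed metadata.

```python
import hashlib
import io


def read_exact(stream, expected_size, expected_sha256, chunk=4096):
    """Read exactly expected_size bytes from stream and check their SHA-256.

    Hypothetical helper: expected_size and expected_sha256 are assumed to
    come from signed metadata that was verified earlier.
    """
    digest = hashlib.sha256()
    buf = bytearray()
    while True:
        # Never request more than one byte past the expected size, so an
        # endless response is detected instead of being buffered.
        want = min(chunk, expected_size + 1 - len(buf))
        data = stream.read(want)
        if not data:
            break
        buf += data
        digest.update(data)
        if len(buf) > expected_size:
            raise ValueError("more data than the expected size")
    if len(buf) != expected_size:
        raise ValueError("less data than the expected size")
    if digest.hexdigest() != expected_sha256:
        raise ValueError("content hash mismatch")
    return bytes(buf)


# Usage with an in-memory stream standing in for an HTTP response body.
payload = b"index" * 100
good = read_exact(io.BytesIO(payload), len(payload),
                  hashlib.sha256(payload).hexdigest())
```

This sketch only bounds the amount of data accepted. Guarding against a slow download would additionally need a timeout or minimum transfer rate, which is left out here.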
I never claimed nor intended to imply that range requests are non-standard.
In fact, I'm quite familiar with them, given that I implemented that
feature of Warp myself! What I *am* claiming as non-standard is using range
requests to implement an incremental update protocol for a tar file. Is
there any prior art for this working correctly? Do you know that web servers
will do what you need and serve the byte offsets from the uncompressed tar
file instead of the compressed tar.gz? Where are you getting the signatures
from, and how does this interact with 00-index.tar.gz files served by
non-Hackage systems?
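
For readers unfamiliar with the mechanism under discussion, here is a minimal Python sketch of a range-request-based incremental update. It is an illustration only, not the proposed Hackage protocol: the helper names `range_header` and `apply_partial` are hypothetical, and it assumes the served file only ever grows by appending and that the server answers with a standard single-range `Content-Range` header.

```python
import re


def range_header(local_size):
    """Build a header asking for every byte from offset local_size onward."""
    return {"Range": f"bytes={local_size}-"}


def apply_partial(local, status, content_range, body):
    """Combine a local copy with a server response (hypothetical helper).

    local is the bytes already on disk; status, content_range, and body
    stand in for the fields of an HTTP response.
    """
    if status == 200:
        # The server ignored the Range header and sent the whole file.
        return body
    if status != 206:
        raise ValueError(f"unexpected status {status}")
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", content_range or "")
    if m is None:
        raise ValueError("unparseable Content-Range")
    first, last, total = map(int, m.groups())
    # The returned slice must start where the local copy ends, match the
    # body length, and run to the end of the remote file.
    if first != len(local) or last - first + 1 != len(body) or last + 1 != total:
        raise ValueError("range does not line up with the local copy")
    return local + body


# Usage: the local copy has 1024 bytes and the remote file has grown to 1536.
old = b"A" * 1024
new = old + b"B" * 512
updated = apply_partial(old, 206, "bytes 1024-1535/1536", new[1024:])
```

Note that the offsets here refer to whatever byte sequence the server actually stores and serves. Whether that is the uncompressed tar or the tar.gz is exactly the question raised above, and the sketch does not answer it.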

On the security front, it seems that we have two options here:

1. Use a widely used piece of software (Git), likely already in use by the
vast majority of people reading this mailing list, relied on by countless
companies and individuals, holding source code for the kernel of likely
every mail server between my fingertips and the people reading this email,
to distribute incremental updates. And as an aside: that software has
built-in support for securely signing commits and verifying those signatures.

2. Write brand new code deep inside two Haskell codebases with little
scrutiny to implement a download/update protocol that (to my knowledge) has
never been tested anywhere else in the world.

Have I misrepresented the two options at all?

I get that you've been working on this TUF-based system in private for a
while, and are probably already heavily invested in the solutions you came
up with in private. But I'm finding it very difficult to see the reasoning
for reinventing wheels that need no reinventing.

Michael