[Haskell-cafe] Improvements to package hosting and security

Duncan Coutts duncan at well-typed.com
Thu Apr 16 11:58:41 UTC 2015


On Thu, 2015-04-16 at 11:18 +0000, Michael Snoyman wrote:
> On Thu, Apr 16, 2015 at 1:57 PM Duncan Coutts <duncan at well-typed.com> wrote:

> > I was not proposing to change the repository format significantly (and
> > only in a backwards compatible way). The existing format is pretty
> > simple, using standard old well understood formats and protocols with
> > wide tool support.
> >
> > The incremental update is fairly unobtrusive. Passive http servers don't
> > need to know about it, and clients that don't know about it can just
> > download the whole index as they do now.
> >
> > The security extensions for TUF are also compatible with the existing
> > format and clients.
> >
> The theme you seem to be creating here is "compatible with current format."
> You didn't say it directly, but you've strongly implied that, somehow, Git
> isn't compatible with existing tooling. Let me make clear that that is, in
> fact, false[1]:

Sure, one can use git or rsync or other methods to transfer the set of
files that makes up a repository or repository index. The point is that
existing clients expect both this format and this (HTTP) protocol.

There are a number of other minor arguments to be made here about what's
simpler and more backwards compatible, but here are two more significant
and positive arguments:

     1. This incremental update approach works well with the TUF
        security design
     2. This approach to transferring the repository index and files has
        a much lower security attack surface

For 1, the basic TUF approach is based on a simple HTTP server serving a
set of files. Because we are implementing TUF for Hackage, we picked this
update method to go with it. It's really not exotic; the HTTP spec says
this about byte range requests: "Range supports efficient recovery from
partially failed transfers, and supports efficient partial retrieval of
large entities." We're doing an efficient partial retrieval of a large
entity.
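
To make that concrete, here is a minimal Haskell sketch of such a partial
retrieval using the http-client library. The index URL and the assumption
that the client knows how many bytes of the (append-only) index it already
has are illustrative; this is not the actual cabal-install code:

    {-# LANGUAGE OverloadedStrings #-}
    import           Network.HTTP.Client
    import qualified Data.ByteString.Char8 as BS
    import qualified Data.ByteString.Lazy  as BL

    -- Fetch only the bytes of the index past what we already have locally.
    fetchIndexTail :: Int -> IO BL.ByteString
    fetchIndexTail bytesWeHave = do
      manager <- newManager defaultManagerSettings
      request <- parseRequest "http://hackage.example.org/01-index.tar"
      -- Ask only for the suffix, e.g. "Range: bytes=123456-".
      let range    = BS.pack ("bytes=" ++ show bytesWeHave ++ "-")
          request' = request { requestHeaders =
                                 ("Range", range) : requestHeaders request }
      response <- httpLbs request' manager
      -- A Range-aware server answers 206 with just the suffix; a passive
      -- server that ignores Range answers 200 with the whole file, which
      -- a client must also be prepared to accept.
      pure (responseBody response)

In the real design the client also has to account for the tar format's
trailing zero blocks before appending, but the basic mechanism is just
this byte-range request, which any passive HTTP server can satisfy.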

For 2, Mathieu elsewhere in this thread pointed to an academic paper
about attacks on package repositories and update systems. A surprising
number of these are attacks on the download mechanism itself, before you
even get to trying to verify individual package signatures. If you read
the TUF papers you see that they also list these attacks and address
them in various ways. One of the mitigations is that the download
mechanism needs to know in advance the size (and content hash) of the
entities it is going to download.

Also, we should strive to minimise the amount of complex unaudited code
that has to run before we get to checking the signature of the package
index (or individual package tarballs). In the TUF design, the only code
that runs before verification is the code that downloads two files over
HTTP (one that's known to be very small, and the other whose length and
signed content hash we already know). If we're being paranoid we
shouldn't even run any decompression before signature verification. With
our implementation the C code that runs before signature verification is
either none, or just zlib decompression if we want to do on-the-fly HTTP
transport compression; even that is optional if we don't want to trust
zlib's security record (though it is extremely widely used).

By contrast, if we use rsync or git then there is a massive amount of
unaudited C code running with your user credentials prior to signature
verification. In addition, those tools are likely vulnerable to
endless-data and slow-download attacks (see the papers).
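
To illustrate the "size and hash known in advance" rule, here is a small
Haskell sketch (using assumed names, not the actual hackage-security API):
the client bounds the read at the signed length and checks the SHA-256
digest before doing anything else with the bytes.

    import           Crypto.Hash          (Digest, SHA256, hashlazy)
    import qualified Data.ByteString.Lazy as BL
    import           Data.Int             (Int64)

    -- Length and hash come from TUF metadata verified earlier.
    data SignedMeta = SignedMeta
      { signedLength :: Int64
      , signedSHA256 :: Digest SHA256
      }

    verifyDownload :: SignedMeta -> BL.ByteString -> Either String BL.ByteString
    verifyDownload meta body
      | BL.length bounded /= signedLength meta = Left "download truncated"
      | hashlazy bounded /= signedSHA256 meta  = Left "content hash mismatch"
      | otherwise                              = Right bounded
      where
        -- Never consume more than the signed length, even from a server
        -- that keeps sending data (the endless-data attack).
        bounded = BL.take (signedLength meta) body

Bounding the read at the signed length is what defeats an endless-data
attack: a malicious mirror can stall the connection, but it cannot make
the client consume unbounded input, and nothing downstream (untarring,
decompression, etc.) runs until the hash check has passed.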

-- 
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/


