[Haskell-cafe] Improvements to package hosting and security
Gershom B
gershomb at gmail.com
Mon May 4 15:49:56 UTC 2015
One more point I realized -- switching to git as a transport _for the
package index_ isn't a general-purpose solution to the transport problem.
Users also need a transport to download cabalized packages, and also to
upload them. (And, whenever we get distributed build-reports finished, to
upload those too, I suppose.) To my knowledge, the idea on the table
doesn't solve that?
-g
On Mon, May 4, 2015 at 9:55 AM, Gershom B <gershomb at gmail.com> wrote:
> On May 4, 2015 at 4:42:05 AM, Mathieu Boespflug (mboes at tweag.net) wrote:
>
> > - cabal-install mysteriously dropping HTTP connections and corrupting
> > .cabal files: this particular firewall that I've seen is used by
> > hundreds of developers in the company without it silently truncating
> > requests on anything but Cabal updates. Investigations so far
> > point to a bad interaction between Network.HTTP and lazy bytestrings,
> > see http://www.hawaga.org.uk/tmp/bug-cabal-zlib-http-lazy.html (no bug
> > report just yet). Reusing the same download mechanism that hundreds of
> > others in the company are already using means we are not at risk of a
> > firewall triggering an obscure latent race condition in the way
> > cabal-install retrieves HTTP responses. It means that if there is a
> > real problem with the firewall, it won't just hit the local Haskellian
> > outpost trying to sell Haskell to their boss, but everyone, and it will
> > therefore get fixed.
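> >
> > (For the record, the general shape of the lazy-IO hazard, minus the
> > HTTP and zlib specifics, looks roughly like the sketch below. This is
> > purely illustrative and is not cabal-install's actual code: if the
> > handle is closed before the lazy body has been demanded, the data is
> > silently truncated, whereas forcing the whole body first avoids that
> > failure mode.)
> >
> >     import Control.Exception (evaluate)
> >     import qualified Data.ByteString.Lazy as LBS
> >     import System.IO (Handle)
> >
> >     -- Hypothetical helper, not cabal-install code: hGetContents reads
> >     -- the handle lazily, on demand, so anything that closes the handle
> >     -- early silently truncates the result. Forcing the length makes
> >     -- the entire body materialize before we hand it back.
> >     readBodyStrictly :: Handle -> IO LBS.ByteString
> >     readBodyStrictly h = do
> >       body <- LBS.hGetContents h
> >       _ <- evaluate (LBS.length body)  -- force the whole body now
> >       return body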
>
> Yes, in this particular case, clearly using git is a transport that works
> and using HTTP is a transport that doesn’t. But as you note, this appears
> to be a problem with the firewall, not the HTTP library. You’re right that
> moving to a transport used more widely would help this problem. But so
> would moving to curl, apparently. In any case, as I wrote, the best way to
> address this is to make ourselves more generally flexible in our transport
> layer — and the way to do this is not to swap the HTTP library simply for
> git, but to open up our choices more broadly. Which is precisely the plan
> already under discussion with regards to Cabal. Git is no magic bullet
> here. It is just “anything besides the current thing that happens to
> trigger a specific bug in a specific firewall.”
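>
> (To make “open up our choices” concrete: below is a rough sketch of the
> shape a pluggable transport layer could take. Every name in it is
> hypothetical; this is not Cabal’s actual API, just an illustration of the
> idea that the rest of the tool should only care about “give me the bytes
> for this URL”, not about which client fetched them.)
>
>     -- Hypothetical sketch of a pluggable transport layer; all names are
>     -- made up for illustration and are not Cabal's actual API.
>     module TransportSketch where
>
>     import System.Exit (ExitCode (..))
>     import System.Process (readProcessWithExitCode)
>
>     -- A transport is just "given a URL, fetch the body (or fail)".
>     data Transport = Transport
>       { transportName :: String
>       , fetchUrl      :: String -> IO (Either String String)
>       }
>
>     -- One backend shells out to an external curl binary ...
>     curlTransport :: Transport
>     curlTransport = Transport "curl" $ \url -> do
>       (code, out, err) <- readProcessWithExitCode "curl" ["-sSfL", url] ""
>       return $ case code of
>         ExitSuccess -> Right out
>         _           -> Left err
>
>     -- ... and others could wrap the built-in HTTP library, wget, or a
>     -- proxy-aware client; callers only ever see the Transport record.
>     selectTransport :: String -> [Transport] -> Maybe Transport
>     selectTransport name = lookup name . map (\t -> (transportName t, t))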
>
> > - the reversing revisions issue was NOT just a display issue: it
> > completely broke the Stackage Nightly builds that day, which just call
> > `cabal update` under the hood:
> > https://github.com/haskell/hackage-server/issues/305. Other users of
> > Hackage in that time window also experienced the issue. It's an issue
> > that caused massive breakage in a lot of places. Notice how PkgInfo_v2
> > is a data structure that is entirely redundant with what Git would
> > provide already, so it need not be serialized to disk, have migrations
> > written for it, etc., nor perhaps exist at all. Further, Git would have
> > made it quite impossible to distribute what amounts to a rewritten and
> > inadvertently tampered-with history (because the clients would have
> > noticed and refused to fast-forward). Fewer pieces of state managed
> > independently + less code = more reliable service.
>
> Ah, I see — they were flipped in the migration, not just in the display of
> the data. Regardless — there will always be a layer between our data
> storage — be it git, acid-state, a database, anything else — and the
> programmatic use we make of that data. No matter what we do to that storage
> layer, the intermediate layer will need to turn it into a programmatic
> representation, and then the frontend services will need to display/make
> use of it. No matter what, there is always room for such bugs. You might
> say “but the server couldn’t cause such a bug in this system!” That’s silly
> — the deserialization from that storage layer will just take place later —
> at each client. And the clients could cause such a bug. So yes, the literal
> place the bug was found is in code that would be different under a
> different storage layer. But there’s absolutely nothing in switching
> storage layers that rules out such bugs.
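>
> (A deliberately contrived sketch of what I mean, with made-up names:
> whichever backend we pick, some code still has to turn the stored
> revisions into an ordered list, and the flip can live in exactly that
> code without caring whether the data came from acid-state, git, or a
> database.)
>
>     import Data.List (sortBy)
>     import Data.Ord (Down (..), comparing)
>
>     -- Purely illustrative types; not hackage-server's actual code.
>     data Revision = Revision { revNumber :: Int, revCabalFile :: String }
>       deriving Show
>
>     -- Intended behaviour: oldest revision first, latest last.
>     revisionsInOrder :: [Revision] -> [Revision]
>     revisionsInOrder = sortBy (comparing revNumber)
>
>     -- The bug: one call site sorts the other way, and any client that
>     -- takes "the last element" now picks up revision 0 instead of the
>     -- latest, no matter what storage layer produced the list.
>     revisionsFlipped :: [Revision] -> [Revision]
>     revisionsFlipped = sortBy (comparing (Down . revNumber))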
>
> And furthermore, in the migration you propose, which involves taking all
> our data, pushing it into an entirely new representation, and then
> rewriting the entire hackage-server to talk to this new representation at
> all stages, and rewriting cabal-install to do the same — I promise that this
> would necessarily create a _whole lot_ of bugs.
>
> Again, there may be reasons to do this (I’m dubious) — but let’s not
> overstate them to sell the case.
>
> > Hosting we don't have to
> > manage ourselves is hosting we don't have to keep humming. Of course
> > no service guarantees 100% uptime, so mirrors are a key additional (or
> > alternative) ingredient here. Efficient, low-latency and reliable
> > mirroring is certainly possible by other means, but mirroring a
> > history of changes is exactly what Git was designed for, and what it
> > does well. Why reinvent that?
>
> In the last case here, you say that mirroring is easier with git? But
> don’t we already have mirroring now? And haven’t we had it for some time?
> The work underway, to my knowledge, is only to make mirroring more secure
> (as a related consequence of making Hackage in general more secure). So
> this seems a silly thing to raise.
>
> > > However, I don’t think that migrating to git will solve any of the
> > > problems you mentioned above in your parenthetical. It _does_ help
> > > with the incremental fetch issue (though there are other ways to do
> > > that), and it _is_ a way to tackle the index signing issue, though
> > > I’m not sure that it is the best way (in particular, given the
> > > difficulty of configuring git _with keys_ on windows).
> >
> > That's an interesting concern, though without knowing more, this is
> > not an actionable issue. What difficulties? If MinGHC packaged
> > Git+gpg4win, what would the issue be?
>
> I can give you an example I ran into with MinGHC already — I had a
> preexisting cygwin install on my machine, and tried to install MinGHC. This
> mixed msys paths with cygwin paths and everything mismatched and was
> horrible until I ripped out those msys paths. But now, of course, my new
> GHC can’t find the libraries to build against for, e.g., doing a network
> reinstall, which was the entire point of the exercise.
>
> By analogy, many Windows users may have an existing git, and some may have
> an existing gpg. These may come from Windows binaries (in a few flavors —
> direct, wrapped via TortoiseGit, etc.), from cygwin, or perhaps from another
> existing msys install.
>
> Now they’re going to get multiple copies of these programs on their system
> with potentially conflicting paths, settings, etc.? (The same goes for gpg,
> but not git, on a Mac.) And since we won’t have guarantees that everyone will have
> git, we’ll need to maintain existing transports anyway, so this only gives
> us a very partial solution...
>
> I know there are some neat ideas in what you’re pushing for. But I feel
> like you’re overlooking all the potential issues — and also just
> underestimating the amount of work it would take to cut everything over to
> a new storage layer, on both the frontend and the backend, while keeping
> the set of existing features intact.
>
> —Gershom
>