[GHC DevOps Group] Fwd: DevOps: Next steps

Manuel M T Chakravarty manuel.chakravarty at tweag.io
Tue Oct 10 06:57:40 UTC 2017


[RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html]


Ben, thanks for pointing out important issues in our requirements.

And, Mathieu, thanks for moving this to the list.

> 05.10.2017, 08:31, Boespflug, Mathieu <m at tweag.io>:
> 
> Ben's response. Copying it to the list now that this list exists.
> 
> ---------- Forwarded message ----------
> From: Ben Gamari <ben at well-typed.com>
> Date: 4 October 2017 at 19:30
> Subject: Re: DevOps: Next steps
> To: Manuel M T Chakravarty <manuel.chakravarty at tweag.io>
> Cc: Mathieu Boespflug <m at tweag.io>, Jonas Pfenniger Chevalier
> <jonas.chevalier at tweag.io>
> 
> Manuel M T Chakravarty <manuel.chakravarty at tweag.io> writes:
> 
>> When we talked on the phone, you mentioned that we need to be able to
>> support all the Tier 1 platforms, and we both concluded that this
>> implies the need for using Jenkins and we can’t, e.g., use CircleCI as
>> they only support macOS and Linux. Mathieu and Jonas explained to me
>> that this is actually not the case. Apparently, Rust solves this issue
>> by building Linux and macOS artefacts on CircleCI, Windows on
>> Appveyor, and everything else using QEMU on CircleCI (e.g., FreeBSD
>> could be done that way and eventually ARM builds).
>> 
> Indeed when starting this I looked a bit at what rustc does. By my
> recollection, they don't actually perform builds on anything but
> Linux/amd64. Instead they build cross-compilers on x86-64, use these to
> build their testsuite artifacts, and then run these under qemu (and in
> some cases, e.g. FreeBSD, they don't even do this).
> 
> While in general I would love to be able to do everything with
> cross-compiled binaries from Linux/amd64, our cross-compilation story
> may be a bit lacking to pull this off at the moment. Moritz Angermann has
> been making great strides in this area recently but it's going to be a
> while until we can really make this work. In particular, our Template
> Haskell story will need quite some work before we can reliably do a full
> cross-compiled testsuite run.
> 
> In general I'm a bit skeptical of moving to a solution that relegates
> non-Linux/amd64 builds to a VM. Non-Linux/amd64 platforms have
> commercial users and do deserve first-class CI support. Furthermore,
> without KVM or hypervisor support (which, as far as I can tell, CircleCI
> does not provide [1]) I'm not sure that virtualisation will allow us to
> get where we want to be in terms of test coverage and build response
> time due to the cost of virtualisation. Without hardware support qemu
> can be rather expensive.

Sorry for not expressing myself clearly here. I didn’t mean to propose copying Rust’s approach exactly. In particular, as you write, relying on cross-compilation is not an option for us. (Although, from my reading of the Rust repo, they do not build everything via QEMU.)

In concrete terms, the proposal for GHC would be the following:

* Linux & macOS builds: CircleCI
* Windows builds: Appveyor
* Everything else: QEMU (and maybe it is not necessary to run all the tests on these either)
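
The proposed split amounts to a small routing table from platform to CI service. The following Python is a minimal sketch of that table, not an actual CI configuration: the platform identifiers, the `CIJob` record and the `ci_job_for` helper are hypothetical, and the `full_testsuite` flag only encodes the open question of whether the QEMU targets need the complete testsuite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CIJob:
    """Where a platform is built and tested (hypothetical record)."""
    host: Optional[str]   # CI service running the job; None if the proposal leaves it open
    emulated: bool        # True if the build runs under QEMU
    full_testsuite: bool  # assumption: emulated targets may run a reduced testsuite


# Platforms with native CI support under the proposal (identifiers are hypothetical).
NATIVE_HOSTS = {
    "linux-amd64": "CircleCI",
    "linux-i386": "CircleCI",
    "macos-amd64": "CircleCI",
    "windows-amd64": "Appveyor",
}


def ci_job_for(platform: str) -> CIJob:
    """Route a platform to a CI job; anything not listed falls back to QEMU."""
    host = NATIVE_HOSTS.get(platform)
    if host is not None:
        return CIJob(host=host, emulated=False, full_testsuite=True)
    # "Everything else: QEMU"; which CI service hosts the emulator is not specified.
    return CIJob(host=None, emulated=True, full_testsuite=False)


for p in ["linux-amd64", "windows-amd64", "freebsd-amd64"]:
    print(p, ci_job_for(p))
```

Under this sketch a FreeBSD build would be an emulated job with a reduced testsuite, matching the "everything else" bullet.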

>> They convinced me that this is a worthwhile direction to consider for
>> the following reasons:
>> 
>> * Jenkins is a fickle beast: apparently scaling Jenkins to work
>> reliably when running tests against multiple PRs on distributed
>> infrastructure is hard — we ran into significant problems in a client
>> project recently.
>> 
> 
> I agree that Jenkins is a rather fickle beast; indeed it can be
> positively infuriating to work with. However, I've not yet noticed the
> scaling issues you describe. What in particular did you observe?

Jonas, could you describe the problems we ran into?

>> * All the custom setup and maintenance of build nodes, etc., required by
>> Jenkins disappears. (Mathieu built the CircleCI setup that he
>> contributed recently quite quickly, so there really is little overhead
>> in setting this up.)
>> 
> I'm not sure that the difference here is actually so great. Yes, in the
> case of Jenkins you do have physical machines to administer. However,
> this typically isn't the hard part. If you look at Rust's configuration,
> they have roughly a dozen Docker environments which they had to set up
> and maintain; this effort will likely far outweigh the setup cost of the
> machines themselves. This has certainly been the case for Jenkins and I
> suspect it would be true of CircleCI as well; this is simply the cost of
> entry for cross-platform testing.

I misspoke earlier: Rust seems to use Travis CI, rather than CircleCI, together with Appveyor. Looking at

  https://github.com/rust-lang/rust/blob/master/.travis.yml

and 

  https://github.com/rust-lang/rust/blob/master/appveyor.yml

they only seem to use Docker for their cross-compilation targets (and I believe those will always be harder to set up).

One nice thing about this, as Mathieu pointed out, is that somebody who forks the repo can run the same CI on their own Travis/Circle/Appveyor accounts with little effort, just as we are currently doing with Tweag’s linear types fork of GHC:

  https://github.com/tweag/ghc/blob/linear-types/.circleci/config.yml

This is a powerful way of scaling: each fork brings its own CI capacity.

> Moreover, we can't write off the cost of integrating with CircleCI. Of
> course, if we do decide to move to GitHub then perhaps this cost shrinks
> dramatically. However, until this decision is made it seems like we need
> to assume that Phabricator integration will be necessary.

By the “re-use existing infrastructure instead of writing your own” mantra, this is just another reason to go for GitHub.

>> * The problems we discussed with possibly not having enough Rackspace
>> capacity for the transition disappear.
>> 
> In some sense this is true; however, it seems like we are trading one
> commodity of finite supply for another. We currently have Rackspace
> credit and consequently these instances can be considered to be
> essentially free.
> 
> While CircleCI does offer four free containers for open source
> projects (and perhaps a bit more in our case if we ask), I'm skeptical
> that this will be enough; currently our four build bots give us
> multi-day wait times which makes development remarkably painful. The
> appeal of Jenkins is that we can shorten this timescale as well as grow
> our test coverage with the resources that we already have.
> 
> Let's have a brief look at what resources we may need.
> 
> A quick back-of-the-envelope calculation suggests that to simply keep up
> with our current average commit rate (around 200 commits/month) for the
> four environments that we currently build we need a bare minimum of:
> 
>    200 commit/month
>  * 4 build/commit             (Linux/i386, Linux/amd64,
>                                OS X, Windows/amd64)
>  * 2.5 CPU-hour/build         (approx. average across platforms
>                                for a validate)
>  / (2 CPU-hour/machine-hour)  (CircleCI appears to use 2 vCPU instances)
>  / (30*24 machine-hour/month)
>  ~ 1.4 machines, so 2 machines rounding up
> 
> Note that this doesn't guarantee reasonable wait times; it merely
> ensures that we can keep up on average. On top of this, we see around
> 300 differential revisions per month. This requires another 3 machines
> to keep up.
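
Ben's estimate can be reproduced in a few lines of Python. This is a sketch using only the figures stated above, except that giving each differential revision the same four builds as a commit is an assumption (it does reproduce the figure of three extra machines). The result is a mean-throughput floor, not a response-time guarantee.

```python
import math

HOURS_PER_MONTH = 30 * 24       # machine-hours one machine provides per month
BUILDS_PER_CHANGE = 4           # Linux/i386, Linux/amd64, OS X, Windows/amd64
CPU_HOURS_PER_BUILD = 2.5       # approx. average validate cost across platforms
CPU_HOURS_PER_MACHINE_HOUR = 2  # CircleCI appears to use 2 vCPU instances


def machines_needed(changes_per_month: float) -> float:
    """Mean number of machines needed to keep up; ignores queueing delay."""
    cpu_hours = changes_per_month * BUILDS_PER_CHANGE * CPU_HOURS_PER_BUILD
    machine_hours = cpu_hours / CPU_HOURS_PER_MACHINE_HOUR
    return machine_hours / HOURS_PER_MONTH


commits = machines_needed(200)  # commits to master per month
# Assumption: each differential revision gets the same 4 builds as a commit.
diffs = machines_needed(300)
# Round each stream up separately, as in the estimate above.
total = math.ceil(commits) + math.ceil(diffs)
print(f"commits: {commits:.2f}, diffs: {diffs:.2f}, minimum machines: {total}")
```

Rounding each stream up separately gives 2 + 3 = 5 machines; pooling both streams on shared machines would give `math.ceil(machines_needed(500))`, i.e. 4, but either figure ignores the waiting time that contributors actually experience.
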
> 
> So, we need at least five machines but, again, this is a minimum;
> modelling response times is hard but I expect we would likely need to
> add at least two more machines to keep response times in the
> contributor-friendly range, especially considering that under CircleCI
> we will lose the ability to prioritize jobs (by contrast, with Jenkins
> we can prioritize pull requests, since their response time is what we
> really care about). Now consider that we would like to add at least
> three more platforms (FreeBSD, OpenBSD, Linux/aarch64, all of which may
> be relatively slow to build due to virtualisation overhead) as well as a
> few more build configurations on amd64 (LLVM, unregisterised, at least
> one cross-compilation target) and a periodic slow validation, and we may
> be at over a dozen machines.
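
Prioritization of the kind Ben describes can be pictured as a priority queue in which differential revisions are dispatched ahead of commit builds and periodic validations, with first-in, first-out order inside each class. This is a minimal Python sketch using the standard `heapq` module; the job kinds, priority values and job names are hypothetical and are not taken from any Jenkins configuration.

```python
import heapq
import itertools

# Lower value = dispatched first. Job kinds and values are hypothetical.
PRIORITY = {"differential": 0, "commit": 1, "periodic-slow-validate": 2}


class BuildQueue:
    """Build jobs ordered by priority, FIFO within each priority class."""

    def __init__(self) -> None:
        self._heap = []                # entries: (priority, sequence number, job name)
        self._seq = itertools.count()  # tie-breaker that preserves submission order

    def submit(self, kind: str, name: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), name))

    def next_job(self) -> str:
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self) -> int:
        return len(self._heap)


queue = BuildQueue()
queue.submit("commit", "commit-1")
queue.submit("periodic-slow-validate", "nightly")
queue.submit("differential", "D1")
queue.submit("differential", "D2")
order = [queue.next_job() for _ in range(len(queue))]
print(order)
```

In plain submission order, D1 and D2 would wait behind both earlier jobs; with priorities they are dispatched first, which is exactly the response time the paragraph above is concerned with.
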
> 
> All of this appears to put us well outside CircleCI's offering to
> open-source projects. Of course, it may be worth asking whether they are
> willing to extend GHC a more generous offer. However, I don't think we
> can count on this and I'm not certain that Haskell.org is currently in a
> position to be able to shoulder such a financial burden.

Mathieu has indicated that Tweag would be willing to contribute towards those costs. (Developer time, such as yours, is so much more expensive than these subscription costs that it’ll always be more efficient to outsource to CI companies.)

>> Also, Jonas could help us get things running and, I think, his
>> wealth of experience would be very useful. (At least, I would be very
>> grateful for his advice.)
>> 
>> I think this route has the potential to get us to where we want to be
>> quite quickly and in a manner that requires very little effort to maintain
>> once set up. What do you think?
>> 
> Indeed I can see that there are many advantages to the CircleCI option.
> The ease of bringing up a Linux/amd64 build environment which easily
> scales and requires no administration is quite enticing. However, I am a bit
> skeptical that it will be as easy to get the full suite of builds that
> we are aiming to produce. I would be quite curious to see what Jonas has
> to say on the matter of non-Linux platforms. Seeing a simple
> configuration which compiles and tests even a FreeBSD/amd64 build in a
> reasonable amount of time may well be enough to convince me.

Ok, fair enough, let’s look at exactly how hard this is.

Cheers,
Manuel
