[GHC DevOps Group] Fwd: DevOps: Next steps

Greg Steuck (Sh-toy-k) gnezdo at google.com
Tue Oct 10 17:01:42 UTC 2017


Google has a Cloud product with VM offerings. We could supply them as part
of X contribution. Linux
<https://cloud.google.com/compute/docs/quickstart-linux> and Windows
<https://cloud.google.com/compute/docs/instances/windows/> are expressly
supported. FreeBSD <https://cloud.google.com/compute/docs/images> is listed
as an option.

I still would prefer paying CI companies rather than dealing with VMs.

Thanks
Greg

On Mon, Oct 9, 2017 at 11:58 PM Manuel M T Chakravarty <
manuel.chakravarty at tweag.io> wrote:

> [RESENT MESSAGE — see
> https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html
> ]
>
>
> Ben, thanks for pointing out important issues in our requirements.
>
> And, Mathieu, thanks for moving this to the list.
>
> 05.10.2017, 08:31, Boespflug, Mathieu <m at tweag.io>:
>
> Ben's response. Copying it to the list now that this list exists.
>
>
>
> ---------- Forwarded message ----------
> From: Ben Gamari <ben at well-typed.com>
> Date: 4 October 2017 at 19:30
> Subject: Re: DevOps: Next steps
> To: Manuel M T Chakravarty <manuel.chakravarty at tweag.io>
> Cc: Mathieu Boespflug <m at tweag.io>, Jonas Pfenniger Chevalier
> <jonas.chevalier at tweag.io>
>
> Manuel M T Chakravarty <manuel.chakravarty at tweag.io> writes:
>
> When we talked on the phone, you mentioned that we need to be able to
> support all the Tier 1 platforms, and we both concluded that this
> implies the need for using Jenkins and we can’t, e.g., use CircleCI as
> they only support macOS and Linux. Mathieu and Jonas explained to me
> that this is actually not the case. Apparently, Rust solves this issue
> by building Linux and macOS artefacts on CircleCI, Windows on
> Appveyor, and everything else using QEMU on CircleCI (e.g., FreeBSD
> could be done that way and eventually ARM builds).
>
> Indeed when starting this I looked a bit at what rustc does. By my
> recollection, they don't actually perform builds on anything but
> Linux/amd64. Instead they build cross-compilers on x86-64, use these to
> build their testsuite artifacts, and then run these under qemu (and in
> some cases, e.g. FreeBSD, they don't even do this).
>
> While in general I would love to be able to do everything with
> cross-compiled binaries from Linux/amd64, our cross-compilation story
> may be a bit lacking to pull this off at the moment. Moritz Angermann has
> been making great strides in this area recently but it's going to be a
> while until we can really make this work. In particular, our Template
> Haskell story will need quite some work before we can reliably do a full
> cross-compiled testsuite run.
>
> In general I'm a bit skeptical of moving to a solution that relegates
> non-Linux/amd64 builds to a VM. Non-Linux/amd64 platforms have
> commercial users and do deserve first-class CI support. Furthermore,
> without KVM or hypervisor support (which, as far as I can tell, CircleCI
> does not provide [1]) I'm not sure that virtualisation will allow us to
> get where we want to be in terms of test coverage and build response
> time due to the cost of virtualisation. Without hardware support qemu
> can be rather expensive.
>
>
> Sorry for not expressing myself clearly here. I didn’t want to propose to
> exactly copy Rust’s approach. In particular, as you are writing, relying on
> cross-compilation is not an option for us. (Although, from my reading of
> the Rust repo, they do not build everything via QEMU.)
>
> In concrete terms, the proposal for GHC would be the following:
>
> * Linux & macOS builds: CircleCI
> * Windows builds: Appveyor
> * Everything else: QEMU (and maybe it is not necessary to run all the tests
> on these either)
>
> They convinced me that this is a worthwhile direction to consider for
> the following reasons:
>
> * Jenkins is a fickle beast: apparently scaling Jenkins to work
> reliably when running tests against multiple PRs on distributed
> infrastructure is hard — we ran into significant problems in a client
> project recently.
>
>
> I agree that Jenkins is a rather fickle beast; indeed it can be
> positively infuriating to work with. However, I've not yet noticed the
> scaling issues you describe. What in particular did you observe?
>
>
> Jonas, could you maybe explain it?
>
> * All the custom set up and maintaining of build nodes etc required by
> Jenkins disappears. (Mathieu built the CircleCI setup that he
> contributed recently quite quickly, so there really is little overhead
> in setting this up.)
>
> I'm not sure that the difference here is actually so great. Yes, in the
> case of Jenkins you do have physical machines to administer. However,
> this typically isn't the hard part. If you look at Rust's configuration,
> they have roughly a dozen Docker environments which they had to set up
> and maintain; this effort will likely far outweigh the setup cost of the
> machines themselves. This has certainly been the case for Jenkins and I
> suspect it would be true of CircleCI as well; this is simply the cost to
> entry for cross-platform testing.
>
>
> I misspoke earlier and Rust seems to use Travis CI together with Appveyor.
> Looking at
>
>   https://github.com/rust-lang/rust/blob/master/.travis.yml
>
> and
>
>   https://github.com/rust-lang/rust/blob/master/appveyor.yml
>
> They only seem to do the Docker thing for their cross-compilation targets
> (and I believe those are always going to be harder to set up).
>
> One nice thing about this, as Mathieu pointed out, is that somebody who
> forks the repo can just run the same CI on their own Travis/Circle/Appveyor
> accounts with little effort, just as we are currently doing with
> Tweag’s linear types fork of GHC:
>
>   https://github.com/tweag/ghc/blob/linear-types/.circleci/config.yml
>
> This is a powerful way of scaling.
>
> Moreover, we can't write off the cost of integrating with CircleCI. Of
> course, if we do decide to move to GitHub then perhaps this cost shrinks
> dramatically. However, until this decision is made it seems like we need
> to assume that Phabricator integration will be necessary.
>
>
> By the “re-use existing infrastructure instead of writing your own”
> mantra, this is just another reason to go for GitHub.
>
>
> * The problems we discussed with possibly not having enough Rackspace
> capacity for the transition disappear.
>
> In some sense this is true; however, it seems like we are trading one
> commodity of finite supply for another. We currently have Rackspace
> credit and consequently these instances can be considered to be
> essentially free.
>
> While CircleCI does offer four free containers for open source
> projects (and perhaps a bit more in our case if we ask), I'm skeptical
> that this will be enough; currently our four build bots give us
> multi-day wait times which makes development remarkably painful. The
> appeal of Jenkins is that we can shorten this timescale as well as grow
> our test coverage with the resources that we already have.
>
> Let's have a brief look at what resources we may need.
>
> A quick back-of-the-envelope calculation suggests that to simply keep up
> with our current average commit rate (around 200 commits/month) for the
> four environments that we currently build we need a bare minimum of:
>
>    200 commit/month
>  * 4 build/commit             (Linux/i386, Linux/amd64,
>                                OS X, Windows/amd64)
>  * 2.5 CPU-hour/build         (approx. average across platforms
>                                for a validate)
>  / (2 CPU-hour/machine-hour)  (CircleCI appears to use 2 vCPU instances)
>  / (30*24 machine-hour/month)
>  ~ 2 machines
>
> Note that this doesn't guarantee reasonable wait times but rather merely
> ensures that we can keep up on the mean. On top of this, we see around
> 300 differential revisions per month. This requires another 3 machines
> to keep up.
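
The back-of-the-envelope estimate above can be reproduced with a short
script. This is only a sketch: the function and parameter names are
illustrative, and it assumes that a differential revision costs the same
four validate builds as a commit, which the message implies but does not
state.

```python
import math

HOURS_PER_MONTH = 30 * 24  # wall-clock hours one machine is available per month


def machines_needed(jobs_per_month, builds_per_job=4.0,
                    cpu_hours_per_build=2.5, vcpus_per_machine=2.0):
    """Average number of machines kept fully busy by the given workload."""
    cpu_hours = jobs_per_month * builds_per_job * cpu_hours_per_build
    machine_hours = cpu_hours / vcpus_per_machine
    return machine_hours / HOURS_PER_MONTH


commit_load = machines_needed(200)    # about 1.39
# Assumption: each differential revision is validated once on all four platforms.
revision_load = machines_needed(300)  # about 2.08
total = math.ceil(commit_load) + math.ceil(revision_load)
print(f"commits: {commit_load:.2f}, revisions: {revision_load:.2f}, "
      f"minimum machines: {total}")
```

Rounding each load up separately gives 2 + 3 = 5 machines, matching the
figures in the message. The unrounded sum is roughly 3.5 machines, so the
estimate already carries a little slack, though still no guarantee on
wait times.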
>
> So, we need at least five machines but, again, this is a minimum;
> modelling response times is hard but I expect we would likely need to
> add at least two more machines to keep response times in the
> contributor-friendly range, especially considering that under CircleCI
> we will lose the ability to prioritize jobs (by contrast, with Jenkins
> we can prioritize pull requests as this is the response time that we
> really care about). Now consider that we would like to add at least
> three more platforms (FreeBSD, OpenBSD, Linux/aarch64, all of which may
> be relatively slow to build due to virtualisation overhead) as well as a
> few more build configurations on amd64 (LLVM, unregisterised, at least
> one cross-compilation target) and a periodic slow validation and we may
> be at over a dozen machines.
>
> All of this appears to put us well outside CircleCI's offering to
> open-source projects. Of course, it may be worth asking whether they are
> willing to extend GHC a more generous offer. However, I don't think we
> can count on this and I'm not certain that Haskell.org is currently in a
> position to be able to shoulder such a financial burden.
>
>
> Mathieu has indicated that Tweag would be willing to contribute towards
> those costs. (Developer time, such as yours, is so much more expensive than
> these subscription costs that it’ll always be more efficient to outsource
> to CI companies.)
>
> Also, Jonas could help us get things running and, I think, his
> wealth of experience would be very useful. (At least, I would be very
> grateful for his advice.)
>
> I think, this route has the potential to get us to where we want to be
> quite quickly and in a manner that is very little effort to maintain
> once set up. What do you think?
>
> Indeed I can see that there are many advantages to the CircleCI option.
> The ease of bringing up a Linux/amd64 build environment which easily
> scales and requires no administration is quite enticing. However, I am
> skeptical that it will be as easy to get the full suite of builds that
> we are aiming to produce. I would be quite curious to see what Jonas has
> to say on the matter of non-Linux platforms. Seeing a simple
> configuration which compiles and tests even a FreeBSD/amd64 build in a
> reasonable amount of time may well be enough to convince me.
>
>
> Ok, fair enough, let’s look at exactly how hard this is.
>
> Cheers,
> Manuel
>
> _______________________________________________
> Ghc-devops-group mailing list
> Ghc-devops-group at haskell.org
> https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group
>