[GHC DevOps Group] CI

Ben Gamari ben at well-typed.com
Thu Oct 12 13:18:24 UTC 2017


Manuel M T Chakravarty <manuel.chakravarty at tweag.io> writes:

> As promised, I have taken a first cut at listing the requirements and
> the pros and cons of the main contenders on a Trac page:
>
>   https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration
>
I think this list is being a bit generous to the hosted option.

Other costs of this approach might include:

 * Under this heterogeneous scheme we will have to maintain two or more
   distinct CI systems, each requiring some degree of setup and
   maintenance.

 * Using qemu to build on/for non-Linux/amd64 platforms adds a
   non-negligible amount of complexity (see Rust's CI
   implementation [1]).

 * It's unclear whether testing GHC via qemu is even practical given
   computational constraints.

 * We lose the ability to prioritize jobs, requiring more hardware to
   maintain similar build turnaround times.

 * We are utterly dependent on our CI service(s) to behave well; for
   instance, here are a few examples that the Rust infrastructure team
   related to me:

     * They have been struggling to keep the tail of their build
       turnaround time distribution in check, with some builds taking
       over 8 hours to complete. Despite raising the issue with Travis
       customer support as a paying customer, they are still having
       trouble.

     * They have noticed that Travis has a tendency to simply drop builds
       mid-flight, losing hours of work. Again, despite working with
       upstream they haven't been able to resolve the problem.

     * They have been strongly affected by apparent instability in
       Travis' OS X infrastructure, which goes down, to quote, "*a lot*".

   Of course, all of these examples pick on Travis in particular, as
   that is the example we have available. However, in general the message
   here is that by giving up our own infrastructure we are at the mercy
   of the services that we use. Unfortunately, sometimes those services
   are not accustomed to testing projects of the scale of GHC or rustc.
   At this point you have little recourse but to minimize the damage.

We avoid all of this by self-hosting (at, of course, the expense of
administration time). Furthermore, we continue to benefit from hardware
provided by a multitude of sources including users, Rackspace (and other
VPS providers if we wanted), and programs like OSU OSL. It is important
to remember that until recently we were operating under the assumption
that these were the only resources available to us for testing.

It's still quite unclear to me what a CircleCI/Appveyor solution will
ultimately cost, but it will almost certainly not be free. Assuming there
are users who are willing to foot that bill, this is of course fine.
However, it's quite contrary to the assumptions we have been working
with for much of this process.


Lastly: if I understand the point correctly, the "set-up is not
forkable" con listed for Jenkins is not accurate. Under Jenkins the build
configuration resides in the repository being tested. A user can easily
modify it and submit a PR, which will be tested just like any other
change.
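To illustrate the point: an in-repository Jenkins configuration is
typically a Jenkinsfile committed at the root of the tree. The sketch
below is purely hypothetical (the stage names and shell commands are
placeholders, not GHC's actual build scripts), but it shows the shape of
the thing a fork could freely modify:

```groovy
// Hypothetical Jenkinsfile sketch: a declarative pipeline checked into
// the repository itself. Because it lives in the tree being tested, a
// contributor can edit it in a fork and the PR is built with the
// edited configuration.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // Placeholder build step; the real commands would be
                // whatever the project's build system requires.
                sh './configure && make -j4'
            }
        }
        stage('Test') {
            steps {
                sh 'make test'
            }
        }
    }
}
```

Since this file is versioned alongside the code, any pull request that
changes it is tested under the changed configuration, which is exactly
the "forkable" property in question.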


[1] https://github.com/rust-lang/rust/tree/master/src/ci


> Maybe I am biased, but is there any advantage to Jenkins other than
> that we can run builds and tests on exotic platforms?

Some of these "exotic" platforms might also be called "the most populous
architecture in the world" (ARM), "the operating system that feeds a
third of the world's Internet traffic" (FreeBSD), and "the operating
system that powers much of the world's financial system" (AIX). I'm not
sure that the "exotic" label really does these platforms justice.

More importantly, all of these platforms have contributors working on
their support in GHC. Historically, GHC HQ has tried to recognize their
efforts by allowing porters to submit binary distributions which are
distributed alongside GHC HQ distributions. Recently I have tried to
pursue a different model, handling some of these binary builds myself in
the name of consistency and reduced release overhead (as previously we
incurred a full round-trip through binary build contributors every time
we released).

The desire to scale our release process up to handle the breadth of
platforms that GHC supports, with either Tier 1 or what is currently
Tier 2 support, was one motivation for the new CI effort. While I don't
consider testing any one of these platforms to be a primary goal, I do
think it is important to have a viable plan by which they might be
covered in the future for this reason.


To be clear, I am supportive of the CI-as-a-service direction. However,
I want to recognize the trade-offs where they exist and have answers to
some of the thorny questions, including those surrounding platform
support, before committing.

Cheers,

- Ben