On CI

Ben Gamari ben at smart-cactus.org
Thu Feb 18 22:34:35 UTC 2021


Apologies for the latency here. This thread has required a fair amount of
reflection.

Sebastian Graf <sgraf1337 at gmail.com> writes:

> Hi Moritz,
>
> I, too, had my gripes with CI turnaround times in the past. Here's a
> somewhat radical proposal:
>
>    - Run "full-build" stage builds only on Marge MRs. Then we can assign to
>    Marge much earlier, but probably have to do a bit more of (manual)
>    bisecting of spoiled Marge batches.
>       - I hope this gets rid of a bit of the friction of small MRs. I
>       recently caught myself wanting to do a bunch of small, independent, but
>       related changes as part of the same MR, simply because it's such a hassle
>       to post them in individual MRs right now and also because it
>       steals so much CI capacity.
>
>    - Regular MRs should still have the ability to easily run individual
>    builds of what is now the "full-build" stage, similar to how we can run
>    optional "hackage" builds today. This is probably useful to pin down the
>    reason for a spoiled Marge batch.


I am torn here. For most of my non-trivial patches I personally don't
mind long turnarounds: I walk away and return a day later to see whether
anything failed. Spurious failures due to fragile tests make this a bit
tiresome, but this is a problem that we are gradually solving (by fixing
bugs and marking tests as fragile).

However, I agree that small MRs are currently rather painful. On the
other hand, diagnosing failed Marge batches is *also* rather tiresome. I
am worried that by deferring full validation of MRs we will only
exacerbate this problem. Furthermore, I worry that by deferring full
validation we run the risk of rather *increasing* the MR turnaround
time, since there are entire classes of issues that wouldn't be caught
until the MR made it to Marge.

Ultimately it's unclear to me whether this proposal would help or hurt.
Nevertheless, I am willing to try it. However, if we go this route we
should consider what can be done to reduce the incidence of failed Marge
batches.

One problem that I'm particularly worried about is that of tests with
OS-dependent expected output (e.g. `$test_name.stdout-mingw32`). I find
that people (understandably) forget to update these when updating test
output. I suspect that this would be a frequent source of failed Marge
batches if we defer full validation. I can see a few ways to mitigate
this:

 * eliminate platform-dependent output files
 * introduce a linter that fails if it sees a test with
   platform-dependent output that doesn't touch all output files
 * always run the full-build stage on MRs that touch tests with
   platform-dependent output files
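The second option could be sketched roughly as follows (a hypothetical
standalone check, not tied to GHC's actual testsuite layout; the platform
suffixes and file names below are assumptions): given the set of files an
MR changes, flag any test whose platform-dependent expected-output
variants were only partially updated.

```python
from collections import defaultdict

# Hypothetical linter sketch: flag tests whose platform-dependent
# expected-output files were only partially updated by a change.
# Naming follows a `<test>.stdout[-<platform>]` convention; the list of
# platform suffixes here is an assumption, not GHC's actual set.
PLATFORM_SUFFIXES = ("-mingw32", "-ws-32", "-darwin")

def partially_updated_tests(changed_files, all_files):
    """Return, per affected test, the output variants left untouched."""
    def base(name):
        # `conc001.stdout-mingw32` and `conc001.stdout` share a base name
        for suffix in PLATFORM_SUFFIXES:
            if name.endswith(suffix):
                return name[: -len(suffix)]
        return name

    # Group every expected-output file under its base name
    variants = defaultdict(set)
    for f in all_files:
        variants[base(f)].add(f)

    changed = set(changed_files)
    flagged = []
    for vs in variants.values():
        touched = vs & changed
        # Flag tests with several variants where only some were changed
        if touched and len(vs) > 1 and touched != vs:
            flagged.append(sorted(vs - changed))
    return flagged
```

A lint job would then fail whenever this returns a non-empty list,
forcing the author to touch (or consciously confirm) every platform
variant.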

Regardless of whether we implement Sebastian's proposal, one smaller
measure we could take to ease the pain of small MRs is to introduce a
mechanism for marking MRs as "trivial" (e.g. a label or a commit/MR
description keyword), which would result in the `full-build` stage being
skipped for that MR. Perhaps this would be helpful?
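As a rough sketch of how the skipping might work (the label name
"trivial" and the exact wiring are assumptions on my part), a small
script run at the start of the pipeline could consult the MR's labels
via GitLab's CI_MERGE_REQUEST_LABELS variable:

```python
import os

# Hypothetical sketch: decide whether the full-build stage should run,
# based on the MR's labels. CI_MERGE_REQUEST_LABELS is the
# comma-separated label list that GitLab exposes to merge-request
# pipelines; the "trivial" label name is an assumption.

def should_run_full_build(labels_var: str) -> bool:
    labels = {label.strip() for label in labels_var.split(",") if label.strip()}
    return "trivial" not in labels

if __name__ == "__main__":
    if should_run_full_build(os.environ.get("CI_MERGE_REQUEST_LABELS", "")):
        print("running full-build stage")
    else:
        print("MR marked trivial; skipping full-build stage")
```

A wrapper job could use the script's answer to decide whether to trigger
the `full-build` stage at all.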


> Another frustrating aspect is that if you want to merge an n-sized chain of
> dependent changes individually, you have to
>
>    - Open an MR for each change (initially the last change will be
>    comprised of n commits)
>    - Review first change, turn pipeline green   (A)
>    - Assign to Marge, wait for batch to be merged   (B)
>    - Review second change, turn pipeline green
>    - Assign to Marge, wait for batch to be merged
>    - ... and so on ...
>
> Note that this (A) incurs many context switches for the dev and the latency of
> *at least* one run of CI.
> And then (B) incurs the latency of *at least* one full-build, if you're
> lucky and the batch succeeds. I've recently seen batches that were
> resubmitted by Marge at least 5 times due to spurious CI failures and
> timeouts. I think this is a huge factor for latency.
>
> Although after (A), I should just pop the patch off my mental stack,
> that isn't particularly true, because Marge keeps on reminding me when a
> stack fails or succeeds, both of which require at least some attention from
> me: Failed 2 times => Make sure it was spurious, Succeeds => Rebase next
> change.
>
> Maybe we can also learn from other projects like Rust, GCC or clang, which
> I haven't had a look at yet.
>
I did a bit of digging on this.

 * Rust: It appears that Rust's CI scheme is somewhat similar to what
   you proposed above. They do relatively minimal validation of MRs
   (e.g. https://github.com/rust-lang/rust/runs/1905017693), with full
   validation on merges
   (e.g. https://github.com/rust-lang-ci/rust/runs/1925049948). The latter
   usually takes between 3 and 4 hours, with some jobs taking 5 hours.

 * GCC: As far as I can tell, GCC doesn't actually have any (functional)
   continuous integration. Discussions with contributors suggest that
   some companies that employ contributors may run their own private
   infrastructure, but I don't believe there is anything public.

 * LLVM: I can't work out whether/how LLVM validates MRs (their
   Phabricator instance mentions Buildkite, although it appears to be
   broken). `master` appears to be minimally checked (only Linux/x86-64)
   via buildbot (http://lab.llvm.org:8011/#/builders/16/builds/6593).
   These jobs take between 3 and 4 hours, although it's unclear what one
   should conclude from these numbers.

 * Go: Go appears to have its own homebrew CI infrastructure
   (https://build.golang.org/) for comprehensive testing of master; it's
   hard to tell how long these runs take, but it's at least two hours.
   Code review happens by way of Gerrit with integration with some sort
   of CI. These runs take between 1 and 3 hours and seem to test a fairly
   comprehensive set of configurations.
   
Cheers,

- Ben