On CI

Moritz Angermann moritz.angermann at gmail.com
Thu Feb 18 09:29:41 UTC 2021


I'm glad to report that my math was off. But it was off only because I
assumed that we'd successfully build all Windows configurations, which of
course we don't. Thus some builds fail faster.

Sylvain also temporarily provided a Windows machine, until it expired.
This led to a slew of new Windows wibbles.
The CI script Ben wrote, and generously used to help set up the new
builder, seems to assume an older Git install, and thus a path was
broken, which, thanks to GitLab, led to the brilliant error of just
stalling.
Next up: because we use msys2's pacman to provision the Windows builders,
and pacman essentially gives us symbolic package names to install (rather
than pinned versions), we ended up getting a newer autoconf onto the new
builder (and I assume this will happen with any other builders we add as
well). This new autoconf (which I've also run into on the M1s) doesn't
like our configure.ac/aclocal.m4 anymore and barfs; I wasn't able to
figure out how to force pacman to install an older version and *not* give
it some odd version suffix (which prevents it from working as a drop-in
replacement).

In any case we *must* update our autoconf files. So I guess the time is now.


On Wed, Feb 17, 2021 at 6:58 PM Moritz Angermann <moritz.angermann at gmail.com>
wrote:

> At this point I believe we have ample Linux build capacity. Darwin looks
> pretty good as well; the ~4 M1s we have should in principle also be able
> to build x86_64-darwin at acceptable speeds, although only on Big Sur.
>
> The aarch64-linux story is a bit constrained by the shortage of powerful
> and fast CI machines, but probably bearable for the time being. I doubt
> anyone really looks at those jobs anyway, as they are permitted to fail.
> If aarch64 were to become a bottleneck, I’d be inclined to just disable
> them. With the NCG landing soon, this will likely become much more
> bearable as well, even though we might still want to run the nightly
> LLVM builds.
>
> To be frank, I don’t see 9.2 happening in two weeks with the current CI.
>
> If we subtract the aarch64-linux and Windows builds, we could probably do
> a full run in under three hours, maybe even less. And that is mostly
> because we have a serialized pipeline. I have discussed some ideas with
> Ben on prioritizing the first few stages onto the faster CI machines to
> effectively fail fast and provide feedback quickly; see the sketch below.
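>
> To illustrate the fail-fast idea (a sketch only; the job names and the
> ordering heuristic here are made up for illustration, not our actual
> scheduler): run the cheapest jobs first, so obvious breakage surfaces
> before any ~220-minute build starts.
>
>     import Data.List (sortOn)
>
>     data Job = Job { jobName :: String, expectedMinutes :: Int }
>
>     -- "Fail fast": order jobs by expected duration, so a lint or a
>     -- quick validate job can kill a doomed pipeline early, before
>     -- the long Windows and Darwin builds are even scheduled.
>     failFastOrder :: [Job] -> [Job]
>     failFastOrder = sortOn expectedMinutes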
>
> But yes. Working on ghc right now is quite painful due to long and
> unpredictable CI times.
>
> Cheers,
>  Moritz
>
> On Wed, 17 Feb 2021 at 6:31 PM, Sebastian Graf <sgraf1337 at gmail.com>
> wrote:
>
>> Hi Moritz,
>>
>> I, too, had my gripes with CI turnaround times in the past. Here's a
>> somewhat radical proposal:
>>
>>    - Run "full-build" stage builds only on Marge MRs. Then we can assign
>>    to Marge much earlier, but probably have to do a bit more (manual)
>>    bisecting of spoiled Marge batches.
>>       - I hope this gets rid of a bit of the friction of small MRs. I
>>       recently caught myself wanting to do a bunch of small, independent, but
>>       related changes as part of the same MR, simply because it's such a hassle
>>       to post them in individual MRs right now and also because it steals so much
>>       CI capacity.
>>    - Regular MRs should still have the ability to easily run individual
>>    builds of what is now the "full-build" stage, similar to how we can run
>>    optional "hackage" builds today. This is probably useful to pin down the
>>    reason for a spoiled Marge batch.
>>    - The CI capacity we free up can probably be used to run a perf build
>>    (such as the fedora release build) on the "build" stage (the one where we
>>    currently run stack-hadrian-build and the validate-deb9-hadrian build), in
>>    parallel.
>>    - If we decide against the latter, a micro-optimisation could be to
>>    cache the build artifacts of the "lint-base" build and continue the build
>>    in the validate-deb9-hadrian build of the "build" stage.
>>
>> The usefulness of this approach depends on how many MRs cause metric
>> changes on different architectures.
>>
>> Another frustrating aspect is that if you want to merge an n-sized chain
>> of dependent changes individually, you have to
>>
>>    - Open an MR for each change (initially, the MR for the last change
>>    will consist of n commits)
>>    - Review first change, turn pipeline green   (A)
>>    - Assign to Marge, wait for batch to be merged   (B)
>>    - Review second change, turn pipeline green
>>    - Assign to Marge, wait for batch to be merged
>>    - ... and so on ...
>>
>> Note that (A) incurs many context switches for the dev and the latency of
>> *at least* one run of CI.
>> And then (B) incurs the latency of *at least* one full-build, if you're
>> lucky and the batch succeeds. I've recently seen batches that were
>> resubmitted by Marge at least 5 times due to spurious CI failures and
>> timeouts. I think this is a huge factor for latency.
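>>
>> To put rough numbers on that (a sketch; the hours and attempt counts
>> below are illustrative assumptions, not measurements):
>>
>>     -- Wall-clock latency for an n-sized chain of dependent MRs:
>>     -- each change pays at least one validate pipeline (A) plus
>>     -- however many Marge batch attempts (B) it takes to land.
>>     chainLatencyHours :: Double -> Double -> Double -> Double -> Double
>>     chainLatencyHours n validate batch attempts =
>>       n * (validate + attempts * batch)
>>
>>     -- e.g. 5 changes, 5h validate, 5h batch, 3 attempts per batch:
>>     -- chainLatencyHours 5 5 5 3 == 100 (hours of latency)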
>>
>> Although after (A) I should be able to just pop the patch off my mental
>> stack, in practice I can't, because Marge keeps reminding me when a batch
>> fails or succeeds, both of which require at least some attention from me:
>> failed twice => make sure it was spurious; succeeded => rebase the next
>> change.
>>
>> Maybe we can also learn from other projects like Rust, GCC or clang,
>> which I haven't had a look at yet.
>>
>> Cheers,
>> Sebastian
>>
>> On Wed, Feb 17, 2021 at 9:11 AM Moritz Angermann <
>> moritz.angermann at gmail.com> wrote:
>>
>>> Friends,
>>>
>>> I've been looking at CI again recently, as I was facing CI turnaround
>>> times of 9-12 hours; this just keeps dragging out and making progress
>>> hard.
>>>
>>> The pending pipeline currently has 2 darwin and 15 Windows builds
>>> waiting. Windows builds take ~220 minutes on average. We have five
>>> builders, so we can expect this queue to be done in ~660 minutes,
>>> assuming perfect scheduling and good performance. That is 11 hours! The
>>> next Windows build can be started in 11 hours. Please check my math and
>>> tell me I'm wrong!
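>>>
>>> For anyone who wants to check it mechanically (using the figures above,
>>> and assuming perfect scheduling and uniform build times):
>>>
>>>     -- Queue-drain estimate: jobs * minutes per job / builders.
>>>     drainMinutes :: Int -> Int -> Int -> Int
>>>     drainMinutes jobs minutesPerJob builders =
>>>       (jobs * minutesPerJob) `div` builders
>>>
>>>     -- drainMinutes 15 220 5 == 660 (minutes), i.e. 11 hours.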
>>>
>>> If you submit an MR today, with some luck, you'll know whether it is
>>> mergeable some time tomorrow. At that point you can assign it to Marge,
>>> and Marge, if you are lucky and the set of patches she tries to merge
>>> together is mergeable, will merge your work into master, probably some
>>> time on Friday. If a job fails, well, you have to start over again.
>>>
>>> What are our options here? Ben has been pretty clear about not wanting a
>>> broken commit for Windows to end up in the tree, and I'm there with him.
>>>
>>> Cheers,
>>>  Moritz
>>>