On CI

Wed Mar 17 03:00:14 UTC 2021

Hi there!

Just a quick update on our CI situation. Ben, John, Davean and I discussed
CI yesterday: what we can do about it, along with some minor notes on why
we are frustrated with it. This is an open invitation to anyone who in
earnest wants to work on CI. Please come forward and help! We'd be glad to
have more people involved!

First the good news: over the last few weeks we've seen that we *can*
improve CI performance quite substantially. The goal is now to have MRs go
through CI within at most 3 hours. There are some ideas on how to make this
even faster, especially on wide (high core count) machines; however, that
will take a bit more time.

Now to the more thorny issue: stat failures. We do not want GHC to regress,
and I believe everyone is on board with that mission. Yet we have just
witnessed a train of marge trials all fail due to a -2% regression in a few
tests. This has blocked getting anything into master for at least another
day. This is (in my opinion) not acceptable! We just had five days of
nothing working because master was broken and subsequently all CI
pipelines kept failing. We have thus effectively wasted a week. We can
mitigate the latter part by enforcing marge for all merges to master (and
with faster pipeline turnaround times this might be more palatable than
with 9-12h turnaround times -- when you need to get something done! ha!),
but that won't help us with issues where marge can't find a set of
buildable MRs, because she just keeps hitting a combination of MRs that
somehow together increase or decrease metrics.
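
To illustrate how a batch can trip a stat check even when every MR looks
fine on its own, here is a minimal sketch. The 2% window and the per-MR
changes are made-up numbers, and `within_window` and `combined_change` are
hypothetical helpers, not the testsuite's actual logic.

```python
from math import prod

# Hypothetical acceptance window: a metric may move at most 2% in either
# direction relative to the baseline (an assumption for illustration).
WINDOW = 0.02

def within_window(relative_change, window=WINDOW):
    """True if a relative metric change stays inside the accepted window."""
    return abs(relative_change) <= window

def combined_change(changes):
    """Relative change after applying each MR's change in sequence."""
    return prod(1.0 + c for c in changes) - 1.0

# Three made-up MRs, each shrinking some metric by a little under 1%.
mr_changes = [-0.009, -0.008, -0.007]

each_ok = all(within_window(c) for c in mr_changes)  # every MR passes alone
batch = combined_change(mr_changes)                  # roughly -2.4% together
batch_ok = within_window(batch)                      # the batch does not pass

print(each_ok, batch_ok)
```

Note that in this sketch the batch fails on a *decrease*, which matches the
situation above: a check that rejects movement in either direction can
reject a batch of small improvements.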

We have three knobs to adjust:
- Make GHC build faster / make the testsuite run faster.
  There is some rather interesting work going on about parallelizing
  (earlier) during builds. We've also seen that we've wasted enormous
  amounts of time in the kernel during darwin builds, because of a bug in
  the testdriver.
- Use faster hardware.
  We've seen that just this can cut windows build times from 220min to
  80min.
- Reduce the number of builds.
  We used to build two pipelines for each marge merge, and if either of
  the two (see below) failed, marge's merge would fail as well. So not only
  did we build twice as much as we needed, we also increased our chances of
  hitting bogus build failures by a factor of 2.
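
The "factor of 2" in the last point can be checked with a bit of
arithmetic. This is only a sketch: it assumes the two pipelines fail
spuriously independently of each other, and the 5% rate is a made-up
number, not a measurement from our CI.

```python
def batch_failure_chance(p_spurious, pipelines):
    """Chance that at least one of `pipelines` independent runs fails spuriously."""
    return 1.0 - (1.0 - p_spurious) ** pipelines

p = 0.05  # made-up spurious failure rate per pipeline

one = batch_failure_chance(p, 1)  # a single pipeline per marge batch
two = batch_failure_chance(p, 2)  # two pipelines, either failure kills the batch

print(f"one pipeline: {one:.4f}, two pipelines: {two:.4f}, ratio: {two / one:.2f}")
```

For small failure rates the ratio is just under 2, so "by a factor of 2" is
a fair approximation.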

We need to do something about this, and I'd advocate for just not making
stats fail with marge. Build errors of course, but stat failures, no. We
would then have a separate dashboard (Ben has some old code lying around
for this, which someone would need to pick up and polish, ...) that tracks
GHC's performance for each commit to master, with easy access from the
dashboard to the offending commit. We will also need to consider the
implications of synthetic micro benchmarks, as opposed to, say, building
Cabal or other packages, which reflect more real-world experience of users
using GHC.
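
As a rough sketch of the policy I have in mind (not how marge or the
testsuite driver work today): build errors block the merge, stat failures
are only recorded per commit for a dashboard. `PipelineResult`,
`blocks_merge` and `dashboard_entries` are hypothetical names, as are the
commit id and test name in the example.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    build_ok: bool                                      # did the build succeed?
    stat_failures: list = field(default_factory=list)   # names of tests whose metrics moved

def blocks_merge(result):
    """Only a build error blocks the merge; stat failures never do."""
    return not result.build_ok

def dashboard_entries(commit, result):
    """Rows for a per-commit dashboard: which commit moved which metric."""
    return [(commit, test) for test in result.stat_failures]

# Made-up example: the build is fine, but one perf test's metrics moved.
result = PipelineResult(build_ok=True, stat_failures=["some_perf_test"])
print(blocks_merge(result))
print(dashboard_entries("abc123", result))
```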

I will try to provide a data-driven report on GHC's CI on a bi-weekly or
monthly basis going forward (we will have to see what the cost of writing
it up is, and how useful it turns out to be). My sincere hope is that it
will help us better understand our CI situation, instead of just having
some vague complaints about it.

Cheers,
 Moritz