On CI

Wed Mar 17 10:18:49 UTC 2021

I am not advocating to drop perf tests during merge requests, I just want
them to not be fatal for marge batches. Yes this means that a bunch of
unrelated merge requests all could be fine wrt to the perf checks per merge
request, but the aggregate might fail perf.  And then subsequently the next
MR against the merged aggregate will start failing. Even that is a pretty
bad situation imo.

I honestly don't have a good answer, I just see marge work on batches, over
and over and over again, just to fail. Eventually marge should figure out a
subset of the merges that fit into the perf window, but that might be after
10 tries? So after up to ~30+hours?, which means there won't be any merge
request landing in GHC for 30hs. I find that rather unacceptable.

I think we need better visualisation of perf regressions that happen on
master. Ben has some wip for this, and I think John said there might be
some way to add a nice (maybe reflex) ui to it.  If we can see regressions
on master easily, and go from "ohh this point in time GHC got worse", to
"this is the commit". We might be able to figure it out.

But what do we expect of patch authors? Right now if five people write
patches to GHC, and each of them eventually manage to get their MRs green,
after a long review, they finally see it assigned to marge, and then it
starts failing? Their patch on its own was fine, but their aggregate with
other people's code leads to regressions? So we now expect all patch
authors together to try to figure out what happened? Figuring out why
something regressed is hard enough, and we only have a very few people who
are actually capable of debugging this. Thus I believe it would end up with
Ben, Andreas, Matthiew, Simon, ... or someone else from GHC HQ anyway to
figure out why it regressed, be it in the Review Stage, or dissecting a
marge aggregate, or on master.

Thus I believe in most cases we'd have to look at the regressions anyway,
and right now we just convolutedly make working on GHC a rather depressing
job. Increasing the barrier to entry by also requiring everyone to have
absolutely stellar perf regression skills is quite a challenge.

There is also the question of our synthetic benchmarks actually measuring
real world performance? Do the micro benchmarks translate to the same
regressions in say building aeson, vector or Cabal? The latter being what
most practitioners care about more than the micro benchmarks.

Again, I'm absolutely not in favour of GHC regressing, it's slow enough as
it is. I just think CI should be assisting us and not holding development
back.

Cheers,
 Moritz

On Wed, Mar 17, 2021 at 5:54 PM Spiwack, Arnaud <arnaud.spiwack at tweag.io>
wrote:

> Ah, so it was really two identical pipelines (one for the branch where
> Margebot batches commits, and one for the MR that Margebot creates before
> merging). That's indeed a non-trivial amount of purely wasted
> computer-hours.
>
> Taking a step back, I am inclined to agree with the proposal of not
> checking stat regressions in Margebot. My high-level opinion on this is
> that perf tests don't actually test the right thing. Namely, they don't
> prevent performance drift over time (if a given test is allowed to degrade
> by 2% every commit, it can take a 100% performance hit in just 35 commits).
> While it is important to measure performance, and to avoid too egregious
> performance degradation in a given commit, it's usually performance over
> time which matters. I don't really know how to apply it to collaborative
> development, and help maintain healthy performance. But flagging
> performance regressions in MRs, while not making them block batched merges
> sounds like a reasonable compromise.
>
>
> On Wed, Mar 17, 2021 at 9:34 AM Moritz Angermann <
> moritz.angermann at gmail.com> wrote:
>
>> *why* is a very good question. The MR fixing it is here:
>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275
>>
>> On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud <arnaud.spiwack at tweag.io>
>> wrote:
>>
>>> Then I have a question: why are there two pipelines running on each
>>> merge batch?
>>>
>>> On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann <
>>> moritz.angermann at gmail.com> wrote:
>>>
>>>> No it wasn't. It was about the stat failures described in the next
>>>> paragraph. I could have been more clear about that. My apologies!
>>>>
>>>> On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud <
>>>> arnaud.spiwack at tweag.io> wrote:
>>>>
>>>>>
>>>>> and if either of both (see below) failed, marge's merge would fail as
>>>>>> well.
>>>>>>
>>>>>
>>>>> Re: “see below” is this referring to a missing part of your email?
>>>>>
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20210317/d6a77cf1/attachment.html>