john.ericson at obsidian.systems
Wed Mar 17 15:06:28 UTC 2021
Yes, I think the counterpoint of "automating what Ben does", so that
people besides Ben can do it, is very important. In this case, I think
a good thing we could do is asynchronously build more of master
post-merge, such as using the perf stats to automatically bisect
anything that is fishy, including within Marge bot roll-ups, which
wouldn't be built by the regular workflow anyways.
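To make the bisection idea concrete, here is a minimal sketch of what
binary-searching post-merge perf stats for the first regressing commit
could look like. All names, the metric function, and the thresholds
are hypothetical illustrations, not part of GHC's actual CI:

```haskell
-- Hypothetical sketch: given a way to look up a perf metric at a
-- commit index (e.g. bytes allocated on some test), binary-search
-- the range between a known-good and known-bad commit for the first
-- commit whose stat exceeds the baseline by a tolerance.
bisectRegression
  :: (Int -> Double)  -- metric at commit index (illustrative)
  -> Double           -- baseline value at the known-good end
  -> Double           -- tolerance, e.g. 0.05 for 5%
  -> Int              -- known-good commit index
  -> Int              -- known-bad commit index
  -> Int              -- first regressing commit index
bisectRegression metric baseline tol good bad
  | bad - good <= 1 = bad
  | regressed       = bisectRegression metric baseline tol good mid
  | otherwise       = bisectRegression metric baseline tol mid bad
  where
    mid       = (good + bad) `div` 2
    regressed = metric mid > baseline * (1 + tol)

main :: IO ()
main = do
  -- Toy history: commits 0..9, a 30% regression lands at commit 6.
  let metric i = if i >= 6 then 130 else 100
  print (bisectRegression metric 100 0.05 0 9)
```

Since each roll-up spans only a handful of commits, the log-time
search touches very few builds, which is what makes doing this
asynchronously on the build machines cheap.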
I also agree with Sebastian that the overfit/overly-synthetic nature of
our current tests, plus the sketchy way we ignored drift, makes the
current approach worth abandoning in any event. The fact that the gold
standard must include tests of larger, "real world" code, which
unfortunately takes longer to build, is I think another point in favour
of this asynchronous approach: we trade MR latency for stat latency,
but we better utilize our build machines and get better stats, and when
a human goes to fix something a few days later, they have a much better
foundation to start from.
Finally, I agree with SPJ that for fairness and sustainability's sake,
the person investigating issues after the fact should ideally be the MR
authors, and definitely definitely not Ben. But I hope that better
stats, nice-looking graphs, and maybe a system to automatically ping MR
authors will make perf debugging much more accessible, enabling that.
On 3/17/21 9:47 AM, Sebastian Graf wrote:
> Re: Performance drift: I opened
> <https://gitlab.haskell.org/ghc/ghc/-/issues/17658> a while ago with
> an idea of how to measure drift a bit better.
> It's basically an automatically checked version of "Ben stares at
> performance reports every two weeks and sees that T9872 has regressed
> by 10% since 9.0"
> Maybe we can have Marge check for drift and each individual MR for
> incremental perf regressions?
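A rough sketch of the kind of drift check described above: a
per-commit window alone lets a slow 1-2%-per-MR drift pass every gate,
while an additional comparison against a pinned baseline (say, the
stat at the 9.0 release) with a cumulative budget catches it. All
names and numbers here are illustrative, not an actual CI interface:

```haskell
-- Illustrative drift check: every commit must pass both an
-- incremental gate (vs. the previous commit) and a cumulative gate
-- (vs. a pinned release baseline). The incremental gate alone would
-- accept unbounded slow drift.
driftCheck
  :: Double   -- pinned baseline (e.g. stat at the 9.0 release)
  -> Double   -- per-commit window, e.g. 0.02 for 2%
  -> Double   -- cumulative drift budget, e.g. 0.05 for 5%
  -> [Double] -- stat at each subsequent commit, oldest first
  -> Bool     -- True if the drift budget is respected throughout
driftCheck baseline window budget stats =
  and [ ok prev cur | (prev, cur) <- zip (baseline : stats) stats ]
  where
    ok prev cur =
      cur <= prev * (1 + window)          -- incremental gate
        && cur <= baseline * (1 + budget) -- cumulative gate

main :: IO ()
main = do
  -- Each step regresses ~1.5%: every incremental check passes,
  -- but after a few commits the cumulative gate trips.
  let stats = [101.5, 103.0, 104.6, 106.1, 107.7]
  print (driftCheck 100 0.02 0.05 stats)
```

Marge could run the incremental gate per MR and the cumulative gate
per roll-up, pinging authors only when the latter trips.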
> On Wed, Mar 17, 2021 at 2:40 PM, Richard Eisenberg
> <rae at richarde.dev <mailto:rae at richarde.dev>> wrote:
>> On Mar 17, 2021, at 6:18 AM, Moritz Angermann
>> <moritz.angermann at gmail.com <mailto:moritz.angermann at gmail.com>>
>> wrote:
>> But what do we expect of patch authors? Right now if five people
>> write patches to GHC, and each of them eventually manages to get
>> their MRs green, after a long review, they finally see it
>> assigned to marge, and then it starts failing? Their patch on its
>> own was fine, but their aggregate with other people's code leads
>> to regressions? So we now expect all patch authors together to
>> try to figure out what happened? Figuring out why something
>> regressed is hard enough, and we only have a very few people who
>> are actually capable of debugging this. Thus I believe it would
>> end up with Ben, Andreas, Matthew, Simon, ... or someone else
>> from GHC HQ anyway to figure out why it regressed, be it in the
>> Review Stage, or dissecting a marge aggregate, or on master.
> I have previously posted against the idea of allowing Marge to
> accept regressions... but the paragraph above is sadly convincing.
> Maybe Simon is right about opening up the windows to, say, be 100%
> (which would catch a 10x regression) instead of infinite, but I'm
> now convinced that Marge should be very generous in allowing
> regressions -- provided we also have some way of monitoring drift
> over time.
> Separately, I've been concerned for some time about the
> peculiarity of our perf tests. For example, I'd be quite happy to
> accept a 25% regression on T9872c if it yielded a 1% improvement
> on compiling Cabal. T9872 is very very very strange! (Maybe if
> *all* the T9872 tests regressed, I'd be more worried.) I would be
> very happy to learn that some more general, representative tests
> are included in our examinations.
> ghc-devs mailing list
> ghc-devs at haskell.org <mailto:ghc-devs at haskell.org>