<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Yes, I think the counterpoint of "automating what Ben does", so
that people besides Ben can do it, is very important. In this case, I
think a good thing we could do is asynchronously build more of
master post-merge, for example using the perf stats to automatically
bisect anything that looks fishy, including the individual commits
within Marge bot roll-ups, which the regular workflow wouldn't build anyway. <br>
</p>
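<p>To make the bisection idea above concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration:
<code>first_regressed</code>, the <code>measure</code> callback and the
2% threshold are hypothetical, not an existing GHC CI tool, and it
assumes that once a metric regresses it stays regressed in every later
commit.</p>
<pre>
```python
from typing import Callable, Optional, Sequence


def first_regressed(
    commits: Sequence[str],
    measure: Callable[[str], float],
    threshold: float = 0.02,
) -> Optional[str]:
    """Return the first commit whose metric exceeds the baseline by more than threshold.

    commits[0] is the known-good baseline. Assumes that once the metric
    regresses, it stays regressed in every later commit.
    """
    limit = measure(commits[0]) * (1.0 + threshold)
    if not measure(commits[-1]) > limit:
        return None  # the newest commit is within tolerance: nothing to bisect
    # Invariant: commits[good] is within the limit, commits[bad] is over it.
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if measure(commits[mid]) > limit:
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical allocation numbers for four commits inside one roll-up.
allocs = {"a": 100.0, "b": 100.5, "c": 112.0, "d": 111.0}
print(first_regressed(list(allocs), allocs.__getitem__))  # prints: c
```
</pre>
<p>In a real setup <code>measure</code> would build the commit and
read one perf stat, so each call is expensive; the binary search keeps
the number of builds logarithmic in the size of the roll-up.</p>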
<p>I also agree with Sebastian that the overfit, overly synthetic
nature of our current tests, plus the sketchy way we have ignored
drift, makes the current approach worth abandoning in any event. The
gold standard must include tests of larger, "real world" code, which
unfortunately takes longer to build, and I think that is also a
point in favor of this asynchronous approach: we trade MR latency for
stat latency, but we make better use of our build machines and get
better stats, and when a human goes to fix something a few days later,
they have a much better foundation on which to start their investigation.</p>
<p>Finally, I agree with SPJ that, for fairness and sustainability's
sake, the people investigating issues after the fact should
ideally be the MR authors, and definitely, definitely not Ben. I
hope that better stats, nice-looking graphs, and maybe a system
to automatically ping MR authors will make perf debugging
much more accessible, enabling that goal.<br>
</p>
<p>John<br>
</p>
<div class="moz-cite-prefix">On 3/17/21 9:47 AM, Sebastian Graf
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAAS+=P8U6PCqaN7ZC-15rJeTjp5rV-VQReKp_2tL6pCcmeEmWw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div>Re: Performance drift: I opened <a
href="https://gitlab.haskell.org/ghc/ghc/-/issues/17658"
moz-do-not-send="true">https://gitlab.haskell.org/ghc/ghc/-/issues/17658</a>
a while ago with an idea of how to measure drift a bit better.</div>
<div>It's basically an automatically checked version of "Ben
stares at performance reports every two weeks and sees that
T9872 has regressed by 10% since 9.0"</div>
<div><br>
</div>
<div>Maybe we can have Marge check for drift, and check each
individual MR for incremental perf regressions?<br>
</div>
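<div>A rough sketch of that split, in Python, might look like the
following. The names (<code>check_mr</code>, <code>check_drift</code>),
the 100% per-MR window and the 10% drift tolerance are assumptions
chosen for illustration; this is not how Marge or the GHC testsuite
currently works.</div>
<pre>
```python
def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new, e.g. 0.10 for a 10% increase."""
    return (new - old) / old


def check_mr(before: float, after: float, window: float = 1.0) -> bool:
    """Per-MR check: accept unless this one change exceeds a generous window."""
    return not relative_change(before, after) > window


def check_drift(release_baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """Drift check: accept unless master has crept past the tolerance since a fixed baseline."""
    return not relative_change(release_baseline, current) > tolerance


# Hypothetical metric values for one test: four merges, each 3% worse than the last.
baseline = 1000.0
history = [baseline * 1.03 ** n for n in range(5)]
steps_ok = all(check_mr(a, b) for a, b in zip(history, history[1:]))
drift_ok = check_drift(baseline, history[-1])
print(steps_ok, drift_ok)  # prints: True False
```
</pre>
<div>Each individual step passes the wide window, but the accumulated
change since the baseline (about 12.6%) trips the drift check, which
is the case the per-MR check alone would miss.</div>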
<br>
<div>Sebastian<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Mar 17, 2021 at
14:40, Richard Eisenberg &lt;<a
href="mailto:rae@richarde.dev" moz-do-not-send="true">rae@richarde.dev</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="overflow-wrap: break-word;"><br>
<div><br>
<blockquote type="cite">
<div>On Mar 17, 2021, at 6:18 AM, Moritz Angermann &lt;<a
href="mailto:moritz.angermann@gmail.com"
target="_blank" moz-do-not-send="true">moritz.angermann@gmail.com</a>&gt;
wrote:</div>
<br>
<div><span
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">But
what do we expect of patch authors? Right now, if
five people write patches to GHC, and each of them
eventually manages to get their MR green after a
long review, they finally see it assigned to Marge,
and then it starts failing? Their patch on its own
was fine, but the aggregate with other people's
code leads to regressions? So we now expect all
patch authors together to try to figure out what
happened? Figuring out why something regressed is
hard enough, and we have only very few people who
are actually capable of debugging this. Thus I
believe it would end up with Ben, Andreas, Matthew,
Simon, ... or someone else from GHC HQ having to
figure out why it regressed anyway, be it in the review
stage, or dissecting a Marge aggregate, or on
master.</span></div>
</blockquote>
</div>
<br>
<div>I have previously posted against the idea of allowing
Marge to accept regressions... but the paragraph above is
sadly convincing. Maybe Simon is right about opening up
the windows to, say, 100% (which would still catch a 10x
regression) instead of infinite, but I'm now convinced
that Marge should be very generous in allowing regressions
-- provided we also have some way of monitoring drift over
time.</div>
<div><br>
</div>
<div>Separately, I've been concerned for some time about the
peculiarity of our perf tests. For example, I'd be quite
happy to accept a 25% regression on T9872c if it yielded a
1% improvement on compiling Cabal. T9872 is very very very
strange! (Maybe if *all* the T9872 tests regressed, I'd be
more worried.) I would be very happy to learn that some
more general, representative tests are included in our
examinations.</div>
<div><br>
</div>
<div>Richard</div>
</div>
_______________________________________________<br>
ghc-devs mailing list<br>
<a href="mailto:ghc-devs@haskell.org" target="_blank"
moz-do-not-send="true">ghc-devs@haskell.org</a><br>
<a
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a><br>
</blockquote>
</div>
<br>
</blockquote>
</body>
</html>