<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>I'm not opposed to some effort going into this, but I would
      strongly oppose putting all our effort there. Incremental CI can
      cut multiple hours down to mere minutes, especially with the test
      suite being embarrassingly parallel. There is simply no way that
      optimizations to the compiler, independent of sharing a cache
      between CI runs, can get anywhere close to that return on
      investment.</p>
    <p>(FWIW, I'm also skeptical that the people complaining about GHC
      performance know what's hurting them most. For example, after
      non-incrementality, the next slowest thing is linking, which
      is...not done by GHC! But all that is a separate conversation.)</p>
    <p>John<br>
    </p>
    <div class="moz-cite-prefix">On 2/19/21 2:42 PM, Richard Eisenberg
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:010f0177bbd109e4-98654d12-c4d5-442b-a55a-d4228b00b0d3-000000@us-east-2.amazonses.com">
      There are some good ideas here, but I want to throw out another
      one: put all our effort into reducing compile times. There is a
      loud plea to do this on <a
href="https://discourse.haskell.org/t/call-for-ideas-forming-a-technical-agenda/1901/24"
        class="" moz-do-not-send="true">Discourse</a>, and it would both
      solve these CI problems and also help everyone else.
      <div class=""><br class="">
      </div>
      <div class="">This isn't to say to stop exploring the ideas here.
        But since time is mostly fixed, tackling compilation times in
        general may be the best way out of this. Ben's survey of other
        projects (thanks!) shows that we're way, way behind in how long
        our CI takes to run.</div>
      <div class=""><br class="">
      </div>
      <div class="">Richard<br class="">
        <div><br class="">
          <blockquote type="cite" class="">
            <div class="">On Feb 19, 2021, at 7:20 AM, Sebastian Graf
              <<a href="mailto:sgraf1337@gmail.com" class=""
                moz-do-not-send="true">sgraf1337@gmail.com</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div dir="ltr" class="">
                <div class=""><font class="" size="4">Recompilation
                    avoidance</font><br class="">
                </div>
                <div class=""><br class="">
                </div>
                <div class="">I think in order to cache more in CI, we
                  first have to invest some time in fixing recompilation
                  avoidance in our bootstrapped build system.<br
                    class="">
                </div>
                <div class=""><br class="">
                </div>
                <div class="">I just tested on a hadrian perf ticky
                  build: Adding one line of *comment* in the compiler
                  causes</div>
                <div class="">
                  <ul class="">
                    <li class="">a (pretty slow, yet negligible) rebuild
                      of the stage1 compiler</li>
                    <li class="">2 minutes of RTS rebuilding (Why do we
                      have to rebuild the RTS? It doesn't depend in any
                      way on the change I made)<br class="">
                    </li>
                    <li class="">an apparent full rebuild of the libraries</li>
                    <li class="">an apparent full rebuild of the stage2
                      compiler</li>
                  </ul>
                  <div class="">That took 17 minutes; a full build takes
                    ~45 minutes. So there definitely is some caching
                    going on, but not nearly as much as there could be.</div>
                  <div class="">I know there have been great and boring
                    efforts on compiler determinism in the past, but
                    either it's not good enough or our build system
                    needs fixing.</div>
                  <div class="">I think a good first step would be to
                    assert that the hash of the stage1 compiler
                    executable doesn't change if I only change a
                    comment.</div>
                  <div class="">I'm aware there probably is stuff going
                    on, like embedding configure dates in interface
                    files and executables, that would need to go, but if
                    possible this would be a huge improvement.</div>
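                  <div class="">A minimal sketch of such a check, assuming
                    (hypothetically) that the stage1 executables built
                    before and after the comment-only change have been
                    copied to two separate paths. The function names are
                    illustrative and not part of Hadrian.</div>
<pre>
```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large executables need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_identical(before_path, after_path):
    """True when both files have the same digest, i.e. the same bytes."""
    return file_sha256(before_path) == file_sha256(after_path)
```
</pre>
                  <div class="">If the two digests differ after a
                    comment-only change, something non-deterministic (such
                    as an embedded date) is still leaking into the binary.</div>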
                  <div class=""><br class="">
                  </div>
                  <div class="">On the other hand, we can simply tack on
                    a [skip ci] to the commit message, as I did for <a
                      href="https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975"
                      class="" moz-do-not-send="true">https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975</a>.
                    Variants like [skip tests] or [frontend] could help
                    to identify which tests to run by default.</div>
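                  <div class="">As a rough sketch of how such tags could
                    drive the pipeline: the snippet below extracts
                    bracketed tags from a commit message and picks a test
                    selection. GitLab already honours [skip ci] by itself;
                    the meanings given to [skip tests] and [frontend] here
                    are assumptions, as are the function names.</div>
<pre>
```python
import re

# Matches one bracketed tag such as "[skip ci]" and captures its contents.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def commit_tags(message):
    """Return the set of lower-cased bracketed tags in a commit message."""
    return {tag.strip().lower() for tag in TAG_PATTERN.findall(message)}


def plan_pipeline(message):
    """Decide what to run; the tag semantics are hypothetical."""
    tags = commit_tags(message)
    if "skip ci" in tags:
        return {"build": False, "tests": "none"}
    if "skip tests" in tags:
        return {"build": True, "tests": "none"}
    if "frontend" in tags:
        return {"build": True, "tests": "frontend"}
    # No recognised tag: run the full build and the full test suite.
    return {"build": True, "tests": "all"}
```
</pre>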
                  <div class=""><br class="">
                  </div>
                  <div class=""><font class="" size="4">Lean</font><br
                      class="">
                  </div>
                  <div class=""><br class="">
                  </div>
                  I had a chat with a colleague about how they do CI for
                  Lean. Apparently, CI turnaround time including tests
                  is generally 25 minutes (~15 minutes for the build)
                  for a complete pipeline, testing 6 different OSes and
                  configurations in parallel: <a
                    href="https://github.com/leanprover/lean4/actions/workflows/ci.yml"
                    class="" moz-do-not-send="true">https://github.com/leanprover/lean4/actions/workflows/ci.yml</a></div>
                <div class="">They utilise ccache to cache the
                  clang-based C++-backend, so that they only have to
                  re-run the front- and middle-end. In effect, they take
                  advantage of the fact that the "function" clang, in
                  contrast to the "function" stage1 compiler, stays the
                  same.</div>
                <div class="">It's hard to achieve that for GHC, where a
                  complete compiler pipeline comes as one big, fused
                  "function": An external tool can never be certain that
                  a change to Parser.y could not affect the CodeGen
                  phase.</div>
                <div class=""><br class="">
                </div>
                <div class="">Inspired by Lean, the following is a bit
                  vague and speculative, but maybe we could make it
                  so that compiler phases "sign" parts of the interface
                  file with the binary hash of the respective
                  subcomponents of the phase?</div>
                <div class="">E.g., if all the object files that
                  influence CodeGen (that will later be linked into the
                  stage1 compiler) result in a hash of 0xdeadbeef before
                  and after the change to Parser.y, we know we can stop
                  recompiling Data.List with the stage1 compiler when we
                  see that the IR passed to CodeGen didn't change,
                  because the last compile did CodeGen with a stage1
                  compiler with the same hash 0xdeadbeef. The 0xdeadbeef
                  hash is a proxy for saying "the function CodeGen
                  stayed the same", so we can reuse its cached outputs.</div>
                <div class="">Of course, that is utopian without a tool
                  that does the "taint analysis" of which modules in GHC
                  influence CodeGen. Including all the transitive
                  dependencies of GHC.CmmToAsm might suffice, but
                  that is probably too crude already. For another
                  example, a change to GHC.Utils.Unique would probably
                  entail a full rebuild of the compiler because it
                  basically affects all compiler phases.</div>
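                <div class="">To make the idea a little more concrete,
                  here is a toy sketch of the caching scheme, assuming the
                  set of object files that influence a phase is already
                  known. Nothing here is an existing GHC or Hadrian
                  interface; all names are made up for illustration.</div>
<pre>
```python
import hashlib


def digest(*parts):
    """SHA-256 over length-prefixed byte strings, so part boundaries matter."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def phase_signature(object_files):
    """Hash of the compiler objects assumed to influence one phase.

    `object_files` maps an object name to its bytes. Names are sorted so
    the signature does not depend on iteration order.
    """
    parts = []
    for name in sorted(object_files):
        parts.append(name.encode())
        parts.append(object_files[name])
    return digest(*parts)


class PhaseCache:
    """Caches a phase's output keyed by (phase signature, hash of input IR)."""

    def __init__(self):
        self._store = {}

    def run(self, signature, ir_bytes, phase):
        key = (signature, digest(ir_bytes))
        if key not in self._store:
            # Either the phase's code or its input changed: recompute.
            self._store[key] = phase(ir_bytes)
        return self._store[key]
```
</pre>
                <div class="">With this shape, a change to Parser.y that
                  leaves both the CodeGen signature and the IR untouched
                  hits the cache, while any change to an object in the
                  signature forces the phase to run again.</div>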
                <div class="">There are probably parallels with
                  recompilation avoidance in a language with staged
                  meta-programming.<br class="">
                </div>
              </div>
              <br class="">
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Fri, 19 Feb 2021
                  at 11:42, Josef Svenningsson via ghc-devs
                  &lt;<a href="mailto:ghc-devs@haskell.org" class=""
                    moz-do-not-send="true">ghc-devs@haskell.org</a>&gt; wrote:<br
                    class="">
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div dir="ltr" class="">
                    <div id="gmail-m_8288259843833037528appendonsend"
                      style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                    </div>
                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      Doing "optimistic caching" like you suggest sounds
                      very promising. A way to regain more robustness
                      would be as follows.</div>
                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      If the build fails while building the libraries or
                      the stage2 compiler, this might be a spurious
                      failure caused by the optimistic caching. Therefore,
                      evict the "optimistic caches" and restart building
                      the libraries. That way we can validate that the
                      build failure was a true build failure and not
                      just due to the aggressive caching scheme.</div>
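                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      The evict-and-retry step could look roughly like the
                      sketch below. The two callables stand in for
                      whatever the CI scripts would actually do; they are
                      placeholders, not existing tooling.</div>
<pre>
```python
def build_with_optimistic_cache(build, evict_optimistic_cache):
    """Run `build`; on failure, evict the optimistic cache and retry once.

    `build` is a callable returning True on success.
    `evict_optimistic_cache` is a callable that drops the reused products.
    Returns a pair (succeeded, used_fallback).
    """
    if build():
        # The optimistic attempt worked, so no fallback was needed.
        return True, False
    # The failure may be an artefact of stale cached products, so throw
    # them away and rebuild before reporting anything.
    evict_optimistic_cache()
    return build(), True
```
</pre>
                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      Only a failure of the second attempt is reported as
                      a real build failure.</div>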
                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      <br class="">
                    </div>
                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      Just my 2p</div>
                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      <br class="">
                    </div>
                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      Josef</div>
                    <div style="font-family: Calibri, Arial, Helvetica,
                      sans-serif; font-size: 12pt;" class="">
                      <br class="">
                    </div>
                    <hr style="display:inline-block;width:98%" class="">
                    <div id="gmail-m_8288259843833037528divRplyFwdMsg"
                      dir="ltr" class=""><font style="font-size:11pt"
                        class="" face="Calibri, sans-serif"><b class="">From:</b>
                        ghc-devs <<a
                          href="mailto:ghc-devs-bounces@haskell.org"
                          target="_blank" class=""
                          moz-do-not-send="true">ghc-devs-bounces@haskell.org</a>>
                        on behalf of Simon Peyton Jones via ghc-devs
                        <<a href="mailto:ghc-devs@haskell.org"
                          target="_blank" class=""
                          moz-do-not-send="true">ghc-devs@haskell.org</a>><br
                          class="">
                        <b class="">Sent:</b> Friday, February 19, 2021
                        8:57 AM<br class="">
                        <b class="">To:</b> John Ericson <<a
                          href="mailto:john.ericson@obsidian.systems"
                          class="" moz-do-not-send="true">john.ericson@obsidian.systems</a>>;
                        ghc-devs <<a
                          href="mailto:ghc-devs@haskell.org"
                          target="_blank" class=""
                          moz-do-not-send="true">ghc-devs@haskell.org</a>><br
                          class="">
                        <b class="">Subject:</b> RE: On CI</font>
                      <div class=""> </div>
                    </div>
                    <div style="overflow-wrap: break-word;" class=""
                      lang="EN-GB">
                      <div class="">
                        <ol style="margin-bottom:0cm" class="" type="1"
                          start="1">
                          <li
                            style="margin:0cm;font-size:11pt;font-family:Calibri,sans-serif"
                            class="">
                            Building and testing happen together. When
                            tests fail spuriously, we also have to
                            rebuild GHC in addition to re-running the
                            tests. That's pure waste.
                            <a
href="https://gitlab.haskell.org/ghc/ghc/-/issues/13897"
                              target="_blank" class=""
                              moz-do-not-send="true">https://gitlab.haskell.org/ghc/ghc/-/issues/13897</a>
                            tracks this more or less.</li>
                        </ol>
                        <div style="margin: 0cm; font-size: 11pt;
                          font-family: Calibri, sans-serif;" class="">
                          <span class="">I don’t get this.  We have to
                            build GHC before we can test it, don’t we?</span></div>
                        <div style="margin: 0cm 0cm 0cm 18pt; font-size:
                          11pt; font-family: Calibri, sans-serif;"
                          class="">
                          2. We don't cache between jobs.</div>
                        <div style="margin: 0cm; font-size: 11pt;
                          font-family: Calibri, sans-serif;" class="">
                          This is, I think, the big one.   We endlessly
                          build the exact same binaries.</div>
                        <div style="margin: 0cm; font-size: 11pt;
                          font-family: Calibri, sans-serif;" class="">
                          There is a problem, though.  If we make *<b
                            class="">any</b>* change in GHC, even a
                          trivial refactoring, its binary will change
                          slightly.  So now any caching build system
                          will assume that anything built by that GHC
                          must be rebuilt – we can’t use the cached
                          version.  That includes all the libraries and
                          the stage2 compiler.  So caching can save all
                          the preliminaries (building the initial Cabal,
                          and a large chunk of stage1, since they are
                          built with the same bootstrap compiler) but
                          after that we are dead.</div>
                        <div style="margin: 0cm; font-size: 11pt;
                          font-family: Calibri, sans-serif;" class="">
                          I don’t know any robust way out of this.  That
                          small change in the source code of GHC might
                          be trivial refactoring, or it might introduce
                          a critical mis-compilation which we really
                          want to see in its build products. 
                        </div>
                        <div style="margin: 0cm; font-size: 11pt;
                          font-family: Calibri, sans-serif;" class="">
                          However, for smoke-testing MRs, on every
                          architecture, we could perhaps cut corners. 
                          (Leaving Marge to do full diligence.)  For
                          example, we could declare that if we have the
                          result of compiling library module X.hs with
                          the stage1 GHC in the last full commit in
                          master, then we can re-use that build product
                          rather than compiling X.hs with the MR’s
                          slightly modified stage1 GHC.  That *<b
                            class="">might</b>* be wrong; but it’s
                          usually right.</div>
                        <div style="margin: 0cm; font-size: 11pt;
                          font-family: Calibri, sans-serif;" class="">
                          Anyway, there are big wins to be had here.</div>
                        <div style="margin: 0cm; font-size: 11pt;
                          font-family: Calibri, sans-serif;" class="">
                          Simon</div>
                        <div style="border-color:currentcolor
                          currentcolor currentcolor
                          blue;border-style:none none none
                          solid;border-width:medium medium medium
                          1.5pt;padding:0cm 0cm 0cm 4pt" class="">
                          <div class="">
                            <div style="border-color:rgb(225,225,225)
                              currentcolor
                              currentcolor;border-style:solid none
                              none;border-width:1pt medium
                              medium;padding:3pt 0cm 0cm" class="">
                              <div style="margin: 0cm; font-size: 11pt;
                                font-family: Calibri, sans-serif;"
                                class="">
                                <b class=""><span class="" lang="EN-US">From:</span></b><span
                                  class="" lang="EN-US"> ghc-devs <<a
href="mailto:ghc-devs-bounces@haskell.org" target="_blank" class=""
                                    moz-do-not-send="true">ghc-devs-bounces@haskell.org</a>>
                                  <b class="">On Behalf Of </b>John
                                  Ericson<br class="">
                                  <b class="">Sent:</b> 19 February 2021
                                  03:19<br class="">
                                  <b class="">To:</b> ghc-devs <<a
                                    href="mailto:ghc-devs@haskell.org"
                                    target="_blank" class=""
                                    moz-do-not-send="true">ghc-devs@haskell.org</a>><br
                                    class="">
                                  <b class="">Subject:</b> Re: On CI</span></div>
                            </div>
                          </div>
                          <p
                            style="margin:0cm;font-size:11pt;font-family:Calibri,sans-serif"
                            class="">
                             </p>
                          <p class="">I am also wary of us deferring the
                            checking of whole platforms and whatnot. I
                            think that's just kicking the can down the
                            road, and will result in more variance and
                            uncertainty. It might be alright for those
                            authoring PRs, but it will make Ben's job
                            keeping the system running even more
                            grueling.</p>
                          <p class="">Before getting into these complex
                            trade-offs, I think we should focus on the
                            cornerstone issue that CI isn't incremental.</p>
                          <ol style="margin-bottom:0cm" class=""
                            type="1" start="1">
                            <li
                              style="margin:0cm;font-size:11pt;font-family:Calibri,sans-serif"
                              class="">
                              Building and testing happen together. When
                              tests fail spuriously, we also have to
                              rebuild GHC in addition to re-running the
                              tests. That's pure waste.
                              <a
href="https://gitlab.haskell.org/ghc/ghc/-/issues/13897"
                                target="_blank" class=""
                                moz-do-not-send="true">https://gitlab.haskell.org/ghc/ghc/-/issues/13897</a>
                              tracks this more or less.</li>
                            <li
                              style="margin:0cm;font-size:11pt;font-family:Calibri,sans-serif"
                              class="">
                              We don't cache between jobs. Shake and
                              Make do not enforce dependency soundness,
                              nor cache-correctness when the build plan
                              itself changes, and this has made caching
                              hard or impossible to do safely. Naively, this
                              only helps with stage 1 and not stage 2,
                              but if we have separate stage 1 and
                              --freeze1 stage 2 builds, both can be
                              incremental. Yes, this is also lossy, but
                              I only see it leading to false failures,
                              not false acceptances (if we can also test
                              the stage 1 compiler), so I consider it safe.
                              MRs that only work with a slow full build
                              because of ABI changes can indicate so.</li>
                          </ol>
                          <div style="margin: 0cm; font-size: 11pt;
                            font-family: Calibri, sans-serif;" class="">
                            The second, main part is quite hard to
                            tackle, but I strongly believe
                            incrementality is what we need most, and
                            what we should remain focused on.</div>
                          <p class="">John</p>
                        </div>
                      </div>
                    </div>
                  </div>
                  _______________________________________________<br
                    class="">
                  ghc-devs mailing list<br class="">
                  <a href="mailto:ghc-devs@haskell.org" target="_blank"
                    class="" moz-do-not-send="true">ghc-devs@haskell.org</a><br
                    class="">
                  <a
                    href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs"
                    rel="noreferrer" target="_blank" class=""
                    moz-do-not-send="true">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a><br
                    class="">
                </blockquote>
              </div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
      <br>
    </blockquote>
  </body>
</html>