On CI

Spiwack, Arnaud arnaud.spiwack at tweag.io
Mon Feb 22 13:42:40 UTC 2021


Let me know if I'm talking nonsense, but I believe that we are building
both stages for each architecture and flavour. Do we need to build two
stages everywhere? What stops us from building a single stage? And if
anything does, what can we change to get into a situation where we can?

Better still than reusing builds incrementally is not building at all.

On Mon, Feb 22, 2021 at 10:09 AM Simon Peyton Jones via ghc-devs <
ghc-devs at haskell.org> wrote:

> Incremental CI can cut multiple hours down to mere minutes, especially
> with the test suite being embarrassingly parallel. There is simply no way
> that optimizations to the compiler, independent of sharing a cache between
> CI runs, can get anywhere close to that return on investment.
>
> I rather agree with this.  I don’t think there is much low-hanging fruit
> on compile times, aside from coercion-zapping which we are working on
> anyway.  If we got a 10% reduction in compile time we’d be over the moon,
> but our users would barely notice.
>
>
>
> To get truly substantial improvements (a factor of 2 or 10) I think we
> need to do less compiling – hence incremental CI.
>
>
> Simon
>
>
>
> *From:* ghc-devs <ghc-devs-bounces at haskell.org> *On Behalf Of *John
> Ericson
> *Sent:* 22 February 2021 05:53
> *To:* ghc-devs <ghc-devs at haskell.org>
> *Subject:* Re: On CI
>
>
>
> I'm not opposed to some effort going into this, but I would strongly
> oppose putting all our effort there. Incremental CI can cut multiple
> hours down to mere minutes, especially with the test suite being
> embarrassingly parallel. There is simply no way that optimizations to the
> compiler, independent of sharing a cache between CI runs, can get anywhere
> close to that return on investment.
>
> (FWIW, I'm also skeptical that the people complaining about GHC
> performance know what's hurting them most. For example, after
> non-incrementality, the next slowest thing is linking, which is...not done
> by GHC! But all that is a separate conversation.)
>
> John
>
> On 2/19/21 2:42 PM, Richard Eisenberg wrote:
>
> There are some good ideas here, but I want to throw out another one: put
> all our effort into reducing compile times. There is a loud plea to do this
> on Discourse
> <https://discourse.haskell.org/t/call-for-ideas-forming-a-technical-agenda/1901/24>,
> and it would both solve these CI problems and also help everyone else.
>
>
>
> This isn't to say to stop exploring the ideas here. But since time is
> mostly fixed, tackling compilation times in general may be the best way out
> of this. Ben's survey of other projects (thanks!) shows that we're way, way
> behind in how long our CI takes to run.
>
>
>
> Richard
>
>
>
> On Feb 19, 2021, at 7:20 AM, Sebastian Graf <sgraf1337 at gmail.com> wrote:
>
>
>
> Recompilation avoidance
>
>
>
> I think in order to cache more in CI, we first have to invest some time in
> fixing recompilation avoidance in our bootstrapped build system.
>
>
>
> I just tested on a hadrian perf ticky build: Adding one line of *comment*
> in the compiler causes
>
>    - a (pretty slow, yet negligible) rebuild of the stage1 compiler
>    - 2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It
>    doesn't depend in any way on the change I made)
>    - an apparently full rebuild of the libraries
>    - an apparently full rebuild of the stage2 compiler
>
> That took 17 minutes; a full build takes ~45 minutes. So there definitely
> is some caching going on, but not nearly as much as there could be.
>
> I know there have been great and boring efforts on compiler determinism in
> the past, but either it's not good enough or our build system needs fixing.
>
> I think a good first step to assert would be to make sure that the hash of
> the stage1 compiler executable doesn't change if I only change a comment.
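
The check described above could be sketched in Python roughly as follows. This is only an illustration: it assumes the stage1 executables built before and after the comment-only change have been saved to two known paths, and the helper names are hypothetical, not part of Hadrian or any GHC tooling.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read 1 MiB at a time so large executables need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(before: Path, after: Path) -> bool:
    """True if both build outputs are byte-for-byte identical."""
    return file_sha256(before) == file_sha256(after)
```

A CI job could call same_binary on the two saved executables and fail if it returns False, which would flag any remaining source of nondeterminism such as embedded dates.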
>
> I'm aware there probably is stuff going on, like embedding configure dates
> in interface files and executables, that would need to go, but if possible
> this would be a huge improvement.
>
>
>
> On the other hand, we can simply tack on a [skip ci] to the commit
> message, as I did for
> <https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975>.
> Variants like [skip tests] or [frontend] could help to identify which tests
> to run by default.
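
As a rough illustration of how such commit-message tags might drive job selection, here is a small Python sketch. The tags [skip tests] and [frontend] are the proposals from the text above; the job names and both functions are hypothetical and do not describe GHC's actual CI configuration.

```python
import re

# Matches bracketed directives such as "[skip ci]" or "[frontend]".
DIRECTIVE = re.compile(r"\[([^\[\]]+)\]")


def ci_directives(commit_message: str) -> set[str]:
    """Return the lower-cased, whitespace-normalised directives in a message."""
    return {" ".join(m.lower().split()) for m in DIRECTIVE.findall(commit_message)}


def plan_jobs(commit_message: str) -> list[str]:
    """Pick (made-up) job names from the directives found in the message."""
    tags = ci_directives(commit_message)
    if "skip ci" in tags:
        return []  # nothing to run at all
    jobs = ["build"]
    if "skip tests" not in tags:
        # "[frontend]" narrows the test run; otherwise run everything.
        jobs.append("test-frontend" if "frontend" in tags else "test-all")
    return jobs
```

For example, plan_jobs("Fix typo in comment [skip ci]") yields no jobs, while a message without any tag falls back to the full build and test run.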
>
>
>
> Lean
>
>
>
> I had a chat with a colleague about how they do CI for Lean. Apparently,
> CI turnaround time including tests is generally 25 minutes (~15 minutes for
> the build) for a complete pipeline, testing 6 different OSes and
> configurations in parallel:
> https://github.com/leanprover/lean4/actions/workflows/ci.yml
>
> They utilise ccache to cache the clang-based C++-backend, so that they
> only have to re-run the front- and middle-end. In effect, they take
> advantage of the fact that the "function" clang, in contrast to the
> "function" stage1 compiler, stays the same.
>
> It's hard to achieve that for GHC, where a complete compiler pipeline
> comes as one big, fused "function": An external tool can never be certain
> that a change to Parser.y could not affect the CodeGen phase.
>
>
>
> Inspired by Lean, the following is a bit vague and speculative, but
> maybe we could make it so that compiler phases "sign" parts of the
> interface file with the binary hash of the respective subcomponents of the
> phase?
>
> E.g., if all the object files that influence CodeGen (that will later be
> linked into the stage1 compiler) result in a hash of 0xdeadbeef before and
> after the change to Parser.y, we know we can stop recompiling Data.List
> with the stage1 compiler when we see that the IR passed to CodeGen didn't
> change, because the last compile did CodeGen with a stage1 compiler with
> the same hash 0xdeadbeef. The 0xdeadbeef hash is a proxy for saying "the
> function CodeGen stayed the same", so we can reuse its cached outputs.
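
The idea in the paragraph above can be sketched as a cache keyed on two hashes: one for the code implementing a phase and one for the input handed to it. The Python below is a toy model under that assumption; PhaseCache and its method are hypothetical names, and how the phase hash would really be computed from GHC's object files is exactly the open question.

```python
import hashlib
from typing import Callable


def digest(data: bytes) -> str:
    """Hex SHA-256 of some bytes, standing in for a hash of the IR."""
    return hashlib.sha256(data).hexdigest()


class PhaseCache:
    """Cache a phase's output under (hash of the phase's code, hash of its input)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], bytes] = {}
        self.misses = 0  # how many times the phase actually had to run

    def run(self, phase_hash: str, ir: bytes, phase: Callable[[bytes], bytes]) -> bytes:
        key = (phase_hash, digest(ir))
        if key not in self._store:
            # Either the phase's code or its input changed: recompute.
            self.misses += 1
            self._store[key] = phase(ir)
        return self._store[key]
```

With this shape, a change to Parser.y that leaves both the CodeGen hash and the IR for Data.List untouched hits the cache, whereas a change to either component of the key forces a re-run.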
>
> Of course, that is utopian without a tool that does the "taint analysis" of
> which modules in GHC influence CodeGen. Probably just including all the
> transitive dependencies of GHC.CmmToAsm suffices, but probably that's too
> crude already. For another example, a change to GHC.Utils.Unique would
> probably entail a full rebuild of the compiler because it basically affects
> all compiler phases.
>
> There are probably parallels with recompilation avoidance in a language
> with staged meta-programming.
>
>
>
> Am Fr., 19. Feb. 2021 um 11:42 Uhr schrieb Josef Svenningsson via ghc-devs
> <ghc-devs at haskell.org>:
>
> Doing "optimistic caching" like you suggest sounds very promising. A way
> to regain more robustness would be as follows.
>
> If the build fails while building the libraries or the stage2 compiler,
> this might be a false negative due to the optimistic caching. Therefore,
> evict the "optimistic caches" and restart building the libraries. That way
> we can validate that the build failure was a true build failure and not
> just due to the aggressive caching scheme.
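
The evict-and-retry scheme described above might look roughly like this in Python. It is a sketch only: the build callable, the BuildFailure exception and the shape of the cache are all assumptions made for illustration, not an existing interface.

```python
from typing import Callable, MutableMapping


class BuildFailure(Exception):
    """Raised by the (hypothetical) build step when compilation fails."""


def build_with_optimistic_cache(
    build: Callable[[MutableMapping[str, bytes]], str],
    cache: MutableMapping[str, bytes],
) -> str:
    """Try a build with the optimistic cache; on failure, evict and retry once.

    A failure on the second, cache-free attempt is treated as a true
    build failure and propagates to the caller.
    """
    try:
        return build(cache)
    except BuildFailure:
        cache.clear()        # evict the possibly stale optimistic entries
        return build(cache)  # a failure here is genuine
```

The cost of a false negative is then one wasted optimistic attempt plus a full rebuild, while a successful optimistic build pays nothing extra.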
>
>
>
> Just my 2p
>
>
>
> Josef
>
>
> ------------------------------
>
> *From:* ghc-devs <ghc-devs-bounces at haskell.org> on behalf of Simon Peyton
> Jones via ghc-devs <ghc-devs at haskell.org>
> *Sent:* Friday, February 19, 2021 8:57 AM
> *To:* John Ericson <john.ericson at obsidian.systems>; ghc-devs <
> ghc-devs at haskell.org>
> *Subject:* RE: On CI
>
>
>
>    1. Building and testing happen together. When tests fail
>    spuriously, we also have to rebuild GHC in addition to re-running the
>    tests. That's pure waste.
>    https://gitlab.haskell.org/ghc/ghc/-/issues/13897
>    tracks this more or less.
>
> I don’t get this.  We have to build GHC before we can test it, don’t we?
>
> 2. We don't cache between jobs.
>
> This is, I think, the big one.   We endlessly build the exact same
> binaries.
>
> There is a problem, though.  If we make **any** change in GHC, even a
> trivial refactoring, its binary will change slightly.  So now any caching
> build system will assume that anything built by that GHC must be rebuilt –
> we can’t use the cached version.  That includes all the libraries and the
> stage2 compiler.  So caching can save all the preliminaries (building the
> initial Cabal, and a large chunk of stage1, since they are built with the
> same bootstrap compiler) but after that we are dead.
>
> I don’t know any robust way out of this.  That small change in the source
> code of GHC might be trivial refactoring, or it might introduce a critical
> mis-compilation which we really want to see in its build products.
>
> However, for smoke-testing MRs, on every architecture, we could perhaps
> cut corners.  (Leaving Marge to do full diligence.)  For example, we could
> declare that if we have the result of compiling library module X.hs with
> the stage1 GHC in the last full commit in master, then we can re-use that
> build product rather than compiling X.hs with the MR’s slightly modified
> stage1 GHC.  That **might** be wrong; but it’s usually right.
>
> Anyway, there are big wins to be had here.
>
> Simon
>
>
>
>
>
>
>
> *From:* ghc-devs <ghc-devs-bounces at haskell.org> *On Behalf Of *John
> Ericson
> *Sent:* 19 February 2021 03:19
> *To:* ghc-devs <ghc-devs at haskell.org>
> *Subject:* Re: On CI
>
>
>
> I am also wary of us deferring checks of whole platforms and whatnot. I
> think that's just kicking the can down the road, and will result in more
> variance and uncertainty. It might be alright for those authoring PRs, but
> it will make Ben's job keeping the system running even more grueling.
>
> Before getting into these complex trade-offs, I think we should focus on
> the cornerstone issue that CI isn't incremental.
>
>    1. Building and testing happen together. When tests fail
>    spuriously, we also have to rebuild GHC in addition to re-running the
>    tests. That's pure waste.
>    https://gitlab.haskell.org/ghc/ghc/-/issues/13897
>    tracks this more or less.
>    2. We don't cache between jobs. Shake and Make do not enforce
>    dependency soundness, nor cache-correctness when the build plan itself
>    changes, and this has made it hard/impossible to do safely. Naively this
>    only helps with stage 1 and not stage 2, but if we have separate stage 1
>    and --freeze1 stage 2 builds, both can be incremental. Yes, this is also
>    lossy, but I only see it leading to false failures not false acceptances
>    (if we can also test the stage 1 one), so I consider it safe. MRs that only
>    work with a slow full build because of ABI changes can indicate so.
>
> The second, main part is quite hard to tackle, but I strongly believe
> incrementality is what we need most, and what we should remain focused on.
>
> John
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>