GHC development asks too much of the host system

Wed Jul 20 16:34:19 UTC 2022

Artem Pelenitsyn <a.pelenitsyn at gmail.com> writes:

> Thanks Ben, very interesting, especially the cloud Shake stuff.
>
>> If everyone were to use, e.g.,
>>   ghc.nix this could be largely mitigated, but this isn't the world in
>>   which we live.
>
> I don't know why you'd want everyone to use it necessarily to improve
> things. If there was a clear statement that you may elect to use
> ghc.nix and get X% speedup, that would be a good start. And then
> people can decide, based on their setup and their aversion to tools
> like Nix, etc. in general.
>
> The real issue is that currently you don't benefit much from ghc.nix
> because the main performance sink is the GHC tree itself. The way out
> is to use a cloud build system, most likely Cloud Shake, which, as you
> describe, has a couple of issues in this case:
>
> 1) Native dependencies. This should be possible to solve via Nix, but
>    unfortunately, this is not quite there yet because Shake doesn't
>    know about Nix (afaik). I think to actually get
>    there, you'd need some sort of integration between Nix and Shake akin to
>    what Tweag built for Nix and Bazel (cf. rules_nixpkgs [1]). Their
>    motto is: Nix for "external" or "system" components, and Bazel for
>    "internal" or "local" ones.

I disagree. Caching is quite feasible while maintaining a clear division
between configuration and the build system.

Today, the only channel of communication between `configure` and Hadrian
is `hadrian/cfg/system.config`. If `ghc.nix` is doing its job correctly
then two invocations of `./configure` in nix-shell on two different
machines should end up with the same `hadrian/cfg/system.config`.
Further, if two trees have the same contents in that file, then they
can share build artifacts [1].
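
As a rough illustration of the idea above (this is not something Hadrian
does today), one could derive a cache key from the bytes of
`hadrian/cfg/system.config`. The helper names `config_cache_key` and
`make_tree`, the choice of SHA-256, and the placeholder file contents are
all assumptions made for this sketch:

```python
import hashlib
import os
import tempfile


def config_cache_key(tree_root: str) -> str:
    """Hash the bytes of hadrian/cfg/system.config under tree_root.

    Hypothetical helper: two trees that yield the same key have identical
    configuration output and are candidates for sharing build artifacts.
    """
    path = os.path.join(tree_root, "hadrian", "cfg", "system.config")
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def make_tree(contents: str) -> str:
    """Create a throwaway tree holding only a system.config (demo only)."""
    root = tempfile.mkdtemp()
    cfg_dir = os.path.join(root, "hadrian", "cfg")
    os.makedirs(cfg_dir)
    with open(os.path.join(cfg_dir, "system.config"), "w") as f:
        f.write(contents)
    return root


# Placeholder contents; a real system.config is written by ./configure.
tree_a = make_tree("key = value\n")
tree_b = make_tree("key = value\n")
tree_c = make_tree("key = other\n")

same = config_cache_key(tree_a) == config_cache_key(tree_b)
differs = config_cache_key(tree_a) != config_cache_key(tree_c)
```

Here `tree_a` and `tree_b` compare equal even though they live at
different paths, while `tree_c` does not. Per footnote [1], equal keys
would only be sufficient for sharing once absolute paths no longer leak
into the artifacts.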

However, to reiterate, the real problem here is the one below:

> 2) Bootstrapping aspect. Maybe this is a challenge for rebuilds after
>    modification, but I think people on this thread were quoting the
>    "time to first build" more. I don't see how avoiding building
>    master locally after a fresh (worktree) checkout by downloading
>    build results from somewhere, connects to bootstrapping. I think it
>    doesn't.

If you merely want to build `master` then indeed caching would work
fine. However, in that case you could have also just downloaded a binary
distribution from GitLab. The problem is that usually the reason that
you want to build `master` is that you then want to *modify* it.
In general, a modification of `master` will require rebuilding some
subset of the stage 1 GHC, which will then require a full build of
stage2 (which includes GHC, as well as all of the boot libraries,
`base`, etc.). The latter would see zero cache hits since one of the
inputs, the stage 1 GHC, has changed. Unfortunately, the latter is also
well over half of the build effort.
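
To make the zero-cache-hit argument concrete, here is a toy model, not
Hadrian's or Cloud Shake's actual keying scheme. It assumes each stage2
artifact is keyed on both its source and the identity of the compiler
that builds it; the module names and compiler identifiers are
hypothetical:

```python
import hashlib


def artifact_key(compiler_id: str, source: str) -> str:
    """Cache key for one compiled module (a simplification): it covers
    the compiler that builds the module as well as the module's source."""
    return hashlib.sha256(f"{compiler_id}\0{source}".encode()).hexdigest()


# Hypothetical stage2 inputs: module name -> source text.
stage2_sources = {"module-a": "source of a", "module-b": "source of b"}


def stage2_keys(stage1_id: str) -> set:
    """Keys of every stage2 artifact when built by the given stage 1."""
    return {artifact_key(stage1_id, src) for src in stage2_sources.values()}


# Remote cache populated by a build of unmodified master.
remote_cache = stage2_keys("stage1-from-master")

# A local change to the compiler produces a different stage 1, so every
# stage2 key changes even though no stage2 source was touched.
wanted = stage2_keys("stage1-with-local-change")
cache_hits = wanted & remote_cache
```

In this model `cache_hits` is empty: changing one input that every
artifact depends on invalidates all of them at once.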

> As for rebuilds. People are already using --freeze1 (you suggested it
> earlier in this very thread!),
>
Yes, but they are doing so explicitly after having already built their
branch to produce a consistent stage 1 compiler. If you check out
`master`, build stage 1, switch to some arbitrary branch, and attempt to
build stage2 with --freeze1, chances are you will end up with a broken
compiler. In the best case this will manifest as a build failure.
However, there is a non-negligible possibility that the outcome is far
more sinister (e.g. segmentation faults).

> so I don't see how saying "freezing stage 1 is dangerous even if
> faster" connects to the practice of GHC development. Of course, you may
> not find a remote cache with relevant artefacts after local updates,
> but that's not the point. The point is to not have to build `master`,
> not `feature-branch-t12345`. Rebuilds should be rather pain-free in
> comparison.
>
For safe caching we must be certain that the build graph is accurate
and complete. However, we know with certainty that it currently is not and
fixing this requires real implementation effort (David Eichmann
spent a few months on this problem in 2019; see #16926 and related
tickets). Consequently, we must weigh the benefit of caching against the
development cost. Currently, my sense is that the benefit would be that
some subset of the stage 1 build could be shared some of the time. This
strikes me as a rather small benefit compared to the cost.
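
A toy sketch of why an incomplete build graph makes caching unsafe. It
does not model Hadrian; the `build` function, the "header" input, and
the in-memory `cache` are hypothetical. The point is only that a cache
key computed from the declared dependencies silently returns a stale
result when a real dependency is missing from the graph:

```python
import hashlib

cache = {}


def key(*parts: str) -> str:
    """Hash the declared inputs into a cache key."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()


def build(source: str, header: str, declared_deps: list) -> str:
    """The output really depends on both source and header, but the
    cache key only covers whatever the build graph declares."""
    inputs = {"source": source, "header": header}
    k = key(*(inputs[d] for d in declared_deps))
    if k not in cache:
        cache[k] = f"compiled({source}, {header})"
    return cache[k]


# Incomplete graph: the header dependency is not declared.
build("src-v1", "header-v1", ["source"])
stale = build("src-v1", "header-v2", ["source"])

# Complete graph: the changed header produces a new key and a rebuild.
cache.clear()
build("src-v1", "header-v1", ["source", "header"])
fresh = build("src-v1", "header-v2", ["source", "header"])
```

With the incomplete graph, `stale` is still the artifact built against
`header-v1`; with the complete graph, `fresh` reflects `header-v2`.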

Of course, we would love to have help in addressing #16926. In principle
having build caching would be nice; however, at the moment we just don't
believe that putting precious GHC team resources towards that goal is
the most effective way to serve users. If someone were to come along and
start chipping away at #16926, we would be happy to advise and
assist.

Cheers,

 - Ben

[1] Strictly speaking, I don't believe this is quite true today since
    the absolute path of the working tree almost certainly leaks into
    the build artifacts. However, in principle this could be fixed.