GHC development asks too much of the host system

Tue Jul 19 19:10:35 UTC 2022

Hécate <hecate at glitchbra.in> writes:

> Hello ghc-devs,
>
> I hadn't made significant contributions to the GHC code base in a while, 
> until a few days ago, where I discovered that my computer wasn't able to 
> sustain running the test suite, nor handle HLS well.
>
> Whether it is my OS automatically killing the process due to oom-killer 
> or just the fact that I don't have a war machine, I find it too bad and 
> I'm frankly discouraged.

Do you know which process was being killed? There is one testsuite test
that I know of which does have quite a considerable memory footprint
(T16992) due to its nature; otherwise I would expect a reasonably recent
machine to pass the testsuite without much trouble. It's particularly
concerning if this is a new regression; is this the first time you have
observed this particular failure?

> This is not the first time such feedback emerges, as the documentation 
> task force for the base library was unable to properly onboard some 
> people from third-world countries who do not have access to hardware 
> we'd consider "standard" in western Europe or some parts of North 
> America. Or at least "standard" until even my standard stuff didn't cut 
> it anymore.
>
> So yeah, I'll stay around but I'm afraid I'm going to have to focus on 
> projects for which the feedback loop is not on the scale of hours, as
> this is a hobby project.
>
> Hope this will open some eyes.
>
Hi Hécate,

I would reiterate that the more specific feedback you can offer, the
better.

To share some of my own experience: I have access to a variety of hardware,
some of which is quite powerful. However, I find that I end up doing
much of my development on my laptop which, while certainly not a slouch
(being a Ryzen 4750U), is also not a monster. In particular, while a
fresh build takes nearly twice as long on my laptop as on some of the
other hardware I have, I nevertheless find ways to make it worthwhile
(due to the ease of iteration compared to ssh). If you routinely have
multi-hour iteration times then something isn't right.

In particular, I think there are a few tricks which make life far
easier:

 * Be careful about doing things that would incur
   significant amounts of rebuilding. This includes:

    * After modifying, e.g., `compiler/ghc.cabal.in` (e.g. to add a new
      module to GHC), modify `compiler/ghc.cabal` manually instead of
      rerunning `configure`.

    * Be careful about pulling/rebasing. I generally pick a base commit to
      build off of and rebase sparingly: Having to stop what I'm doing to
      wait for full rebuild is an easy way to lose momentum.

    * Avoid switching branches; I generally have a GHC tree per on-going
      project.

 * Take advantage of Hadrian's `--freeze1` flag

 * Use `hadrian/ghci` to typecheck changes

 * Use the stage1 compiler instead of stage2 to smoke-test changes when
   possible (specifically, using the script generated by Hadrian's
   `_build/ghc-stage1` target).

 * Use the right build flavour for the task at hand: If I don't need a
   performant compiler and am confident that I can get by without
   thorough testsuite validation, I use `quick`. Otherwise, plan ahead
   for what you need (e.g. `default+assertions+debug_info` or
   `validate`)

 * Run the fraction of the testsuite that is relevant to your change.
   Hadrian's `--test-way` and `--only` flags are your friends.

 * Take advantage of CI. At the moment we have a fair amount of CI
   capacity. If you think that your change is close to working, you can
   open an MR and start a build locally. If it fails, iterate on just the
   failing testcases locally.

 * Task-level parallelism. Admittedly, this is harder when you are
   working as a hobby, but I often have two or three projects on-going
   at a time. While one tree is building I try to make progress on
   another.
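
To make a few of the tricks above concrete, here is a rough sketch of how
they might be combined in a shell session inside a GHC checkout. The `run`
helper and the function names are hypothetical and exist only for
illustration; the Hadrian flags are the ones named above plus `--flavour`
and `-j`, and their exact spelling is worth checking against
`hadrian/build --help` for your tree.

```shell
#!/bin/sh
# Sketch only: wrappers around the Hadrian invocations discussed above.
# Assumes the current directory is the root of a GHC source tree.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Initial build using the cheap `quick` flavour.
build_quick() {
    run hadrian/build --flavour=quick -j
}

# Later rebuilds: --freeze1 leaves the stage1 compiler alone.
rebuild_frozen() {
    run hadrian/build --flavour=quick --freeze1 -j
}

# Generate the stage1 wrapper script for smoke-testing.
stage1_script() {
    run hadrian/build --flavour=quick --freeze1 _build/ghc-stage1
}

# Run only the named tests (first argument), in a single way.
run_tests() {
    run hadrian/build --flavour=quick --freeze1 test --only="$1" --test-way=normal
}
```

Setting `DRY_RUN=1` before calling, e.g., `run_tests T16992` prints the
command instead of running it, which is a cheap way to check what would be
invoked.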

I don't use HLS so I may be insulated from some of the pain in this
regard. However, I do know that Matt is a regular user and he
disables most plugins.

I would also say that, sadly, GHC is comparable to other similarly-sized
compilers in its build time: A build of LLVM (not even clang) takes ~50
minutes on my 8-core desktop; impressively, rustc takes ~7 minutes
although it is a considerably smaller compiler (being just a front-end).
By contrast, GHC takes around 20 minutes. I know that this doesn't
make the cost any easier to bear and I would love to bring this number
down, but ultimately there are only so many hours in the day.

I think one underexplored approach to addressing the build-time problem
is to look not at the full-build time but rather look for common tasks
where we could *avoid* doing a full build (e.g. updating documentation,
typechecking `base`, running a "good enough" subset of the testsuite)
and find ways to make those workflows more efficient.

Cheers,

- Ben
