Where do I start if I would like to help improve GHC compilation times?

Fri Apr 7 16:22:50 UTC 2017

Alfredo Di Napoli <alfredo.dinapoli at gmail.com> writes:

> Hey folks,
>
Hi Alfredo!

First, thanks for writing. More eyes looking at GHC's compiler
performance is badly needed.

> Maybe I’m setting myself up for something too ambitious, but I would like
> to take an active stance on the everlasting “GHC compilation times are
> terrible” matter, instead of simply staring at the screen in despair
> whenever GHC compiles a sufficiently large Haskell program ;)
>
> To make this even more interesting, I have never contributed to GHC either!
> The furthest I have pushed myself was 2 years ago, when I successfully built
> GHC HEAD from source and tried to fix an “easy” Haddock ticket I don’t even
> recall (full disclosure: eventually I didn’t :D ).
>
> Specifically, I would love community recommendations & guidance about:
>
> 1. Is this simply too daunting for somebody like me? Maybe it is better to
> first start contributing more regularly, gain confidence with the code base,
> AND then move forward?
>
As with any software project, it is possible to treat the compiler as a
black box, throw a profiler at it and see what hotspots show up. This
gives you a place to focus your effort, allowing you to learn a small
area and broaden your knowledge as necessary.

However, I think it's fair to say that you will be significantly more
productive if you first develop a basic understanding of the compilation
pipeline. I'd recommend having a look at the GHC Commentary [1] for a
start.

I think it also helps to have a rough idea of what "slow" means to you.
I find it is quite helpful if you have a particular program which you
feel compiles more slowly than you would like (especially if it even
compiles slowly with -O0, since then much less of the compiler is
involved in compilation). Another approach is to look for programs whose
compilation time has regressed over the course of GHC releases. It is
not hard to find these examples and it is often possible to bisect your
way back to the regressing commit.
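Bisecting is a binary search over the commit history between a compiler known to build your program quickly and one known to be slow. The sketch below shows the idea in Haskell over an in-memory list; the helper `firstBad`, the commit names, and the timings are all hypothetical. In practice `git bisect` drives this search over real commits, and each test of the predicate means building GHC at that commit and timing your program, which is why needing only a logarithmic number of tests matters.

```haskell
-- A minimal sketch of bisection over an ordered commit history.
-- Assumptions: commits are listed oldest first, and the predicate is
-- monotone (once a commit is "bad", every later commit is bad too).
-- The commit names and timings below are hypothetical.
module Main (main) where

-- | Return the first element satisfying @isBad@, evaluating the
-- predicate only O(log n) times. Returns Nothing if none is bad.
firstBad :: (a -> Bool) -> [a] -> Maybe a
firstBad isBad commits = search 0 n
  where
    n = length commits
    -- Invariant: indices below lo are good; indices from hi up to n - 1
    -- are bad. The search narrows [lo, hi) until it is empty.
    search lo hi
      | lo >= hi = if lo < n then Just (commits !! lo) else Nothing
      | isBad (commits !! mid) = search lo mid
      | otherwise = search (mid + 1) hi
      where
        mid = (lo + hi) `div` 2

-- | Hypothetical compile times (in seconds) of one test program, each
-- measured with a compiler built from the named commit.
timings :: [(String, Double)]
timings =
  [ ("a1f3", 10.2), ("b7c9", 10.4), ("c0d2", 10.1)
  , ("d5e8", 14.9), ("e2a4", 15.3), ("f9b1", 15.0) ]

-- Treat anything slower than 12 seconds as a regression.
main :: IO ()
main = print (fst <$> firstBad ((> 12.0) . snd) timings)
-- prints: Just "d5e8"
```

The monotonicity assumption matters: if compile time bounces above and below the threshold across commits, bisection will still return a commit, but not necessarily the one that introduced the regression.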

Also, note that I have collected some notes pertaining to compiler
performance on the Wiki [2]. Here you will find a number of tickets of
interest (as well as some rough themes which I've noticed), some nofib
results which might guide your efforts, and a list of
fixes which have been committed in the past.

[1] https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler 
[2] https://ghc.haskell.org/trac/ghc/wiki/Performance/Compiler

> 2. Are compilation times largely dependent on the target platform (I’m on
> Darwin), or is there something which can be done “globally” so that the
> benefits can be experienced by everybody?
>
There are some external considerations (e.g. the platform's compiler and
linking toolchain) which contribute to GHC's overall compilation time.
For instance, it
is known that the BFD ld linker implementation that many Linux
distributions use by default is a great deal slower than it could be.
This particular issue has come up recently and I'm currently working on
allowing us to use the more performant gold linker when available.

However, I think it's fair to say that for most programs GHC's compilation
time is largely independent of the platform. I would invite you to try
compiling a package which you feel GHC compiles "slowly" with GHC's -v
flag (using GHC 8.0.1 or newer). This will give you a rough breakdown of
where time is spent. For many packages you will find that the simplifier
and/or typechecker dominate, followed (often distantly) by native code
generation. Of these steps, native code generation is the only one with a
strong platform dependence.
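To turn the -v output into a ranking of passes, a small script can total the time reported for each pass. The sketch below assumes GHC prints one line per pass of the form `!!! Simplifier [Main]: finished in 85.23 milliseconds, allocated 73.145 megabytes`; the exact wording may differ between releases, so check it against your own compiler's output before relying on the parser. The names `parseTiming` and `totals` are hypothetical.

```haskell
-- A minimal sketch: rank compiler passes by the time `ghc -v` reports.
-- Assumption: each timing line has the shape
--   !!! Simplifier [Main]: finished in 85.23 milliseconds, allocated 73.145 megabytes
-- Lines of any other shape are ignored. Timings for the same pass in
-- different modules are summed.
module Main (main) where

import Data.Char (isSpace)
import Data.List (dropWhileEnd, sortOn, stripPrefix)
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)
import Data.Ord (Down (..))
import Text.Read (readMaybe)

-- | Extract (pass name, milliseconds) from one line of -v output.
parseTiming :: String -> Maybe (String, Double)
parseTiming line = do
  rest <- stripPrefix "!!! " line
  -- Split "Simplifier [Main]" from ": finished in ...".
  let (header, body) = break (== ':') rest
      -- Drop the "[Module]" suffix, if any, and trailing spaces.
      pass = dropWhileEnd isSpace (takeWhile (/= '[') header)
  case words body of
    (":" : "finished" : "in" : ms : "milliseconds," : _) -> do
      t <- readMaybe ms
      pure (pass, t)
    _ -> Nothing

-- | Sum the time per pass and sort with the most expensive pass first.
totals :: [(String, Double)] -> [(String, Double)]
totals = sortOn (Down . snd) . Map.toList . Map.fromListWith (+)

main :: IO ()
main = do
  input <- getContents
  let ranked = totals (mapMaybe parseTiming (lines input))
  mapM_ (\(pass, ms) -> putStrLn (pass ++ ": " ++ show ms ++ " ms")) ranked
```

A possible invocation, with hypothetical file names, is `ghc -v -fforce-recomp Foo.hs 2>&1 | runghc PassTimes.hs`; `-fforce-recomp` makes sure every module is actually recompiled rather than skipped as up to date. Summing across modules answers "which pass dominates the whole build"; dropping the summation would instead show which individual modules are expensive.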

> 3. Is there any recommended workflow to profile GHC compilation times? Is
> there any build flavour one should prefer when doing so? (Maybe the full,
> slowest one?)
>
There are a few options here:

 * As of GHC 8.0 the compiler will output timing and allocation
   information for its various stages if run with -v. This can be
   extremely helpful to get a high-level picture of where the compiler
   is spending its time while compiling your program. This is almost
   always the right place to start.

 * As with any Haskell program, the cost centre profiler can be used to
   characterize the memory and CPU behavior of various parts of the
   compiler.

   GHC's source tree includes a "prof" build flavour which builds the
   compiler with profiling enabled. However, it only includes a handful
   of cost-centres and is best used when you already have a rough idea
   where you are looking and can add further cost-centres to drill down
   to your hotspot.

   Simply enabling -fprof-auto-exported across the entire tree just doesn't
   work in my experience: not only is the resulting compiler quite slow,
   but the profile you get is far too unwieldy to learn from.

 * Occasionally the ticky-ticky profiler can be helpful in identifying
   allocation hotspots without the full overhead of the cost-centre
   profiler.

 * In principle our newly-stable DWARF debug information can be used for
   profiling, although this is still a work in progress and requires a
   patched GHC for best results. It's probably best to stick to the more
   traditional profiling mechanisms for now.
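Adding a cost-centre by hand, as suggested for the "prof" flavour, amounts to wrapping an expression in an SCC pragma. The toy program below is not GHC code: `expensiveStep` and the label string are hypothetical. When built with `-prof` and run with `+RTS -p`, the label appears as its own entry in the resulting `.prof` report; without `-prof`, the pragma is simply ignored.

```haskell
-- A toy illustration of a manually placed cost centre.
-- Build with:  ghc -prof -rtsopts SccDemo.hs
-- Run with:    ./SccDemo +RTS -p      (writes SccDemo.prof)
-- Without -prof the SCC pragma below is simply ignored.
module Main (main) where

import Data.List (foldl')

-- | Hypothetical stand-in for a piece of code we suspect is hot.
expensiveStep :: Int -> Int
expensiveStep n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = do
  -- Costs of evaluating this expression are attributed to the
  -- "expensiveStep/outer" cost centre rather than to main as a whole.
  let total = {-# SCC "expensiveStep/outer" #-} sum (map expensiveStep [1 .. 2000])
  print total
```

Inside GHC itself the same kind of pragma would go into the compiler's source, and the compiler would then be rebuilt with the prof flavour so the new cost-centre shows up when profiling a compilation.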

Anyway, I hope this helps. Always feel free to get in touch with me
personally (IRC and email are both great) if you would like to discuss
particular issues. Thanks again for your interest!

Cheers,

- Ben
