Where do I start if I would like to help improve GHC compilation times?

Sun Apr 9 20:42:49 UTC 2017

Building modules from GHC itself is a little tricky and DynFlags is
extra tricky since it is involved in import cycles. Here is what I do:

* Copy DynFlags.hs somewhere outside the tree (for your present
purposes, it is no longer part of the compiler, but just some module
to be provided as input).
* Get rid of all the {-# SOURCE #-} pragmas on imports to turn them
into ordinary, non-boot file imports.
* Build with ".../ghc/inplace/bin/ghc-stage2 DynFlags -package ghc
-I.../ghc/compiler/stage2" plus whatever other options you want (e.g.,
probably "-fforce-recomp -O +RTS -s" at a minimum). By using "-package
ghc" you compile DynFlags against the version of ghc that you have
just built.
* This will result in some type errors, because DynFlags imports some
functions that expect arguments of type DynFlags. (This relates to the
import cycles that we broke earlier.) Since you are building against
the version of those functions from the ghc package, they expect the
type ghc:DynFlags.DynFlags, but they are now receiving a value of type
DynFlags from the main package. This is no big deal: just insert an
unsafeCoerce wherever necessary (mostly in front of occurrences of
"dflags") to get the compiler to stop complaining.

This is not 100% faithful to the way DynFlags would actually be
compiled during a GHC build, but the advantage of this method is that
you don't have to worry about GHC doing any recompilation checking
between the copy of DynFlags that you are testing on and the
compiler's own modules.
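
The recipe above can be sketched as a small shell script. This is only a
sketch under assumptions: the function names, the sed expression and the
scratch directory are illustrative and not part of the recipe itself. The
GHC_TREE argument stands in for the elided ".../ghc" path and must point at
your own checkout (use an absolute path), and the location
compiler/main/DynFlags.hs is taken from the quoted message below. The
unsafeCoerce edits still have to be made by hand afterwards.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the sed expression are illustrative assumptions.

# Print the given file with every {-# SOURCE #-} import pragma removed, so the
# imports refer to the real modules instead of their .hs-boot files.
strip_source_pragmas() {
  sed -e 's/{-# *SOURCE *#-} *//' "$1"
}

# Usage: build_dynflags_copy GHC_TREE WORKDIR
#   GHC_TREE  absolute path to your GHC checkout (the ".../ghc" in the text)
#   WORKDIR   scratch directory outside the tree (hypothetical name)
build_dynflags_copy() {
  local ghc_tree=$1 workdir=$2
  mkdir -p "$workdir" || return 1
  # Copy DynFlags.hs out of the tree, dropping the SOURCE pragmas on the way.
  strip_source_pragmas "$ghc_tree/compiler/main/DynFlags.hs" \
    > "$workdir/DynFlags.hs" || return 1
  # Compile the copy against the freshly built ghc package.
  ( cd "$workdir" &&
    "$ghc_tree/inplace/bin/ghc-stage2" DynFlags -package ghc \
      -I"$ghc_tree/compiler/stage2" -fforce-recomp -O +RTS -s )
}
```

It could then be invoked as `build_dynflags_copy /path/to/ghc /tmp/dynflags-test`,
where both paths are placeholders. Expect the type errors described above on
the first run.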

Regards,
Reid Barton

On Sun, Apr 9, 2017 at 5:37 AM, Alfredo Di Napoli
<alfredo.dinapoli at gmail.com> wrote:
> Hey Ben,
>
> as promised I’m back to you with something more articulated and hopefully
> meaningful. I do hear you perfectly — probably trying to dive head-first
> into this without at least a rough understanding of the performance hotspots
> or the GHC overall architecture is going to do me more harm than good (I get
> the overall picture and I’m aware of the different stages of the GHC
> compilation pipeline, but it’s far from saying I’m proficient with the
> architecture as a whole). I also read, a couple of years ago, the GHC
> chapter of the “Architecture of Open Source Applications” book, but I don’t
> know how much that is still relevant. If it is, I guess I should refresh my
> memory.
>
> I’m currently trying to move on 2 fronts; please advise if I’m a fool
> flogging a dead horse or if I have any hope of getting anything done ;)
>
> 1. I’m indeed trying to treat the compiler as a black box (as you advised),
> trying to build a sufficiently large program where GHC is not “as fast as I
> would like” (I know that’s a very lame definition of “slow”, hehe). In
> particular, I have built the stage2 compiler with the “prof” flavour as you
> suggested, and I have chosen 2 examples as a reference “benchmark” for
> performance; DynFlags.hs (which seems to have been mentioned multiple times
> as a GHC perf killer) and the highlighting-kate package as posted here:
> https://ghc.haskell.org/trac/ghc/ticket/9221 . The idea would be to compile
> those with -v +RTS -p -hc -RTS enabled, look at the output from the .prof
> file AND the `-v` flag, find any hotspot, try to change something,
> recompile, observe diff, rinse and repeat. Do you think I have any hope of
> making progress this way? In particular, I think compiling DynFlags.hs is a
> bit of a dead-end; I whipped up this buggy script which escalated into a
> Behemoth which is compiling pretty much half of the compiler once again :D
>
> ```
> #!/usr/bin/env bash
>
> # Compile the given module(s) with the in-place stage2 compiler, collecting
> # time (-p) and heap (-h) profiles of the compiler itself.
> ../ghc/inplace/bin/ghc-stage2 --make -j8 -v +RTS -A256M -qb0 -p -h -RTS \
>   -DSTAGE=2 -I../ghc/includes -I../ghc/compiler -I../ghc/compiler/stage2 \
>   -I../ghc/compiler/stage2/build \
>   -i../ghc/compiler/utils:../ghc/compiler/types:../ghc/compiler/typecheck:../ghc/compiler/basicTypes \
>   -i../ghc/compiler/main:../ghc/compiler/profiling:../ghc/compiler/coreSyn:../ghc/compiler/iface:../ghc/compiler/prelude \
>   -i../ghc/compiler/stage2/build:../ghc/compiler/simplStg:../ghc/compiler/cmm:../ghc/compiler/parser:../ghc/compiler/hsSyn \
>   -i../ghc/compiler/ghci:../ghc/compiler/deSugar:../ghc/compiler/simplCore:../ghc/compiler/specialise \
>   -fforce-recomp -c "$@"
> ```
>
> I’m running it with `./dynflags.sh ../ghc/compiler/main/DynFlags.hs` but
> it’s taking a long time to compile (20+ mins on my 2014 Mac Pro) because it’s
> pulling in half of the compiler anyway :D I tried to reuse the .hi files
> from my stage2 compilation but I failed (GHC was complaining about an
> interface file mismatch). Long story short, I don’t think it will be a very
> agile way to proceed. Am I right? Do you have any recommendations here? Do I
> have any hope of compiling DynFlags.hs in a way which would make this perf
> investigation feasible?
>
> The second example (the highlighting-kate package) seems much more
> promising. It takes maybe 1-2 mins on my machine, which is enough to take a
> look at the perf output. Do you think I should follow this second lead? In
> principle any package with 50+ modules would do, I think (better if with a
> lot of TH ;) ) but this seems like a start with a low barrier to entry.
>
> 2. The second path I’m exploring is simply to take a less holistic approach
> and try to dive into a performance ticket like the ones listed here:
> https://www.reddit.com/r/haskell/comments/45q90s/is_anything_being_done_to_remedy_the_soul/czzq6an/
> Maybe some are very specific, but it seems like fixing small things and moving
> forward could help give me an understanding of different sub-parts of GHC,
> which seems less intimidating than the black-box approach.
>
> In conclusion, what do you think is the best approach, 1 or 2, both or none?
> ;)
>
> Thank you!
>
> Alfredo
>
> On 7 April 2017 at 18:30, Alfredo Di Napoli <alfredo.dinapoli at gmail.com>
> wrote:
>>
>> Hey Ben,
>>
>> thanks for the quite exhaustive reply! I’m on the go right now, but I
>> promise to get back to you with a meaningful reply later this weekend ;)
>>
>> Alfredo
>>
>> On 7 April 2017 at 18:22, Ben Gamari <ben at smart-cactus.org> wrote:
>>>
>>> Alfredo Di Napoli <alfredo.dinapoli at gmail.com> writes:
>>>
>>> > Hey folks,
>>> >
>>> Hi Alfredo!
>>>
>>> First, thanks for writing. More eyes looking at GHC's compiler
>>> performance are badly needed.
>>>
>>> > maybe I’m setting up for something too ambitious for me, but I would
>>> > like
>>> > to take an active stance on the everlasting “GHC compilation times are
>>> > terrible” matter, instead of simply staring at the screen with despair
>>> > whenever GHC compiles a sufficiently large Haskell program ;)
>>> >
>>> > To make this even more interesting, I have never contributed to GHC
>>> > either!
>>> > The max I have pushed myself into was 2 years ago when I successfully
>>> > built
>>> > GHC head from source and tried to fix a Haddock “easy” ticket I don’t
>>> > even
>>> > recall (full disclosure, eventually I didn’t :D ).
>>> >
>>> > Specifically, I would love community recommendations & guidance about:
>>> >
>>> > 1. Is this simply too daunting for somebody like me? Maybe it is better to
>>> > first start contributing more regularly, gain confidence with the code
>>> > base
>>> > AND then move forward?
>>> >
>>> As with any software project, it is possible to treat the compiler as a
>>> black box, throw a profiler at it and see what hotspots show up. This
>>> gives you a place to focus your effort, allowing you to learn a small
>>> area and broaden your knowledge as necessary.
>>>
>>> However, I think it's fair to say that you will be significantly more
>>> productive if you first develop a basic understanding of the compilation
>>> pipeline. I'd recommend having a look at the GHC Commentary [1] for a
>>> start.
>>>
>>> I think it also helps to have a rough idea of what "slow" means to you.
>>> I find it is quite helpful if you have a particular program which you
>>> feel compiles more slowly than you would like (especially if it even
>>> compiles slowly with -O0, since then much less of the compiler is
>>> involved in compilation). Another approach is to look for programs whose
>>> compilation time has regressed over the course of GHC releases. It is
>>> not hard to find these examples and it is often possible to bisect your
>>> way back to the regressing commit.
>>>
>>> Also, note that I have collected some notes pertaining to compiler
>>> performance on the Wiki [2]. Here you will find a number of tickets of
>>> interest (as well as some rough themes which I've noticed), some nofib
>>> results which might guide your efforts, as well as a list of some
>>> fixes which have been committed in the past.
>>>
>>> [1] https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler
>>> [2] https://ghc.haskell.org/trac/ghc/wiki/Performance/Compiler
>>>
>>> > 2. Are compilation times largely dependent on the target platform
>>> > (I’m on
>>> > Darwin) or is there something which can be done “globally” so that the
>>> > benefits can be experienced by everybody?
>>> >
>>> There are some external considerations (e.g. the platform's compiler and
>>> linking toolchain) which contribute to GHC's runtime. For instance, it
>>> is known that the BFD ld linker implementation that many Linux
>>> distributions use by default is a great deal slower than it could be.
>>> This particular issue has come up recently and I'm currently working on
>>> allowing us to use the more performant gold linker when available.
>>>
>>> However, I think it's fair to say that for most programs GHC's runtime
>>> is largely independent of platform. I would invite you to try compiling
>>> a package which you consider GHC to compile "slowly" with GHC's -v flag
>>> (and GHC 8.0.1 or newer). This will give you a rough breakdown of where
>>> time is spent. For many packages you will find that the simplifier
>>> and/or typechecker dominate, followed (often distantly) by native code
>>> generation. Of these steps native code generation is the only one with a
>>> strong platform dependence.
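
As a sketch of how that breakdown might be skimmed: the filter below assumes
that the per-phase lines GHC 8.0 and later print at -v2 and above have the
form "!!! <phase>: finished in <n> milliseconds, allocated <m> megabytes".
The function name phase_timings is made up for illustration.

```shell
# Read GHC's verbose output on stdin and print "<milliseconds> <phase>" for
# every finished phase, slowest first.
phase_timings() {
  sed -n -e 's/^!!! \(.*\): finished in \([0-9.]*\) milliseconds.*/\2 \1/p' \
    | sort -rn
}
```

A possible invocation is `ghc -v2 -fforce-recomp Foo.hs 2>&1 | phase_timings | head`,
where Foo.hs is a placeholder for the module you consider slow to compile.
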
>>>
>>> > 3. Is there any recommended workflow to profile GHC compilation times?
>>> > Is
>>> > there any build flavour one should prefer when doing so? (Maybe the
>>> > full,
>>> > slowest one?)
>>> >
>>> There are a few options here:
>>>
>>>  * As of GHC 8.0 the compiler will output timing and allocation
>>>    information for its various stages if run with -v. This can be
>>>    extremely helpful to get a high-level picture of where the compiler
>>>    is spending its time while compiling your program. This is almost
>>>    always the right place to start.
>>>
>>>  * As with any Haskell program, the cost centre profiler can be used to
>>>    characterize the memory and CPU behavior of various parts of the
>>>    compiler.
>>>
>>>    GHC's source tree includes a "prof" build flavour which builds the
>>>    compiler with profiling enabled. However it only includes a handful
>>>    of cost-centres and is best used when you already have a rough idea
>>>    where you are looking and can add further cost-centres to drill down
>>>    to your hotspot.
>>>
>>>    Simply enabling -fprof-auto-exported across the entire tree just doesn't
>>>    work in my experience: not only is the resulting compiler quite slow,
>>>    but the profile you get is far too unwieldy to learn from.
>>>
>>>  * Occasionally the ticky-ticky profiler can be helpful in identifying
>>>    allocation hotspots without the full overhead of the cost-centre
>>>    profiler.
>>>
>>>  * In principle our newly-stable DWARF debug information can be used for
>>>    profiling, although this is still a work in progress and requires a
>>>    patched GHC for best results. It's probably best to stick to the more
>>>    traditional profiling mechanisms for now.
>>>
>>> Anyways, I hope this helps. Always feel free to get in touch with me
>>> personally (IRC and email are both great) if you would like to discuss
>>> particular issues. Thanks again for your interest!
>>>
>>> Cheers,
>>>
>>> - Ben
>>>
>>
>