Where do I start if I would like to help improve GHC compilation times?
ben at smart-cactus.org
Mon Apr 10 22:47:25 UTC 2017
Alfredo Di Napoli <alfredo.dinapoli at gmail.com> writes:
> Hey Ben,
Sorry for the late response! The email queue from the weekend was a bit
longer than I would like.
> as promised, I’m back with something more articulated and, hopefully,
> meaningful. I hear you perfectly: diving head-first into this without at
> least a rough understanding of the performance hotspots or of GHC's overall
> architecture is probably going to do me more harm than good (I get the
> overall picture and I’m aware of the different stages of the GHC
> compilation pipeline, but that's far from saying I’m proficient with the
> architecture as a whole). A couple of years ago I also read the GHC
> chapter of “The Architecture of Open Source Applications”, but I don’t
> know how much of it is still relevant. If it is, I guess I should refresh my
It sounds like you have done a good amount of reading. That's great.
Perhaps skimming the AOSA chapter again wouldn't hurt, but otherwise
it's likely worthwhile diving in.
> I’m currently trying to move on two fronts; please advise if I’m a fool
> flogging a dead horse or if I have any hope of getting anything done ;)
> 1. I’m trying to treat the compiler as a black box (as you advised),
> building a sufficiently large program where GHC is not “as fast as I
> would like” (I know that’s a very lame definition of “slow”, hehe). In
> particular, I have built the stage2 compiler with the “prof” flavour as
> you suggested, and I have chosen two examples as a reference “benchmark”
> for performance: DynFlags.hs (which seems to have been mentioned multiple
> times as a GHC perf killer) and the highlighting-kate package as posted
> here: https://ghc.haskell.org/trac/ghc/ticket/9221 .
Indeed, #9221 would be a very interesting ticket to look at. The
highlighting-kate package is interesting in the context of that ticket
as it has a very large amount of parallelism available.
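One way to see how well that parallelism is exploited is to time the same forced rebuild at several `-j` levels. The following is a minimal sketch, assuming bash and a `GHC` environment variable naming the compiler under test (both the variable and the `sweep_j` name are hypothetical conveniences, not part of GHC's build system). It prints one line per job count with the elapsed wall-clock seconds; a curve that flattens or rises as `-j` grows would be a symptom of the scaling problem discussed below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time a full `--make` rebuild at several -j levels.
# GHC defaults to `ghc`; point it at e.g. ../ghc/inplace/bin/ghc-stage2.
# Extra arguments (modules, -i flags, ...) are passed through unchanged.
sweep_j() {
  local n t
  for n in 1 2 4 8; do
    # The `time` keyword reports on the group's stderr, which is captured;
    # TIMEFORMAT=%R keeps only the elapsed (real) seconds. GHC's own
    # stdout/stderr are discarded.
    t=$( { TIMEFORMAT=%R; time "${GHC:-ghc}" --make -j"$n" -fforce-recomp "$@" \
           >/dev/null 2>&1; } 2>&1 )
    printf '%s %s\n' "$n" "$t"
  done
}
```

Usage might look like `GHC=../ghc/inplace/bin/ghc-stage2 sweep_j Main.hs`, where `Main.hs` stands in for the package's top-level module. Forcing recompilation on every run keeps the measurements comparable, at the cost of rebuilding everything each time.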
If you do want to look at #9221, note that the cost centre profiler may
not provide the whole story. In particular, it has been speculated that
the scaling issues may be due to either,
* threads hitting a blackhole, resulting in blocking
* the usual scaling limitations of GHC's stop-the-world GC
The eventlog may be quite useful for characterising these.
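To produce an eventlog, the compiler being studied would need to be linked against the eventlog-enabled RTS (`-eventlog`); running it with `+RTS -l -RTS` then writes a file named after the program (e.g. `ghc-stage2.eventlog`) that ThreadScope or `ghc-events show` can read. As a rough first pass before opening ThreadScope, the sketch below counts the two speculated symptoms in a text dump. It assumes the dump comes from `ghc-events show` and that the relevant lines contain "blocked on black hole" and "starting GC"; those strings are assumptions about that tool's wording and may need adjusting for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count black-hole blocking and GC starts in a text
# dump produced by something like
#   ghc-events show ghc-stage2.eventlog > dump.txt
# The match strings are assumptions about the dump's wording.
summarise_eventlog() {
  local dump=$1 bh gc
  # grep -c prints 0 (and exits 1) when nothing matches; `|| true` keeps
  # that 0 while ignoring the non-zero exit status.
  bh=$(grep -c 'blocked on black hole' "$dump" || true)
  gc=$(grep -c 'starting GC' "$dump" || true)
  printf 'blackhole-blocks=%s gc-starts=%s\n' "$bh" "$gc"
}
```

Comparing these counts between a `-j1` and a `-j8` run gives a crude hint as to which of the two explanations deserves a closer look.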
> The idea would be to compile those with -v +RTS -p -hc -RTS enabled,
> look at the output from the .prof file AND the `-v` flag, find any
> hotspot, try to change something, recompile, observe the diff, rinse and
> repeat. Do you think I have any hope of making progress this way? In
> particular, I think compiling DynFlags.hs is a bit of a dead end; I
> whipped up this buggy script, which escalated into a behemoth that
> compiles pretty much half of the compiler once again :D
> #!/usr/bin/env bash
> # Usage: ./dynflags.sh ../ghc/compiler/main/DynFlags.hs
> ../ghc/inplace/bin/ghc-stage2 --make -j8 -v +RTS -A256M -qb0 -p -h \
> -RTS -DSTAGE=2 -I../ghc/includes -I../ghc/compiler -I../ghc/compiler/stage2 \
> -I../ghc/compiler/stage2/build \
> -fforce-recomp -c "$@"
> I’m running it with `./dynflags.sh ../ghc/compiler/main/DynFlags.hs` but
> it’s taking a long time to compile (20+ mins on my 2014 Mac Pro) because
> it’s pulling in half of the compiler anyway :D I tried to reuse the .hi
> files from my stage2 compilation but failed (GHC was complaining about an
> interface file mismatch). Long story short, I don’t think it will be a
> very agile way to proceed. Am I right? Do you have any recommendations in
> that regard? Do I have any hope of compiling DynFlags.hs in a way that
> would make this perf investigation feasible?
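For the `.prof`-driven loop described above, it helps to compare only the top of two reports. The sketch below assumes the usual layout of a GHC `.prof` report, in which a flat table headed by a `COST CENTRE` line (starting in column 0) lists the most expensive cost centres before the full call tree; the function name and the file names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first N rows (default 10) of the flat
# cost-centre table at the top of a GHC .prof report.
top_cost_centres() {
  local prof=$1 n=${2:-10}
  awk '
    # The first "COST CENTRE" line heads the flat table; skip it.
    /^COST CENTRE/ && !seen { seen = 1; next }
    # A blank line after the rows ends the table (the call tree follows).
    seen && started && NF == 0 { exit }
    # Print table rows; the blank line right after the header is ignored.
    seen && NF > 0 { started = 1; print }
  ' "$prof" | head -n "$n"
}
```

A before/after comparison could then be `diff <(top_cost_centres before.prof) <(top_cost_centres after.prof)`.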
What I usually do in this case is just take the relevant `ghc` command
line directly from the `make` output and execute it manually. I would
imagine your debug cycle would look something like,
* instrument the compiler
* build stage1
* use the stage1 compiler to build stage2's DynFlags module (using a saved command line)
This should only take a few minutes per iteration.
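Saving that command line can be partly automated. The sketch below assumes the `make` output was captured to a file with full `ghc` command lines shown (in the make-based build the `V` variable controls this, e.g. `make V=1 2>&1 | tee build.log`); the log name and the `extract_cmd` function are hypothetical. It prints the last logged command that compiles the given source file with `-c`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recover the compile command for one module from a
# saved build log that contains full ghc command lines.
extract_cmd() {
  local log=$1 src=$2
  # Keep lines mentioning the source file that also pass -c as a separate
  # word, and take the last one in case the module was built more than once.
  grep -F -- "$src" "$log" | grep -E -- '(^| )-c( |$)' | tail -n 1
}
```

For example, `extract_cmd build.log compiler/main/DynFlags.hs > dynflags-cmd.sh` once, then `bash dynflags-cmd.sh` from the top of the GHC tree after each stage1 rebuild.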
> The second example (the highlighting-kate package) seems much more
> promising. It takes maybe 1-2 mins on my machine, which is enough to take
> a look at the perf output. Do you think I should follow this second lead?
> In principle I think any package with 50+ modules would do (better if with
> a lot of TH ;) ) but this seems like a low-barrier-to-entry start.
> 2. The second path I’m exploring is simply to take a less holistic approach
> and try to dive into a performance ticket like the ones listed here:
> Maybe some are very specific, but it seems like fixing small things and
> moving forward could help give me an understanding of different sub-parts
> of GHC, which seems less intimidating than the black-box approach.
Do you have any specific tickets from these lists that you found
> In conclusion, what do you think is the best approach, 1 or 2, both or
> none? ;)
I would say that it largely depends upon what you feel most comfortable
with. If you feel up for it, I think #9221 would be a nice, fairly
self-contained, yet high-impact ticket which would be worth spending a
few days diving further into.