Where do I start if I would like to help improve GHC compilation times?
ben at well-typed.com
Mon Apr 10 22:15:03 UTC 2017
Niklas Hambüchen <mail at nh2.me> writes:
> I have some suggestions for low hanging fruits in this effort.
Thanks for your suggestions, they are quite reasonable.
> 1. Make ghc print more statistics on what it is spending time on
> When I did the linking investigation recently
> I noticed (with strace) that there are lots of interesting syscalls
> being made that you might not expect. For example, each time TH is used,
> shared libraries are loaded, and to determine the shared library paths,
> ghc shells out to `gcc --print-file-name`. Each such invocation takes 20
> ms on my system, and I have 1000 invocations in my build. That's 20
> seconds (out of 2 minutes build time) just asking gcc for paths.
> I recommend that for every call to an external tool, GHC measures how long
> that call took, so that it can be asked to print a summary when it's done.
Indeed. Since 8.0 GHC has had a scheme for tracking CPU time and
allocations of various compiler phases. It would be great to implement a
similar scheme for external tools. You can't necessarily reuse the
existing infrastructure since you will want to use monotonic time
instead of CPU time and allocations aren't relevant.
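To make the idea concrete, here is a minimal, language-agnostic sketch in Python (not GHC code) of wrapping every external-tool invocation in a monotonic-clock timer and printing a summary at the end. The names `tool_stats`, `timed_run` and `summary` are hypothetical. The running Python interpreter stands in for an external tool such as gcc.

```python
import subprocess
import sys
import time
from collections import defaultdict

# Hypothetical accumulator: tool label -> [number of calls, total wall-clock seconds].
tool_stats = defaultdict(lambda: [0, 0.0])

def timed_run(label, argv):
    """Run an external command and record its wall-clock duration under `label`."""
    start = time.monotonic()  # monotonic clock: unaffected by system clock adjustments
    try:
        return subprocess.run(argv, capture_output=True, text=True, check=False)
    finally:
        entry = tool_stats[label]
        entry[0] += 1
        entry[1] += time.monotonic() - start

def summary():
    """Return one line per tool, largest total time first."""
    rows = sorted(tool_stats.items(), key=lambda kv: kv[1][1], reverse=True)
    return [f"{label}: {calls} calls, {total:.3f} s" for label, (calls, total) in rows]

# Stand-in for an external tool: the running Python interpreter.
timed_run("python", [sys.executable, "-c", "pass"])
timed_run("python", [sys.executable, "-c", "pass"])
print("\n".join(summary()))
```

Wall-clock (monotonic) time is the relevant measure here because the parent process spends the duration of the call waiting on the child, which CPU-time accounting for the parent would not capture.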
> That might give us lots of interesting things to optimize. For example,
> this would have made the long linker times totally obvious.
I have a number of thoughts on this which I'll share in another message
as they are largely orthogonal to the matter at hand.
> At the end, I would love to know for each compilation (both one-shot as
> used in ghc's build system, and `ghc --make`):
> * What programs did it invoke and how long did they take
> * What files did it read and how long did that take
> * How long did it take to read all the `.hi` files in `ghc --make`
> * High level time summary (parsing, typechecking, codegen, .hi files, etc)
We already have much of this. See D1959 for the relevant change. It
would be great to extend this with more (optional) detail (e.g. measure
individual interface file deserialization times).
> That way we'll know at least what is slow, and don't have to resort to
> strace every time in order to obtain this basic answer.
Indeed. On the whole this would be a very nice, more-or-less
> 2. Investigate if idiotic syscalls are being done and how much
> There's this concept I call "idiotic syscalls", which are syscalls of
> which you know from before that they won't contribute anything
> productive. For example, if you give a linker N many `-L` flags (library
> dirs) and M many `-l` flags (library names to link), it will try to
> `stat()` or `open()` N*M many files, out of which most are total
> rubbish, because we typically know what library is in what dir.
> Example: You pass `-L/usr/lib/opencv -L/usr/lib/imagemagick
> -L/usr/lib/blas -lopencv -limagemagick -lblas`. Then you will get
> things like `open("/usr/lib/opencv/libimagemagick.so") = ENOENT` which
> makes no sense and obviously that file doesn't exist. This is a problem
> with the general "search path" concept; same happens for running
> executables searching through $PATH. Yes, nonexistent file opens fail
> fast, but in my typical ghc invocation I get millions of them (and we
> should at least measure how much time is wasted on them), and they
> clutter the strace output and make the real problems harder to investigate.
> We should check if we can create ways to give pass those files that do
Indeed, the matter of library search paths is actually a pretty bad one
although largely a Cabal issue (see GHC #11587).
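To illustrate how the failed probes add up, here is a small simulation in Python of the search-path behaviour described in the quoted example. It is a simplified sketch, not a model of any particular linker: it assumes one `lib<name>.so` probe per directory, whereas real linkers typically try several file names per directory. The function name `probe_search_path` is hypothetical.

```python
def probe_search_path(dirs, libs, existing):
    """Simulate resolving each -l name by trying every -L dir in order.

    `existing` is the set of paths that actually exist. Returns (hits, misses),
    where misses are the probes that would fail with ENOENT.
    """
    hits, misses = [], []
    for lib in libs:
        for d in dirs:
            path = f"{d}/lib{lib}.so"
            if path in existing:
                hits.append(path)
                break  # found: stop searching for this library
            misses.append(path)
    return hits, misses

dirs = ["/usr/lib/opencv", "/usr/lib/imagemagick", "/usr/lib/blas"]
libs = ["opencv", "imagemagick", "blas"]
# Each library lives only in its own directory.
existing = {f"/usr/lib/{name}/lib{name}.so" for name in libs}

hits, misses = probe_search_path(dirs, libs, existing)
print(len(hits), "found;", len(misses), "failed probes")
for path in misses:
    print("ENOENT", path)
```

With N directories and M libraries the number of probes is bounded by N*M, so the wasted work grows quickly as a package database adds one `-L` flag per dependency.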
> 3. Add pure TemplateHaskell
> It is well known that TH is a problem for incremental compilation
> because it can have side effects and we must therefore be more
> conservative about when to recompile; when you see a `[TH]` in your `ghc
> --make` output, it's likely that time again.
> I believe this could be avoided by adding a variant of TH that forbids
> the use of the `runIO` function, and can thus not have side effects.
> Most TH does not need side effects, for example any form of code
> generation based on other data types (lenses, instances for whatever).
> If that was made "pure TH", we would not have to recompile when inputs
> to our TH functions change.
> Potentially this could even be determined automatically instead of
> adding a new variant of TH like was done for typed TH `$$()`, simply by
> inspecting what's in the TH and if we can decide there's no `runIO` in
> there, mark it as clean, otherwise as tainted.
Cross-compiling users would also greatly appreciate this.
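The automatic variant suggested in the quoted text amounts to a reachability check: a splice is tainted if anything it transitively calls reaches `runIO`. Below is a minimal sketch of that check in Python, not a description of how GHC would implement it. The call graph and all names in it other than `runIO` are made up for illustration, and it assumes the full call graph of each splice is available.

```python
def tainted(splice, calls, impure=frozenset({"runIO"})):
    """Return True if `splice` can transitively reach an impure function."""
    seen, stack = set(), [splice]
    while stack:
        name = stack.pop()
        if name in impure:
            return True
        if name not in seen:
            seen.add(name)
            stack.extend(calls.get(name, ()))
    return False

# Hypothetical call graph: function name -> functions it calls.
calls = {
    "deriveLensesFor": ["inspectType", "freshName"],
    "readConfigAt": ["runIO"],
    "spliceA": ["deriveLensesFor"],
    "spliceB": ["deriveLensesFor", "readConfigAt"],
}

print("spliceA tainted:", tainted("spliceA", calls))
print("spliceB tainted:", tainted("spliceB", calls))
```

Only splices marked clean by such a check could be treated as pure functions of their inputs; anything tainted would keep the current conservative recompilation behaviour.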
> 4. Build ghc with `ghc --make` if possible
> This one might be controversial or impossible (others can likely tell
> us). Most Haskell code is built with `ghc --make`, not with the one-shot
> compilation system + make or Hadrian as is done in GHC's build system.
> Weirdly, often `ghc --make` scales much worse and has much worse
> incremental recompilation times than the one-shot mode, which doesn't
> make sense given that it has no process creation overhead, can do much
> better caching etc. I believe that if ghc or large parts of it (e.g.
> stage2) itself was built with `--make`, we would magically see --make
> become very good, simply because we make the right people (GHC devs) suffer
> through it daily :D. I expect from this the solution of the `-j`
> slowness, GHC overhead reduction, faster interface file loads and so on.
One thing I've wondered about is how we can make use of
compact regions to try to minimize GC costs in long-lived GHC sessions
(e.g. a --make invocation on a large project). This would likely help
parallel performance as well. N.B. #9221 is essentially the catch-all
ticket for -j scalability issues.