How to better parallelize the GHC build

Thomas Miedema thomasmiedema at
Wed Apr 1 10:34:12 UTC 2015

Hi Karel,

Could you try adding `-j8` to `SRC_HC_OPTS` for the build flavour you're
using in `mk/`, and running `gmake -j8` instead of `gmake -j64`? A graph
like the one you attached will likely look even worse, but the walltime
of your build should hopefully improve.

The build system currently seems to rely entirely on `make` for
parallelism. It doesn't exploit ghc's own parallel `--make` at all,
unless you explicitly add `-jn` to `SRC_HC_OPTS` with n > 1 (which also
sets the number of capabilities for the runtime system, so adding
`+RTS -Nn` as well is not needed).
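As a concrete sketch (the layout of your `mk/build.mk` and its flavour
section may differ, so treat the exact placement as an assumption):

```make
# mk/build.mk (sketch): let ghc itself compile up to 8 modules in
# parallel during each --make invocation. -j8 also sets the RTS
# capability count, so no extra +RTS -N8 is needed.
SRC_HC_OPTS += -j8
```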

Case study: one of the first things the build system does is build
ghc-cabal and Cabal with the stage 0 compiler, through a single
invocation of `ghc --make`. All later make targets depend on that step
completing first. Because `ghc --make` is not instructed to build in
parallel, using `make -j1` or `make -j100000` makes no difference (for
that step). I think your graph shows there are many more such
bottlenecks.

You would have to find out empirically how best to divide your number of
threads (32) between `make` and `ghc --make`. From reading this comment
<> by Simon in #9221, I understand it's better not to call
`ghc --make -jn` with `n` higher than the number of physical cores of
your machine (8 in your case). Once you get some better parallelism,
other flags such as `-A` might also have an effect on walltime (see that
ticket).
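One crude way to explore the split (a hypothetical sketch: it only
prints the candidate command lines, and assumes `SRC_HC_OPTS` would be
set per run in `mk/build.mk`):

```shell
# Print the combinations of make-level (-j) and ghc-level
# (SRC_HC_OPTS -j) parallelism one could time. Swap `echo` for the
# real build (plus a clean step) to actually measure walltime.
for mj in 4 8 16; do
  for hj in 2 4 8; do
    echo "gmake -j$mj   # with SRC_HC_OPTS += -j$hj in mk/build.mk"
  done
done
```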


On Sat, Mar 7, 2015 at 11:49 AM, Karel Gardas <karel.gardas at> wrote:

> Folks,
> first of all, I remember someone recently mentioned an issue with
> decreased parallelism of the GHC build somewhere, but I can't find it
> now. Sorry for that; otherwise I would have used that thread if it was
> on this mailing list.
> Anyway, while working on the SPARC NCG I'm using a T2000, which
> provides an 8-core, 32-thread UltraSPARC T1 CPU. A property of this
> machine is that it's really slow on single-threaded work; to squeeze
> some performance out of it, one really needs to push 32 threads of
> work onto it. Now, it really hurts my nerves to see it building/running
> just one or two ghc processes at a time. To verify this, I've created a
> simple script that collects the number of ghc processes over time and
> plots it. The result is in the attached picture. The graph is the
> result of running:
> gmake -j64
> Anyway, the average number of running ghc processes is 4.4 and the
> median is 2. IMHO such a low number not only hurts build times on
> something like a CMT SPARC machine, but also on, say, a cluster of ARM
> machines using NFS, and on common engineering workstations, which
> these days provide (IMHO!) around 8-16 cores (and double that number
> of threads).
> My naive idea(s) for fixing this issue (I'm assuming no Haskell file
> has unused imports here, but perhaps this should also be investigated):
> 1) provide explicit dependencies which guide make to build in a more
> optimal way
> 2) hack GHC's `make depend` to compute the explicit dependencies from
> (1) automatically and optimally
> 3) someone already mentioned using Shake for building GHC. I don't
> know Shake, but perhaps this is the right direction?
> 4) hack GHC to compile the needed .hi file directly in memory if the
> .hi file is not (yet!) available (there's an issue of getting the
> compile options right here). Also, I don't know .hi file semantics
> yet, so bear with me on this.
> Is there anything else that could be done to fix this issue? Is
> someone already working on any of these (I mean the reasonable ones
> from the list)?
> Thanks!
> Karel
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at
