parallelizing ghc

Fri Feb 17 02:59:31 CET 2012

> However, the GHC API doesn't provide a way to do this directly (I hadn't
> really thought about this when I suggested the idea before, sorry).  The GHC
> API provides support for compiling multiple modules in the way that GHCi and
> --make work; each module is added to the HPT as it is compiled.  But when
> compiling single modules, GHC doesn't normally use the HPT - interfaces for
> modules in the home package are normally demand-loaded in the same way as
> interfaces for package modules, and added to the PIT. The crucial difference
> between the HPT and the PIT is that the PIT supports demand-loading of
> interfaces, but the HPT is supposed to be populated in the right order by
> the compilation manager - home package modules are assumed to be present in
> the HPT when they are required.

Yah, that's what I don't understand about HscEnv.  The HPT doc says
that in one-shot mode, the HPT is empty and even local modules are
demand-cached in the ExternalPackageState (which the PIT belongs to).
And the EPS doc itself reinforces that where it says in one-shot mode
"home-package modules accumulate in the external package state".

So why not just ignore the HPT, and run multiple "one-shot" compiles,
and let all the info accumulate in the PIT?

A fair amount of work in GhcMake is concerned with trimming old data
out of the HPT; I assume this is for ghci, which wants to reload
changed modules but keep unchanged ones.  I don't actually care about
that, since I can assume the modules will be unchanged over one run.

So I tried just calling compileFile multiple times in the same
GhcMonad, assuming the mutable bits of the HscEnv get updated
appropriately.  Here are the results for a build of about 200 modules:

with persistent server:
no link:
3.30s user 1.60s system 12% cpu 38.323 total
3.50s user 1.66s system 13% cpu 38.368 total
link:
21.66s user 4.13s system 35% cpu 1:11.62 total
21.59s user 4.54s system 38% cpu 1:08.13 total
21.82s user 4.70s system 35% cpu 1:14.56 total

without server (ghc -c):
no link:
109.25s user 19.90s system 240% cpu 53.750 total
109.11s user 19.23s system 243% cpu 52.794 total
link:
128.10s user 21.66s system 201% cpu 1:14.29 total

ghc --make (with linking since I can't turn that off):
42.57s user 5.83s system 74% cpu 1:05.15 total

The 'user' is low for the server because it doesn't count time spent
by the subprocesses on the other end of the socket, but excluding
linking it looks like I can shave about 25% off compile time.
Unfortunately, once linking is included it winds up being just about
the same as ghc --make, so the speedup seems too low.  Perhaps I
should be using the HPT?  I'm also falling back to plain ghc for
linking; maybe --make can link faster when it has everything cached?
I guess it shouldn't, because it presumably just dispatches to ld.
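
As a rough check on the "about 25%" estimate, the wall-clock totals
from the two no-link runs above can be averaged and compared.  This is
only a sketch of the arithmetic; the helper names (mean, pctSaved) are
made up for illustration and are not part of GHC or the build tool.

```haskell
-- Wall-clock "total" times in seconds, copied from the no-link runs above.
serverNoLink, oneShotNoLink :: [Double]
serverNoLink  = [38.323, 38.368]   -- persistent server
oneShotNoLink = [53.750, 52.794]   -- ghc -c, one process per module

-- Arithmetic mean of a non-empty list of timings.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Percentage of wall-clock time saved going from baseline to candidate.
pctSaved :: Double -> Double -> Double
pctSaved baseline candidate = 100 * (baseline - candidate) / baseline

main :: IO ()
main = do
  let saved = pctSaved (mean oneShotNoLink) (mean serverNoLink)
  putStrLn ("no-link wall-clock saving: " ++ show (round saved :: Int) ++ "%")
```

On these numbers the saving comes out at roughly 28%, in the same
ballpark as the estimate above.  With only two runs per configuration
the figure is indicative at best.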