parallelizing ghc

Mon Feb 13 10:13:21 CET 2012

On 10/02/2012 08:01, Evan Laforge wrote:
>> I like the idea!  And it should be possible to build this without modifying
>> GHC at all, on top of the GHC API.  As you say, you'll need a server
>> process, which accepts command lines, executes them, and sends back the
>> results.  A local socket should be fine (and will work on both Unix and
>> Windows).
>
> I took a whack at this, but I'm having to backtrack a bit now because
> I don't fully understand the GHC API, so I thought I should explain my
> understanding to make sure I'm on the right track.
>
> It appears the cached information I want to preserve between compiles
> is in HscEnv.  At first I thought I could just do what --make does,
> but what it does is call 'GHC.load', which maintains the HscEnv (which
> mostly means loading already compiled modules into the
> HomePackageTable, since the other cache entries are apparently loaded
> on demand by DriverPipeline.compileFile).  But actually it does a lot
> of things, such as detecting that a module doesn't need recompilation
> and directly loading the interface in that case.  So I thought it
> would be quickest to just use it: add a new target to the set of
> targets and call load again.
>
> However, there are problems with that.  The first is it doesn't pay
> attention to DynFlags.outputFile, which makes sense because it's
> expecting to compile multiple files.  The bigger problem is that it
> apparently wants to reload the whole set each time, so it winds up
> being slower rather than faster.  I guess 'load' is really set up to
> figure out dependencies on its own and compile a set of modules, so
> I'm talking at the wrong level.
>
> So I think I need to rewrite the HPT-maintaining parts of GHC.load and
> write my own compileFile that *does* maintain the HPT.  And also
> figure out what other parts of the HscEnv should be updated, if any.
> Sound about right?

What you're trying to do is mimic the operation of 'ghc -c Foo.hs ..' 
but cache any loaded interface files and re-use them.  This means you 
need to retain the contents of HscEnv (as you say), because that 
contains the cached information.

However, the GHC API doesn't provide a way to do this directly (I hadn't 
really thought about this when I suggested the idea before, sorry).  The 
GHC API provides support for compiling multiple modules in the way that 
GHCi and --make work; each module is added to the HPT as it is compiled.
But when compiling single modules, GHC doesn't normally use the HPT -
interfaces for modules in the home package are normally demand-loaded in 
the same way as interfaces for package modules, and added to the PIT
(PackageIfaceTable).
The crucial difference between the HPT and the PIT is that the PIT 
supports demand-loading of interfaces, but the HPT is supposed to be 
populated in the right order by the compilation manager - home package 
modules are assumed to be present in the HPT when they are required.
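
To make the difference concrete, here is a toy model of the two lookup
disciplines.  It is only a sketch: ModName, Iface, Table, lookupDemand and
lookupPopulated are hypothetical names for illustration, not GHC API types
or functions, and the real tables hold ModIface values rather than strings.

```haskell
-- Toy model only; none of these names come from the GHC API.
type ModName = String
type Iface   = String              -- placeholder for a loaded interface
type Table   = [(ModName, Iface)]

-- PIT-style lookup: on a miss, load the interface and cache it.
lookupDemand :: (ModName -> Iface) -> ModName -> Table -> (Iface, Table)
lookupDemand load m table =
  case lookup m table of
    Just iface -> (iface, table)
    Nothing    -> let iface = load m in (iface, (m, iface) : table)

-- HPT-style lookup: the compilation manager is expected to have added the
-- module already, so a miss is reported as an error instead of loading.
lookupPopulated :: ModName -> Table -> Either String Iface
lookupPopulated m table =
  maybe (Left ("module not yet compiled: " ++ m)) Right (lookup m table)
```

A server that mimics 'ghc -c' wants the first behaviour for home-package
modules too, which is what the current API does not offer.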

For 'ghc -c Foo.hs' you want to demand-load interfaces for other modules 
in the same package (and cache them), but you don't want them to get mixed
up with interfaces from other packages that other clients may be compiling
simultaneously.  There's no easy way to solve this.
You could avoid the problem by not caching home-package interfaces, but 
that may throw away a lot of the benefit of doing this.  Or you could 
maintain some kind of session state with the client over multiple 
compilations, and only discard the home package interfaces if another 
client connects.
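
The second option could look roughly like the following.  This is a sketch
under stated assumptions: ClientId, Cache, connect and addHome are
hypothetical names, interfaces are modelled as strings, and how a server
would identify a client is left open.

```haskell
-- Hypothetical stand-ins; none of these are GHC API types.
type ClientId = String
type ModName  = String
type Iface    = String             -- placeholder for a loaded interface

data Cache = Cache
  { owner      :: Maybe ClientId      -- client the home-package cache belongs to
  , homeIfaces :: [(ModName, Iface)]  -- home-package interfaces, valid for 'owner' only
  , pkgIfaces  :: [(ModName, Iface)]  -- interfaces from other packages, shared by all
  } deriving (Eq, Show)

emptyCache :: Cache
emptyCache = Cache Nothing [] []

-- Called when a client connects: keep the home-package interfaces if it is
-- the same client as before, otherwise discard them.  Package interfaces
-- are kept either way.
connect :: ClientId -> Cache -> Cache
connect c cache
  | owner cache == Just c = cache
  | otherwise             = cache { owner = Just c, homeIfaces = [] }

-- Record a demand-loaded home-package interface for the current owner.
addHome :: ModName -> Iface -> Cache -> Cache
addHome m i cache =
  cache { homeIfaces = (m, i) : filter ((/= m) . fst) (homeIfaces cache) }
```

With this policy, two clients that alternate requests get no benefit from
the home-package cache, so it only pays off when one build dominates.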

There are further complications in that certain flags can invalidate the 
information you have cached: changing the package flags, for instance.
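
For the package-flag case, a server could compare the package-related
arguments of each request with those of the previous one and drop its
cache when they differ.  A minimal sketch, assuming the flag lists below
are enough for illustration (they are not exhaustive, and flags written
without a separating space are not handled); packageFlags and
cacheStillValid are hypothetical helpers:

```haskell
-- Package flags that take one argument (illustrative, not exhaustive).
argFlags :: [String]
argFlags = ["-package", "-package-id", "-hide-package", "-ignore-package", "-package-conf"]

-- Package flags that take no argument (illustrative, not exhaustive).
noArgFlags :: [String]
noArgFlags = ["-hide-all-packages", "-no-user-package-conf"]

-- Keep only the package-related flags (and their arguments), in order.
packageFlags :: [String] -> [String]
packageFlags (f : a : rest)
  | f `elem` argFlags = f : a : packageFlags rest
packageFlags (f : rest)
  | f `elem` noArgFlags = f : packageFlags rest
  | otherwise           = packageFlags rest
packageFlags [] = []

-- The cache is assumed reusable only if the package flags are identical.
-- Comparing in order is conservative: a reordering also invalidates.
cacheStillValid :: [String] -> [String] -> Bool
cacheStillValid prev cur = packageFlags prev == packageFlags cur
```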

So I think some additions to the API are almost certainly needed.  But 
this is as far as I have got in thinking about the problem...

Cheers,
	Simon

>
> Along the way I ran into the problem that it's impossible to re-parse
> GHC flags to compare them to previous runs, because static flags only
> export a parsing function that mutates global variables and can only
> be called once.  So I parse out the dynamic flags, strip out the *.hs
> args, and assume the rest are static flags.  I noticed comments about
> converting them all to dynamic, I guess that might make a nice
> housekeeping project some day.
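
The argument handling described above can be approximated without touching
GHC's flag parser at all.  The following is a simplified sketch, not the
code from the original attempt: it skips the dynamic-flag parsing step and
treats every argument that is not a *.hs file as a flag.  splitArgs and
sameFlags are hypothetical names.

```haskell
import Data.List (isSuffixOf, partition)

-- Separate the *.hs source arguments from everything else.
splitArgs :: [String] -> ([FilePath], [String])
splitArgs = partition (".hs" `isSuffixOf`)

-- Two command lines are treated as compatible if they differ only in
-- which source files they name.
sameFlags :: [String] -> [String] -> Bool
sameFlags prev cur = snd (splitArgs prev) == snd (splitArgs cur)
```

Per-file arguments such as an output path would make two otherwise
identical command lines compare as different, so a real server would need
to filter those out as well.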