parallelizing ghc

Thu Jan 26 10:16:42 CET 2012

On 24/01/2012 03:53, Evan Laforge wrote:
> I recently switched from ghc --make to a parallelized build system.  I
> was looking forward to faster builds, and while they are much faster
> at figuring out what has to be rebuilt (which is most of the time for
> a small rebuild, since ld dominates), compilation of the whole system
> is either the same or slightly slower than the single-threaded ghc
> --make version.  My guess is that the overhead of starting up lots of
> individual ghcs, each of which has to read all the .hi files all over
> again, just about cancels out the parallelism gains.

I'm slightly surprised by this - in my experience parallel builds beat
--make as long as there is at least a factor of 2 of parallelism
available.  Is your dependency graph very narrow, or do you have lots of
very small modules?

> So I'm wondering, does this seem reasonable and feasible?  Is there a
> better way to do it?  Even if it could be done, would it be worth it?
> If the answers are "yes", "maybe not", and "maybe yes", then how hard
> would this be to do and where should I start looking?  I'm assuming
> start at GhcMake.hs and work outwards from there...

I like the idea!  And it should be possible to build this without 
modifying GHC at all, on top of the GHC API.  As you say, you'll need a 
server process, which accepts command lines, executes them, and sends 
back the results.  A local socket should be fine (and will work on both 
Unix and Windows).
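
A rough sketch of what the request loop might look like, using only
System.IO.  The wire format (one `show`n argument list per line, one
`show`n reply per line), the `Reply` type and the `compile` callback are
all assumptions made up for illustration; nothing here touches the GHC
API.  With the network package, a connected socket could be turned into a
Handle with `socketToHandle` and passed to `serveClient`.

```haskell
import System.IO
  (BufferMode (LineBuffering), Handle, hGetLine, hIsEOF, hPrint, hSetBuffering)

-- | Encode a command line as a single line of text.
-- (Assumed wire format: the argument list rendered with 'show'.)
encodeRequest :: [String] -> String
encodeRequest = show

-- | Parse a request line; 'Nothing' if it is not a list of strings.
decodeRequest :: String -> Maybe [String]
decodeRequest line = case reads line of
  [(args, rest)] | all (`elem` " \t\r") rest -> Just args
  _                                          -> Nothing

-- | Hypothetical reply: an exit code plus whatever the compile printed.
data Reply = Reply { replyCode :: Int, replyOutput :: String }
  deriving (Eq, Show, Read)

-- | Serve one client until it closes the connection.
-- 'compile' is a placeholder for whatever actually runs the command line.
serveClient :: ([String] -> IO Reply) -> Handle -> IO ()
serveClient compile h = do
  hSetBuffering h LineBuffering
  loop
  where
    loop = do
      eof <- hIsEOF h
      if eof
        then return ()
        else do
          line <- hGetLine h
          reply <- maybe (return (Reply 2 "malformed request"))
                         compile
                         (decodeRequest line)
          hPrint h reply   -- 'show' escapes newlines, so one reply per line
          loop
```

The client side would write `encodeRequest args` followed by a newline,
then `read` the line that comes back as a `Reply`.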

The server process can either do the compilation itself, or have several
workers.  Unfortunately the workers would have to be separate processes,
because the GHC API is single-threaded.

When a worker's memory use grows too large, just kill it and start a new
one.
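
The kill-and-restart rule could be modelled roughly as below.  This is
only the bookkeeping, as a pure function: the `Worker` record, the
megabyte limit, and the idea that a worker reports its heap size after
each job are assumptions for illustration.  In a real server the restart
branch is where the old process would be terminated and a new one spawned
(for example with `terminateProcess` and `createProcess` from
System.Process).

```haskell
-- | Hypothetical bookkeeping for one worker process.
data Worker = Worker
  { workerSlot :: Int  -- which slot in the pool this worker occupies
  , generation :: Int  -- how many times the slot has been restarted
  , heapMB     :: Int  -- heap size the worker last reported
  } deriving (Eq, Show)

-- | Record the heap size reported after a job.  If it exceeds the limit,
-- the slot is handed to a fresh worker (next generation, empty heap);
-- otherwise the same worker carries on with its updated size.
afterJob :: Int -> Int -> Worker -> Worker
afterJob limitMB reportedMB w
  | reportedMB > limitMB = w { generation = generation w + 1, heapMB = 0 }
  | otherwise            = w { heapMB = reportedMB }
```

For example, `afterJob 500 600 (Worker 1 0 100)` gives a generation-1
worker with an empty heap, while a report of 300 just updates the size.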

Cheers,
	Simon