parallelizing ghc

Evan Laforge qdunkan at gmail.com
Fri Feb 17 19:12:15 CET 2012


> Sure, except that if the server is to be used by multiple clients, you will
> get clashes in the PIT when, say, two clients both try to compile a module
> with the same name.
>
> The PIT is indexed by Module, which is basically the pair
> (package,modulename), and the package for the main program is always the
> same: "main".
>
> This will work fine if you spin up a new server for each program you want to
> build - maybe that's fine for your use case?
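To make the clash concrete, here is a toy model of several clients sharing one table keyed by (package, module name), where every program's own modules live in package "main". It is not GHC's actual data structure. The function find_clashes and the dict layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in: every program's own modules live in package "main",
# so the key for a home module is always ("main", module_name).
MAIN_PACKAGE = "main"


def find_clashes(builds):
    """Return the (package, module) keys claimed by more than one client.

    `builds` maps a client name to the module names it compiles.  All
    clients are assumed to share one table keyed by (package, module).
    """
    owners = defaultdict(set)
    for client, modules in builds.items():
        for module in modules:
            owners[(MAIN_PACKAGE, module)].add(client)
    return {key: sorted(clients) for key, clients in owners.items() if len(clients) > 1}


builds = {
    "client-a": ["Main", "Util", "Parse"],
    "client-b": ["Main", "Util", "Render"],
}
print(find_clashes(builds))
# {('main', 'Main'): ['client-a', 'client-b'], ('main', 'Util'): ['client-a', 'client-b']}
```

With one server per program, each table only ever sees a single client, so find_clashes on a one-entry builds map always returns an empty dict.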

Yep, that's fine: I start a new server for each build, plus one
compiler per CPU.  So compiling one program will start up (say) 4
compilers and one server.  Then shake will start throwing source files
at the server in the proper dependency order, and the server will
distribute them among the 4 compilers.  Each compiler is
single-threaded, so I don't have to worry about calling GHC functions
reentrantly.
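A sketch of that scheduling, assuming every file takes the same time so the compilers can be simulated in lock-step rounds. Python's graphlib.TopologicalSorter (3.9+) stands in for shake's dependency ordering here; the schedule function and the imports map are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule(imports, workers=4):
    """Simulate compiling modules on `workers` single-threaded compilers.

    `imports` maps each module to the modules it imports, which must be
    compiled first.  Returns a list of rounds; each round lists the
    (compiler index, module) pairs that run at the same time.
    """
    sorter = TopologicalSorter(imports)
    sorter.prepare()  # raises CycleError on an import cycle
    pending = []  # modules whose imports are done but have no free compiler yet
    rounds = []
    while sorter.is_active():
        pending.extend(sorted(sorter.get_ready()))
        # Each compiler takes at most one file per round.
        batch, pending = pending[:workers], pending[workers:]
        rounds.append(list(enumerate(batch)))
        sorter.done(*batch)
    return rounds


imports = {
    "Main": ["Parse", "Render", "Util"],
    "Parse": ["Util"],
    "Render": ["Util"],
    "Util": [],
}
for round_ in schedule(imports):
    print(round_)
# [(0, 'Util')]
# [(0, 'Parse'), (1, 'Render')]
# [(0, 'Main')]
```

In this small example only one compiler is busy in the first and last rounds, so the shape of the import graph bounds how much of the 4-way parallelism can actually be used.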

But --make is single-threaded as well, so why doesn't it just call
compileFile repeatedly instead of bothering with all that HPT stuff?  Is
it just for ghci?

>> The 'user' is low for the server because it doesn't count time spent
>> by the subprocesses on the other end of the socket, but excluding
>> linking it looks like I can shave about 25% off compile time.
>> Unfortunately it winds up being just about the same as ghc --make, so
>> the saving seems too small.
>
> But that's what you expect, isn't it?

It's surprising to me that the serial --make is just about the same
speed as a parallelized one.  The whole point was to compile faster!

Granted, each interface has to be loaded once per processor, while
--make only needs to load it once, but once loaded the interfaces should
stay loaded, and I'd expect the benefit from two processors to win out
pretty quickly.
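A rough back-of-the-envelope model of that expectation, under hypothetical assumptions: interface loading costs about the same L in every process and happens concurrently across the workers, the remaining compile work W splits evenly over N single-threaded compilers, and S is the extra dispatch and socket overhead of the server.

```latex
\begin{aligned}
T_{\text{make}} &\approx L + W \\
T_{\text{par}}  &\approx L + \frac{W}{N} + S \\
T_{\text{par}} < T_{\text{make}} &\iff S < W\left(1 - \frac{1}{N}\right)
\end{aligned}
```

Under this model, with N = 2 the parallel build should win whenever S is below half of W, so two builds that come out about equal suggest that overhead or an uneven split of the work is absorbing the gain.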

> --make has a slight advantage for linking in that it knows which packages it
> needs to link against, whereas plain ghc will link against all the packages
> on the command line.

Ohh, so maybe with --make it can omit some packages and do less work.
Let me try minimizing the -package flags and see if that helps.

As an aside, it would be handy to be able to ask ghc "given this main
module, which -package flags should the final program get?" without
actually compiling anything.  Is there a way to do that, short of
writing my own with the ghc api?  Would it be a reasonable ghc flag,
along the lines of -M but for packages?
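One way to picture the proposed query: walk the imports from Main and collect the packages that are reached. This assumes two hypothetical inputs, an import graph for the program's own modules and a map from each external module to the package that provides it. It is not a GHC flag or API.

```python
def packages_for(main, imports, provider):
    """Return, sorted, the packages that modules reachable from `main` import.

    `imports` maps each of the program's own modules to the modules it imports.
    `provider` maps an external module name to its package; anything not in
    `provider` is assumed to be one of the program's own modules.
    """
    needed = set()
    seen = set()
    stack = [main]
    while stack:
        module = stack.pop()
        if module in seen:
            continue
        seen.add(module)
        package = provider.get(module)
        if package is not None:
            needed.add(package)  # external module: its package gets a -package flag
        else:
            stack.extend(imports.get(module, []))  # own module: follow its imports
    return sorted(needed)


imports = {"Main": ["Util", "Data.Map", "System.IO"], "Util": ["Data.Text"]}
provider = {
    "Data.Map": "containers",
    "System.IO": "base",
    "Data.Text": "text",
    "Text.Parsec": "parsec",  # available, but nothing reachable imports it
}
print(packages_for("Main", imports, provider))  # ['base', 'containers', 'text']
```

The sketch only lists packages imported directly by the program's own modules; it ignores those packages' own dependencies.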


BTW, in case anyone is interested, a darcs repo is at
http://ofb.net/~elaforge/ghc-server/


