[GHC] #14095: Improve parallelism in --make mode
GHC
ghc-devs at haskell.org
Mon Aug 7 02:53:21 UTC 2017
#14095: Improve parallelism in --make mode
-------------------------------------+-------------------------------------
Reporter: duog | Owner: (none)
Type: task | Status: new
Priority: normal | Milestone:
Component: Compiler | Version:
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple |
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
Currently, GhcMake.parUpsweep compiles modules in parallel, with each
module waiting for all of it's dependencies to complete before proceeding.
Parallelism is constrained by the -j flag, which limits the number of
modules that may be compiled simultaneously. When multiple modules could
be compiled next, ties are broken by a race to claim a semaphore.
This ticket is about improving this parallelism, both by making it more
finely-grained, and by more intelligently scheduling the next module to
compile. This allows, for example, for code generation of modules compiled
early to be delayed, so that other modules can be typechecked and
simplified, unblocking their dependencies. I have a patch implementing the
design described below, and I hope to have it on phab this week. It will
be very much still a work in progress, however, some initial benchmarking
suggests some quite good results, so I would very much appreciate feedback
and help at this stage.
The unit of operation of parUpsweep is a MakeTask, defined with some
companion types as:
{{{
data BuildModule = BuildModule ModuleName IsBoot
data MakePipeline = MP_Parse | MP_Typecheck | MP_Simplify | MP_CodeGen
data MakeTask = MakeTask BuildModule MakePipeline
}}}
As a module compiles, it starts parsing, then at the completion of
hscParse', it moves to typechecking. Then, after desugaring it moves to
simplifying, then after hscSimplify' moves to codegen.
Note that a module can be parsed independently of it's dependencies. It
can, in principle (I claim!), be typechecked once it's dependencies have
been typechecked, and be simplified once it's dependencies have been
simplified. It can be codegened independently of it's dependencies.
In HscEnv, we store four IO actions in the following data type,
{{{
data HscYield = HscYield
{ yieldParsedSource :: HscEnv -> ModSummary -> HsParsedModule -> IO ()
, yieldTypecheckedInterface :: HscEnv -> ModSummary -> ModIface -> Maybe
ModDetails -> IO HomeModInfo
, yieldSimplifiedInterface :: HscEnv -> ModSummary -> ModIface -> Maybe
ModDetails -> IO HomeModInfo
, awaitDependencyInterfaces :: HscEnv -> ModSummary -> IO ()
}
}}}
The first three members are called when a MakePipeline stage is complete.
The fourth must be called to ensure all a module's dependent modules are
present in the home package table. This is necessary at least during some
codepaths in recompilation avoidance and safe haskell checking. All these
IO actions will block, and return only once the next MakePipeline stage
for that module has been scheduled to run. Calls to these functions are
sprinkled through HscMain in strategic locations.
We also change HscEnv.hsc_HPT to be an IORef HomePackageTable. I do expect
this to be controversial, and even potentially unacceptable, and there are
potentially alternate implementations on which I will not elaborate here.
Note that responsibility for adding an interface to the home package table
is on the implementer of yieldTypecheckedInterface and
yieldSimplifiedInterface.
In Ghc.parUpsweep we pre-compute a graph with nodes of MakeTasks and edges
dependencies on other MakeTasks. We maintain a priority queue of MakeTasks
whose dependencies are satisfied, where the priority is loosely an
estimate of the work unblocked by completing that task. Whenever a
MakeTask completes (and parUpsweep is notified via a call to a yield...
function described above), the next MakeTask is dequeued from the priority
queue and allowed to continue.
There is some complication in the transition from typechecking to
simplifying, and in ensuring the correct information is present in the
home package table at the right time. Perhaps it is better to treat this
as a single pipeline stage, and indeed, perhaps parsing as a separate
pipeline stage does not pull its weight. I believe the split between
simplify and codegen is where the majority of the benefit lies.
Nevertheless, as is, it is working, so I must at least offer the full idea
for consideration.
I do understand that this will be difficult to understand or comment on
without the code, and I'll endeavour to get it on phab as soon as I can,
along with some benchmarks to motivate this change.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14095>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list