[GHC] #14095: Improve parallelism in --make mode

GHC ghc-devs at haskell.org
Mon Aug 7 02:53:21 UTC 2017


#14095: Improve parallelism in --make mode
-------------------------------------+-------------------------------------
           Reporter:  duog           |             Owner:  (none)
               Type:  task           |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Compiler       |           Version:
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 Currently, GhcMake.parUpsweep compiles modules in parallel, with each
 module waiting for all of it's dependencies to complete before proceeding.
 Parallelism is constrained by the -j flag, which limits the number of
 modules that may be compiled simultaneously. When multiple modules could
 be compiled next, ties are broken by a race to claim a semaphore.

 This ticket is about improving this parallelism, both by making it more
 finely-grained, and by more intelligently scheduling the next module to
 compile. This allows, for example, for code generation of modules compiled
 early to be delayed, so that other modules can be typechecked and
 simplified, unblocking their dependencies. I have a patch implementing the
 design described below, and I hope to have it on phab this week. It will
 be very much still a work in progress, however, some initial benchmarking
 suggests some quite good results, so I would very much appreciate feedback
 and help at this stage.

 The unit of operation of parUpsweep is a MakeTask, defined with some
 companion types as:
 {{{
 data BuildModule = BuildModule ModuleName IsBoot
 data MakePipeline = MP_Parse | MP_Typecheck | MP_Simplify | MP_CodeGen
 data MakeTask = MakeTask BuildModule MakePipeline
 }}}

 As a module compiles, it starts parsing, then at the completion of
 hscParse', it moves to typechecking. Then, after desugaring it moves to
 simplifying, then after hscSimplify' moves to codegen.

 Note that a module can be parsed independently of it's dependencies. It
 can, in principle (I claim!), be typechecked once it's dependencies have
 been typechecked, and be simplified once it's dependencies have been
 simplified. It can be codegened independently of it's dependencies.

 In HscEnv, we store four IO actions in the following data type,
 {{{
 data HscYield = HscYield
   { yieldParsedSource :: HscEnv -> ModSummary -> HsParsedModule -> IO ()
   , yieldTypecheckedInterface :: HscEnv -> ModSummary -> ModIface -> Maybe
 ModDetails -> IO HomeModInfo
   , yieldSimplifiedInterface :: HscEnv -> ModSummary -> ModIface -> Maybe
 ModDetails -> IO HomeModInfo
   , awaitDependencyInterfaces :: HscEnv -> ModSummary -> IO ()
   }
 }}}

 The first three members are called when a MakePipeline stage is complete.
 The fourth must be called to ensure all a module's dependent modules are
 present in the home package table. This is necessary at least during some
 codepaths in recompilation avoidance and safe haskell checking. All these
 IO actions will block, and return only once the next MakePipeline stage
 for that module has been scheduled to run. Calls to these functions are
 sprinkled through HscMain in strategic locations.

 We also change HscEnv.hsc_HPT to be an IORef HomePackageTable. I do expect
 this to be controversial, and even potentially unacceptable, and there are
 potentially alternate implementations on which I will not elaborate here.

 Note that responsibility for adding an interface to the home package table
 is on the implementer of yieldTypecheckedInterface and
 yieldSimplifiedInterface.

 In Ghc.parUpsweep we pre-compute a graph with nodes of MakeTasks and edges
 dependencies on other MakeTasks. We maintain a priority queue of MakeTasks
 whose dependencies are satisfied, where the priority is loosely an
 estimate of the work unblocked by completing that task. Whenever a
 MakeTask completes (and parUpsweep is notified via a call to a yield...
 function described above), the next MakeTask is dequeued from the priority
 queue and allowed to continue.

 There is some complication in the transition from typechecking to
 simplifying, and in ensuring the correct information is present in the
 home package table at the right time. Perhaps it is better to treat this
 as a single pipeline stage, and indeed, perhaps parsing as a separate
 pipeline stage does not pull its weight. I believe the split between
 simplify and codegen is where the majority of the benefit lies.
 Nevertheless, as is, it is working, so I must at least offer the full idea
 for consideration.

 I do understand that this will be difficult to understand or comment on
 without the code, and I'll endeavour to get it on phab as soon as I can,
 along with some benchmarks to motivate this change.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14095>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list