[Git][ghc/ghc][wip/torsten.schmits/parallel-depanal-downsweep] Parallelize getRootSummary computations in dep analysis downsweep

Torsten Schmits (@torsten.schmits) gitlab at gitlab.haskell.org
Wed May 15 17:50:45 UTC 2024



Torsten Schmits pushed to branch wip/torsten.schmits/parallel-depanal-downsweep at Glasgow Haskell Compiler / GHC


Commits:
f531981a by Torsten Schmits at 2024-05-15T19:41:48+02:00
Parallelize getRootSummary computations in dep analysis downsweep

This reuses the upsweep step's infrastructure to process batches of
modules in parallel.

I benchmarked this by running `ghc -M` on two sets of 10,000 modules;
one with a linear dependency chain and the other with a binary tree.
Comparing different values for the number of modules per thread
suggested an optimum at `length targets `div` (n_cap * 2)`, with results
similar to this one (6 cores, 12 threads):

```
Benchmark 1: linear 1 jobs
  Time (mean ± σ):      1.775 s ±  0.026 s    [User: 1.377 s, System: 0.399 s]
  Range (min … max):    1.757 s …  1.793 s    2 runs

Benchmark 2: linear 6 jobs
  Time (mean ± σ):     876.2 ms ±  20.9 ms    [User: 1833.2 ms, System: 518.6 ms]
  Range (min … max):   856.2 ms … 898.0 ms    3 runs

Benchmark 3: linear 12 jobs
  Time (mean ± σ):     793.5 ms ±  23.2 ms    [User: 2318.9 ms, System: 718.6 ms]
  Range (min … max):   771.9 ms … 818.0 ms    3 runs
```

Results don't differ much when the batch size is reduced to a quarter
of that, but there's significant thread scheduling overhead for a size
of 1:

```
Benchmark 1: linear 1 jobs
  Time (mean ± σ):      2.611 s ±  0.029 s    [User: 2.851 s, System: 0.783 s]
  Range (min … max):    2.591 s …  2.632 s    2 runs

Benchmark 2: linear 6 jobs
  Time (mean ± σ):      1.189 s ±  0.007 s    [User: 2.707 s, System: 1.103 s]
  Range (min … max):    1.184 s …  1.194 s    2 runs

Benchmark 3: linear 12 jobs
  Time (mean ± σ):      1.097 s ±  0.006 s    [User: 2.938 s, System: 1.300 s]
  Range (min … max):    1.093 s …  1.101 s    2 runs
```

Larger batches also slightly worsen performance.

- - - - -


1 changed file:

- compiler/GHC/Driver/Make.hs


Changes:

=====================================
compiler/GHC/Driver/Make.hs
=====================================
@@ -5,6 +5,7 @@
 {-# LANGUAGE ApplicativeDo #-}
 {-# LANGUAGE MultiWayIf #-}
 {-# LANGUAGE RecordWildCards #-}
+{-# LANGUAGE LambdaCase #-}
 
 -- -----------------------------------------------------------------------------
 --
@@ -119,7 +120,7 @@ import qualified Control.Monad.Catch as MC
 import Data.IORef
 import Data.Maybe
 import Data.Time
-import Data.List (sortOn)
+import Data.List (sortOn, unfoldr)
 import Data.Bifunctor (first)
 import System.Directory
 import System.FilePath
@@ -239,7 +240,9 @@ depanalPartial excluded_mods allow_dup_roots = do
     -- cached finder data.
     liftIO $ flushFinderCaches (hsc_FC hsc_env) (hsc_unit_env hsc_env)
 
+    worker_limit <- liftIO $ mkWorkerLimit (hsc_dflags hsc_env)
     (errs, graph_nodes) <- liftIO $ downsweep
+      worker_limit
       hsc_env (mgModSummaries old_graph)
       excluded_mods allow_dup_roots
     let
@@ -1548,7 +1551,8 @@ type DownsweepCache = M.Map (UnitId, PkgQual, ModuleNameWithIsBoot) [Either Driv
 -- The returned list of [ModSummary] nodes has one node for each home-package
 -- module, plus one for any hs-boot files.  The imports of these nodes
 -- are all there, including the imports of non-home-package modules.
-downsweep :: HscEnv
+downsweep :: WorkerLimit
+          -> HscEnv
           -> [ModSummary]
           -- ^ Old summaries
           -> [ModuleName]       -- Ignore dependencies on these; treat
@@ -1560,9 +1564,9 @@ downsweep :: HscEnv
                 -- The non-error elements of the returned list all have distinct
                 -- (Modules, IsBoot) identifiers, unless the Bool is true in
                 -- which case there can be repeats
-downsweep hsc_env old_summaries excl_mods allow_dup_roots
+downsweep n_jobs hsc_env old_summaries excl_mods allow_dup_roots
    = do
-       (root_errs, rootSummariesOk) <- partitionWithM getRootSummary roots -- #17549
+       (root_errs, rootSummariesOk) <- rootSummariesPar n_jobs hsc_env getRootSummary roots
        let root_map = mkRootMap rootSummariesOk
        checkDuplicates root_map
        (deps, map0) <- loopSummaries rootSummariesOk (M.empty, root_map)
@@ -1734,6 +1738,41 @@ downsweep hsc_env old_summaries excl_mods allow_dup_roots
             GWIB { gwib_mod = L loc mod, gwib_isBoot = is_boot } = gwib
             wanted_mod = L loc mod
 
+-- | Execute 'getRootSummary' for the 'Target's using the parallelism pipeline
+-- system.
+-- Create 'MakeAction's for each 'Target' that uses 'withAbstractSem' to wait
+-- for a free slot, limiting the number of concurrently computed summaries to
+-- the value of the @-j@ option or the slots allocated by the job server, if
+-- that is used.
+--
+-- The 'MakeAction' returns 'Maybe', which is not handled as an error, because
+-- 'runLoop' only sets it to 'Nothing' when an exception was thrown, so the
+-- result won't be read anyway here.
+rootSummariesPar ::
+  WorkerLimit ->
+  HscEnv ->
+  (Target -> IO (Either (UnitId, DriverMessages) ModSummary)) ->
+  [Target] ->
+  IO ([(UnitId, DriverMessages)], [ModSummary])
+rootSummariesPar n_jobs hsc_env getRootSummary targets = do
+  n_cap <- getNumCapabilities
+  let bundle_size = max 1 (length targets `div` (n_cap * 2))
+  (actions, results) <- unzip <$> mapM mk_action (bundles bundle_size targets)
+  runPipelines n_jobs hsc_env mkUnknownDiagnostic (Just (mkBatchMsg hsc_env)) actions
+  partitionEithers . concat . catMaybes <$> sequence results
+  where
+    bundles sz = unfoldr $ \case
+      [] -> Nothing
+      ts -> Just (splitAt sz ts)
+
+    mk_action ts = do
+      res_var <- liftIO newEmptyMVar
+      let
+        action = do
+          MakeEnv {compile_sem} <- ask
+          lift $ lift $ withAbstractSem compile_sem (mapM getRootSummary ts)
+      pure (MakeAction action res_var, readMVar res_var)
+
 -- | This function checks then important property that if both p and q are home units
 -- then any dependency of p, which transitively depends on q is also a home unit.
 --



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/f531981a46d0ba4a5f54c662d8b0b196cfb60bfc

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/f531981a46d0ba4a5f54c662d8b0b196cfb60bfc
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240515/e3d34f6c/attachment-0001.html>


More information about the ghc-commits mailing list