[Git][ghc/ghc][wip/torsten.schmits/parallel-depanal-downsweep] Parallelize getRootSummary computations in dep analysis downsweep
Torsten Schmits (@torsten.schmits)
gitlab at gitlab.haskell.org
Wed May 15 17:50:45 UTC 2024
Torsten Schmits pushed to branch wip/torsten.schmits/parallel-depanal-downsweep at Glasgow Haskell Compiler / GHC
Commits:
f531981a by Torsten Schmits at 2024-05-15T19:41:48+02:00
Parallelize getRootSummary computations in dep analysis downsweep
This reuses the upsweep step's infrastructure to process batches of
modules in parallel.
I benchmarked this by running `ghc -M` on two sets of 10,000 modules;
one with a linear dependency chain and the other with a binary tree.
Comparing different values for the number of modules per thread
suggested an optimum at `length targets `div` (n_cap * 2)`, with results
similar to this one (6 cores, 12 threads):
```
Benchmark 1: linear 1 jobs
Time (mean ± σ): 1.775 s ± 0.026 s [User: 1.377 s, System: 0.399 s]
Range (min … max): 1.757 s … 1.793 s 2 runs
Benchmark 2: linear 6 jobs
Time (mean ± σ): 876.2 ms ± 20.9 ms [User: 1833.2 ms, System: 518.6 ms]
Range (min … max): 856.2 ms … 898.0 ms 3 runs
Benchmark 3: linear 12 jobs
Time (mean ± σ): 793.5 ms ± 23.2 ms [User: 2318.9 ms, System: 718.6 ms]
Range (min … max): 771.9 ms … 818.0 ms 3 runs
```
Results don't differ much when the batch size is reduced to a quarter
of that, but there's significant thread scheduling overhead for a size
of 1:
```
Benchmark 1: linear 1 jobs
Time (mean ± σ): 2.611 s ± 0.029 s [User: 2.851 s, System: 0.783 s]
Range (min … max): 2.591 s … 2.632 s 2 runs
Benchmark 2: linear 6 jobs
Time (mean ± σ): 1.189 s ± 0.007 s [User: 2.707 s, System: 1.103 s]
Range (min … max): 1.184 s … 1.194 s 2 runs
Benchmark 3: linear 12 jobs
Time (mean ± σ): 1.097 s ± 0.006 s [User: 2.938 s, System: 1.300 s]
Range (min … max): 1.093 s … 1.101 s 2 runs
```
Larger batches also slightly worsen performance.
- - - - -
1 changed file:
- compiler/GHC/Driver/Make.hs
Changes:
=====================================
compiler/GHC/Driver/Make.hs
=====================================
@@ -5,6 +5,7 @@
{-# LANGUAGE ApplicativeDo #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE RecordWildCards #-}
+{-# LANGUAGE LambdaCase #-}
-- -----------------------------------------------------------------------------
--
@@ -119,7 +120,7 @@ import qualified Control.Monad.Catch as MC
import Data.IORef
import Data.Maybe
import Data.Time
-import Data.List (sortOn)
+import Data.List (sortOn, unfoldr)
import Data.Bifunctor (first)
import System.Directory
import System.FilePath
@@ -239,7 +240,9 @@ depanalPartial excluded_mods allow_dup_roots = do
-- cached finder data.
liftIO $ flushFinderCaches (hsc_FC hsc_env) (hsc_unit_env hsc_env)
+ worker_limit <- liftIO $ mkWorkerLimit (hsc_dflags hsc_env)
(errs, graph_nodes) <- liftIO $ downsweep
+ worker_limit
hsc_env (mgModSummaries old_graph)
excluded_mods allow_dup_roots
let
@@ -1548,7 +1551,8 @@ type DownsweepCache = M.Map (UnitId, PkgQual, ModuleNameWithIsBoot) [Either Driv
-- The returned list of [ModSummary] nodes has one node for each home-package
-- module, plus one for any hs-boot files. The imports of these nodes
-- are all there, including the imports of non-home-package modules.
-downsweep :: HscEnv
+downsweep :: WorkerLimit
+ -> HscEnv
-> [ModSummary]
-- ^ Old summaries
-> [ModuleName] -- Ignore dependencies on these; treat
@@ -1560,9 +1564,9 @@ downsweep :: HscEnv
-- The non-error elements of the returned list all have distinct
-- (Modules, IsBoot) identifiers, unless the Bool is true in
-- which case there can be repeats
-downsweep hsc_env old_summaries excl_mods allow_dup_roots
+downsweep n_jobs hsc_env old_summaries excl_mods allow_dup_roots
= do
- (root_errs, rootSummariesOk) <- partitionWithM getRootSummary roots -- #17549
+ (root_errs, rootSummariesOk) <- rootSummariesPar n_jobs hsc_env getRootSummary roots
let root_map = mkRootMap rootSummariesOk
checkDuplicates root_map
(deps, map0) <- loopSummaries rootSummariesOk (M.empty, root_map)
@@ -1734,6 +1738,41 @@ downsweep hsc_env old_summaries excl_mods allow_dup_roots
GWIB { gwib_mod = L loc mod, gwib_isBoot = is_boot } = gwib
wanted_mod = L loc mod
+-- | Execute 'getRootSummary' for the 'Target's using the parallelism pipeline
+-- system.
+-- Create 'MakeAction's for each 'Target' that uses 'withAbstractSem' to wait
+-- for a free slot, limiting the number of concurrently computed summaries to
+-- the value of the @-j@ option or the slots allocated by the job server, if
+-- that is used.
+--
+-- The 'MakeAction' returns 'Maybe', which is not handled as an error, because
+-- 'runLoop' only sets it to 'Nothing' when an exception was thrown, so the
+-- result won't be read anyway here.
+rootSummariesPar ::
+ WorkerLimit ->
+ HscEnv ->
+ (Target -> IO (Either (UnitId, DriverMessages) ModSummary)) ->
+ [Target] ->
+ IO ([(UnitId, DriverMessages)], [ModSummary])
+rootSummariesPar n_jobs hsc_env getRootSummary targets = do
+ n_cap <- getNumCapabilities
+ let bundle_size = max 1 (length targets `div` (n_cap * 2))
+ (actions, results) <- unzip <$> mapM mk_action (bundles bundle_size targets)
+ runPipelines n_jobs hsc_env mkUnknownDiagnostic (Just (mkBatchMsg hsc_env)) actions
+ partitionEithers . concat . catMaybes <$> sequence results
+ where
+ bundles sz = unfoldr $ \case
+ [] -> Nothing
+ ts -> Just (splitAt sz ts)
+
+ mk_action ts = do
+ res_var <- liftIO newEmptyMVar
+ let
+ action = do
+ MakeEnv {compile_sem} <- ask
+ lift $ lift $ withAbstractSem compile_sem (mapM getRootSummary ts)
+ pure (MakeAction action res_var, readMVar res_var)
+
-- | This function checks then important property that if both p and q are home units
-- then any dependency of p, which transitively depends on q is also a home unit.
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/f531981a46d0ba4a5f54c662d8b0b196cfb60bfc
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/f531981a46d0ba4a5f54c662d8b0b196cfb60bfc
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240515/e3d34f6c/attachment-0001.html>
More information about the ghc-commits
mailing list