[commit: ghc] master: Compute the union of imp_finsts on the side (d2511e3)

git at git.haskell.org git at git.haskell.org
Tue Feb 6 17:58:48 UTC 2018


Repository : ssh://git@git.haskell.org/ghc

On branch  : master
Link       : http://ghc.haskell.org/trac/ghc/changeset/d2511e3b61563ed3fc2c9aec2c90a4156373a24c/ghc

>---------------------------------------------------------------

commit d2511e3b61563ed3fc2c9aec2c90a4156373a24c
Author: Bartosz Nitka <niteria at gmail.com>
Date:   Fri Feb 2 06:55:19 2018 -0800

    Compute the union of imp_finsts on the side
    
    I've explained most of the rationale in a new Note.
    I'd happily add a test for this, but the difference is only
    visible in run time, allocations remain more or less the same.
    
    FWIW running `generateModules` from #14693 with DEPTH=16, WIDTH=30
    finishes in `23s` before, and `11s` after.
    
    Test Plan: ./validate
    
    Reviewers: simonpj, simonmar, bgamari
    
    Reviewed By: simonpj
    
    Subscribers: rwbarton, thomie, carter
    
    GHC Trac Issues: #14693
    
    Differential Revision: https://phabricator.haskell.org/D4369


>---------------------------------------------------------------

d2511e3b61563ed3fc2c9aec2c90a4156373a24c
 compiler/rename/RnNames.hs | 67 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 61 insertions(+), 6 deletions(-)

diff --git a/compiler/rename/RnNames.hs b/compiler/rename/RnNames.hs
index ae3f75b..769b34e 100644
--- a/compiler/rename/RnNames.hs
+++ b/compiler/rename/RnNames.hs
@@ -177,16 +177,71 @@ rnImports imports = do
     return (decls, rdr_env, imp_avails, hpc_usage)
 
   where
+    -- See Note [Combining ImportAvails]
     combine :: [(LImportDecl GhcRn,  GlobalRdrEnv, ImportAvails, AnyHpcUsage)]
             -> ([LImportDecl GhcRn], GlobalRdrEnv, ImportAvails, AnyHpcUsage)
-    combine = foldr plus ([], emptyGlobalRdrEnv, emptyImportAvails, False)
-
-    plus (decl,  gbl_env1, imp_avails1,hpc_usage1)
-         (decls, gbl_env2, imp_avails2,hpc_usage2)
+    combine ss =
+      let (decls, rdr_env, imp_avails, hpc_usage, finsts) = foldr
+            plus
+            ([], emptyGlobalRdrEnv, emptyImportAvails, False, emptyModuleSet)
+            ss
+      in (decls, rdr_env, imp_avails { imp_finsts = moduleSetElts finsts },
+            hpc_usage)
+
+    plus (decl,  gbl_env1, imp_avails1, hpc_usage1)
+         (decls, gbl_env2, imp_avails2, hpc_usage2, finsts_set)
       = ( decl:decls,
           gbl_env1 `plusGlobalRdrEnv` gbl_env2,
-          imp_avails1 `plusImportAvails` imp_avails2,
-          hpc_usage1 || hpc_usage2 )
+          imp_avails1' `plusImportAvails` imp_avails2,
+          hpc_usage1 || hpc_usage2,
+          extendModuleSetList finsts_set new_finsts )
+      where
+      imp_avails1' = imp_avails1 { imp_finsts = [] }
+      new_finsts = imp_finsts imp_avails1
+
+{-
+Note [Combine ImportAvails]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+imp_finsts in ImportAvails is a list of family instance modules
+transitively depended on by an import. imp_finsts for a currently
+compiled module is a union of all the imp_finsts of imports.
+Computing the union of two lists of size N is O(N^2) and if we
+do it to M imports we end up with O(M*N^2). That can get very
+expensive for bigger module hierarchies.
+
+Union can be optimized to O(N log N) if we use a Set.
+imp_finsts is converted back and forth between dep_finsts, so
+changing a type of imp_finsts means either paying for the conversions
+or changing the type of dep_finsts as well.
+
+I've measured that the conversions would cost 20% of allocations on my
+test case, so that can be ruled out.
+
+Changing the type of dep_finsts forces checkFamInsts to
+get the module lists in non-deterministic order. If we wanted to restore
+the deterministic order, we'd have to sort there, which is an additional
+cost. As far as I can tell, using a non-deterministic order is fine there,
+but that's a brittle nonlocal property which I'd like to avoid.
+
+Additionally, dep_finsts is read from an interface file, so its "natural"
+type is a list. Which makes it a natural type for imp_finsts.
+
+Since rnImports.combine is really the only place that would benefit from
+it being a Set, it makes sense to optimize the hot loop in rnImports.combine
+without changing the representation.
+
+So here's what we do: instead of naively merging ImportAvails with
+plusImportAvails in a loop, we make plusImportAvails merge empty imp_finsts
+and compute the union on the side using Sets. When we're done, we can
+convert it back to a list. One nice side effect of this approach is that
+if there's a lot of overlap in the imp_finsts of imports, the
+Set doesn't really need to grow and we don't need to allocate.
+
+Running generateModules from Trac #14693 with DEPTH=16, WIDTH=30 finishes in
+23s before, and 11s after.
+-}
+
+
 
 -- | Given a located import declaration @decl@ from @this_mod@,
 -- calculate the following pieces of information:



More information about the ghc-commits mailing list