[GHC] #13493: Recompilation avoidance and Backpack
GHC
ghc-devs at haskell.org
Tue Mar 28 16:49:05 UTC 2017
-------------------------------------+-------------------------------------
Reporter: ezyang | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.1
Keywords: recomp backpack | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
Today, recompilation avoidance is centered around two major mechanisms:
1. First, we keep track of entities we *use* (`tcg_dus`), which is done by
reading off all external names from the renamed source code of a Haskell
source file.
2. Second, we keep track of what we *import* (`tcg_imports`), which is
tracked when we rename imports.
These two pieces of information get assembled into a module-indexed series
of usages in `mk_mod_usage_info`. The general idea is that when an entity
is used, we must record the hash of the entity; when a module is imported,
we must record its export hash.
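As a rough illustration of the scheme just described, here is a toy model of how usages might be assembled and later checked. It is only a sketch: `Iface`, `Usage`, `mkUsages` and `needsRecompile` are hypothetical names invented for this example, hashes are plain `Int`s, and none of this mirrors the actual code in `mk_mod_usage_info`.

```haskell
import qualified Data.Map.Strict as Map

-- Toy stand-ins; GHC's real module, name and fingerprint types are richer.
type ModuleName = String
type EntityName = String
type Hash       = Int

-- What a module's interface exposes in this toy model.
data Iface = Iface
  { ifaceExportHash   :: Hash
  , ifaceEntityHashes :: Map.Map EntityName Hash
  }

-- What we recorded about one module the last time we compiled.
data Usage = Usage
  { usageExportHash :: Maybe Hash               -- Just h if directly imported
  , usageEntities   :: Map.Map EntityName Hash  -- hashes of the entities we used
  }

-- Assemble module-indexed usages from the imports and the used entities.
mkUsages
  :: Map.Map ModuleName Iface
  -> [ModuleName]                -- direct imports
  -> [(ModuleName, EntityName)]  -- entities occurring in the renamed source
  -> Map.Map ModuleName Usage
mkUsages ifaces imports uses =
  Map.fromListWith merge (importUsages ++ entityUsages)
  where
    importUsages =
      [ (m, Usage (Just (ifaceExportHash i)) Map.empty)
      | m <- imports, Just i <- [Map.lookup m ifaces] ]
    entityUsages =
      [ (m, Usage Nothing (Map.singleton e h))
      | (m, e) <- uses
      , Just i <- [Map.lookup m ifaces]
      , Just h <- [Map.lookup e (ifaceEntityHashes i)] ]
    merge (Usage x es) (Usage y fs) = Usage (firstJust x y) (Map.union es fs)
    firstJust (Just a) _ = Just a
    firstJust Nothing  b = b

-- Recompile if a recorded export hash or entity hash no longer matches.
needsRecompile :: Map.Map ModuleName Iface -> Map.Map ModuleName Usage -> Bool
needsRecompile ifaces = any stale . Map.toList
  where
    stale (m, Usage mexp ents) = case Map.lookup m ifaces of
      Nothing -> True
      Just i  ->
        maybe False (/= ifaceExportHash i) mexp
          || or [ Map.lookup e (ifaceEntityHashes i) /= Just h
                | (e, h) <- Map.toList ents ]
```

In this model, a change to an entity we never used leaves the check quiet, while a change to a used entity or to the export hash of an imported module triggers recompilation.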
There is an implicit assumption here, which is that a (direct) import is
the only way we depend on the exports of a module, and an occurrence of a
name in the renamed syntax is the only way we depend on an actual entity.
Backpack breaks these assumptions:
* When we perform signature merging, we depend on the exports and entities
of each of the signatures we merge in. Furthermore, it is important to
distinguish each of these by identity module (not semantic module, which
collapses the distinction.)
* When we instantiate a module, we depend on the exports and entities of
the implementing module.
When I initially implemented Backpack, I slowly added extra information to
fix recompilation problems as I noticed them. I thus accreted the
following recompilation avoidance mechanisms:
* When signature merging occurs, we record the module hash for each used
merge requirement in a special new field, `UsageMergedRequirement`, and
recompile if the module hash changed at all. We also add each merged
signature to ImportAvails (but not as an "import") to ensure we pick up
family instances.
* When we instantiate a module, we treat it as if we had a direct import
of it (not yet merged, in https://phabricator.haskell.org/D3381). Since
instantiations always reference non-local modules, we always record a
module hash in such cases.
This is quite a hodgepodge, and I have no confidence that it is correct.
For example, if an implementing module reexports an entity from another
module, and that original entity changes, I doubt we recompile at this
point. We "accidentally" handle the case when it's not a reexport because
we record the module hash for the entire instantiating module.
It seems that it would be better if we could recast this in terms of
imports and usages. Here is a try at the design:
* In both instantiation and merging, we must record the export hash of the
modules we instantiated/merged in. It is a little troublesome to think of
these as imports, however, because they're not (and if you try to
implement this, you find yourself making a fake ImportedModVal for an
import that doesn't exist); I think the correct thing here is to introduce
a new notion of dependency for things that don't correspond to source
level imports (another possibility is to add another constructor to
ImportedModVal but the effect of this on existing code would have to be
determined.)
* The usages when we instantiate a signature are the (instantiated) usages
of the original signature (in particular, this picks up the usages from
instance lookup), plus a usage for each entity that we match against
(because we must rematch if the type changes.)
* Usages for signature merging are a little trickier. We want a usage for
every entity that we end up merging in (so, we must record usages post
thinning), BUT we must make sure the usage points at the identity module
of the signature that originally provided it, not the semantic module
(which will invariably point to the current module under compilation.)
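To make the signature-merging part of this design more concrete, here is a minimal sketch of what such usages could look like. Everything in it is an assumption made for illustration: `DepReason`, `Dep`, `SigIface` and `mergeUsages` are hypothetical names, identity modules are plain strings, and hashes are `Int`s. The point is only that usages are computed after thinning and keyed by the identity module of each merged signature, with a dependency kind that is not a source-level import.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type IdentityModule = String  -- illustrative label for a signature's identity
type EntityName     = String
type Hash           = Int

-- Hypothetical notion of dependency that need not be a source-level import.
data DepReason = SourceImport | MergedSignature | Instantiation
  deriving (Eq, Show)

data Dep = Dep
  { depReason     :: DepReason
  , depExportHash :: Hash
  } deriving (Eq, Show)

-- Toy view of one signature being merged in.
data SigIface = SigIface
  { sigIdentity   :: IdentityModule
  , sigExportHash :: Hash
  , sigEntities   :: Map.Map EntityName Hash
  }

-- Record, per identity module, the export hash plus a usage for every
-- entity that survives thinning.
mergeUsages
  :: Set.Set EntityName  -- entities kept after thinning
  -> [SigIface]          -- signatures merged in
  -> Map.Map IdentityModule (Dep, Map.Map EntityName Hash)
mergeUsages kept sigs = Map.fromList
  [ ( sigIdentity s
    , ( Dep MergedSignature (sigExportHash s)
      , Map.filterWithKey (\e _ -> e `Set.member` kept) (sigEntities s) ) )
  | s <- sigs ]
```

Keying the map by identity module keeps two signatures for the same hole apart, which a key based on the semantic module would not do, since both would collapse to the module under compilation.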
One more thing: when we instantiate a module on-the-fly, we need to
account for how we instantiated it. To put it differently, the
recompilation information we compute when we instantiate on-the-fly should
be (morally) the same as what we would get if we actually compiled the
modules in question. This is a bit troublesome since we don't have
detailed information relating how a signature was instantiated and what we
used (the on-the-fly instantiation process shortcuts this). The simplest
thing is probably to just record the module hashes of each module that was
used to instantiate an imported module (recursively); we might even be able
to do this by just twiddling `mi_mod_hash` when we instantiate. The
alternative is to switch to recording InstalledModule/InstalledUnitId only
in hashes, and augmenting usage information to also carry along
instantiations.
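The "simplest thing" above, folding in the module hashes of everything used to instantiate a module, recursively, might look roughly like the following. This is a sketch under stated assumptions: `Mod`, `combine` and `instHash` are hypothetical, the hash mixer is a toy, and the real `mi_mod_hash` is a fingerprint rather than an `Int`.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

type Hash = Int

-- A module together with the modules that fill its holes (toy model).
data Mod = Mod
  { modName  :: String
  , modInsts :: [(String, Mod)]  -- hole name -> implementing module
  }

-- Toy hash mixer; a real implementation would combine fingerprints.
combine :: Hash -> Hash -> Hash
combine h x = (h * 31 + x) `mod` 1000000007

-- Hash of an instantiated module: its own module hash mixed with the
-- (recursively computed) hashes of every implementing module.
-- Returns Nothing if any module involved has no recorded hash.
instHash :: Map.Map String Hash -> Mod -> Maybe Hash
instHash base (Mod n insts) = do
  h0 <- Map.lookup n base
  hs <- mapM (instHash base . snd) insts
  pure (foldl' combine h0 hs)
```

With this shape, a change to a module several instantiation levels down changes the hash seen at the top, so an importer that recorded the top-level hash would notice it.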
Another problem is that we record usages for Module (instantiated things),
but hashes are actually on an InstalledModule basis.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13493>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler