[GHC] #10680: Make Backpack order-independent (again)
GHC
ghc-devs at haskell.org
Fri Jul 31 04:21:08 UTC 2015
#10680: Make Backpack order-independent (again)
-------------------------------------+-------------------------------------
Reporter: ezyang | Owner: ezyang
Type: task | Status: new
Priority: normal | Milestone:
Component: Package system | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
Description changed by ezyang:
Old description:
> When we moved to the new `bkp` file format, we also went back to the a
> format which is order-dependent: that is to say, the order in which you
> put the declarations matters. So if you write:
>
> {{{
> unit p where
> module A where
> import B
> module B where
> ...
> }}}
>
> this fails to type-check, GHC complaining that `B` is not in scope. I
> did this, in part because it's what the Backpack paper described, and
> because it was "simpler" to implement.
>
> I think we should move back to an order-independent scheme, for the
> following reasons:
>
> 1. Haskell users are used to not needing pay particularly close attention
> to the ordering of their modules, and forcing people to linearize their
> module descriptions would be spectacularly disruptive with large amounts
> of modules. So un-ordered modules are "more natural for a traditional
> Haskell user.
>
> 2. Order-independence imposes some constraints on how expressive programs
> are (with order-dependent Backpack, you can do some pretty tricky things
> by ordering things certain ways); this could simplify some aspects of
> compiler implementation and make Backpack easier to explain.
>
> 3. A particular case of (2): it seems a lot simpler UX-wise to let a user
> assume that if you import a module `M` in a unit, it doesn't matter where
> you import it: you always get the same set of identifiers brought into
> scope. Thus, the incremental results of signatures should not be visible,
> c.f. #10679
>
> The main idea is that only the surface-syntax is un-ordered: the internal
> representation of units is a DAG which we work out in an elaboration
> phase, not altogether unsimilar from what `GhcMake` computes. An
> important auxiliary idea is that `import A` where `A` is backed by some
> signatures depends on EVERY signature in scope.
>
> Here are the details:
>
> **The intermediate representation.** We translate into an intermediate
> representation which consists of a directed graph between modules,
> signatures and includes. Edges in the graph indicate a "depends on"
> relation:
>
> 1. `include p` depends on `include q` if, for some module name `H`, `p`
> requires `H` and `q` provides `H`.
> 2. A module/signature `M` depends on `include p` if `M` imports a module
> provided or required by `p`.
> 3. A module/signature `M` depends on a module/signature `S` if `M`
> imports `S`.
> 4. An `include p` depends on a module `M` if `p` requires a module named
> `M`.
>
> We impose one restriction: a signature cannot depend on a home module.
> (But see below for how to eliminate this restriction.)
>
> Rule (2) is worth remarking upon: if a module imports a signature, it
> depends-on every `include` which requires that signature, as well as the
> relevant home signature. This could easily result in a cycle; see
> refinement 2 for how to break these cycles. The consequence of this,
> however, is that we can factor the graph to introduce the node for the
> "merge of signatures", which depends on each signature and include which
> requires it; we can use this opportunity to build and write out the
> merged interface file for the unit which is desirable from an efficiency
> perspective.
>
> **Elaboration.** Take a Backpack file, construct this graph, and topsort
> it into a DAG of SCCs. SCCs with a single node are compileable as before.
> SCCs with multiple nodes will have to be managed with some mutual
> recursion mechanism; see refinements for more thoughts on this.
>
> **Refinements:**
>
> 1. **Can a signature depend on a (home) module?** Imports of this kind
> require a retypecheck loop. Consider this situation:
> {{{
> unit p where
> signature H where
> data T
> module M where
> import H
> data S = S T
> unit q where
> include p
> module Q where
> import M
> signature H where
> import Q
> data T = T S
> }}}
> Here, signature H in q depends on Q. When we typecheck `Q`, we bring
> `M.S` into the type environment with a `TyThing` that describes the
> constructor as accepting an abstract type `T`. However, when we
> subsequently typecheck the local signature `H`, we must refine all
> `TyThing`s of `T` with the true description (e.g. constructor
> information). So you'll need to retypecheck `Q` (and `M`) in order to
> make sure the `TyThing` is correct.
>
> 2. **Can an include depend on a (home) module?** If the module has no
> (transitive) dependency on signatures, this is fine. However, it's easy
> to have a circular dependency. Consider:
> {{{
> unit p where
> signature A -- imports nothing
> signature B -- imports nothing
> module M
> unit q where
> include p
> module B where
> import A
> ...
> }}}
> `B` depends on `p` for `p/A.hsig`; however, `p` depends on `B` because
> this module is filling a requirement. However, if we were to include the
> internal graph of `p` into `q`, the resulting graph would not have an
> cycles; so this is one possibility of how to untangle this situation.
> However, if there's still a cycle (e.g. `A` imports `B`), then you will
> need at least a retypecheck loop, and maybe `hs-boot` style compilation.
> We're not going to implement this for now.
>
> 3. **Can we deal with include-include dependency cycles?** Yes! Just use
> the Backpack paper's strategy for creating a recursive unit key and
> compile the two packages `hs-boot` style. But I'm not planning on
> implementing this yet.
>
> 4. **Can we deal with signature-signature dependency cycles?** Ordered
> Backpack would have supported this:
> {{{
> unit a-sig where
> signature A where
> data T
> unit ab-sig where
> include a-sig
> signature B where
> import A
> data S = S T
> signature A where
> import B
> data T = T S
> }}}
> In our model, `ab-sig` has a cycle. However, I believe any such cycle
> can be broken by creating sufficiently many units:
> {{{
> unit a-sig where
> signature B where
> data T
> signature A where
> data S = S T
> unit b-sig where
> signature A where
> data S
> signature B where
> data T = T S
> unit ab-sig where
> include a-sig
> include b-sig
> }}}
> In principle, GHC could automatically break import cycles by replacing
> an import with an import of a reduced signature that simply has abstract
> type definitions. See #10681. (I'm not sure this is possible for all
> language features.) This technique would also work for normal modules,
> assuming that every function is explicitly annotated with a type.
New description:
When we moved to the new `bkp` file format, we also went back to the a
format which is order-dependent: that is to say, the order in which you
put the declarations matters. So if you write:
{{{
unit p where
module A where
import B
module B where
...
}}}
this fails to type-check, GHC complaining that `B` is not in scope. I did
this, in part because it's what the Backpack paper described, and because
it was "simpler" to implement.
I think we should move back to an order-independent scheme, for the
following reasons:
1. Haskell users are used to not needing pay particularly close attention
to the ordering of their modules, and forcing people to linearize their
module descriptions would be spectacularly disruptive with large amounts
of modules. So un-ordered modules are "more natural for a traditional
Haskell user.
2. Order-independence imposes some constraints on how expressive programs
are (with order-dependent Backpack, you can do some pretty tricky things
by ordering things certain ways); this could simplify some aspects of
compiler implementation and make Backpack easier to explain.
3. A particular case of (2): it seems a lot simpler UX-wise to let a user
assume that if you import a module `M` in a unit, it doesn't matter where
you import it: you always get the same set of identifiers brought into
scope. Thus, the incremental results of signatures should not be visible,
c.f. #10679
The main idea is that only the surface-syntax is un-ordered: the internal
representation of units is a DAG which we work out in an elaboration
phase, not altogether unsimilar from what `GhcMake` computes. An
important auxiliary idea is that `import A` where `A` is backed by some
signatures depends on EVERY signature in scope.
Here are the details:
**The intermediate representation.** We translate into an intermediate
representation which consists of a directed graph of:
• Each source-level module, signature and include, and
• Each unfilled requirement (called a “signature merge” node).
The edges of the directed graph signify a “depends on” relation, and are
defined as follows:
• An include p depends on include q if, for some module name m, p requires
m and q provides m.
• An include p depends on a module m if p requires a module named m.
• A module/signature m depends on include p if m imports a module provided
by p.
• A module/signature m depends on a module n if m imports n.
• A module/signature m depends on a signature merge n if m imports n.
• A module/signature m depends on a signature n if m {-# SOURCE #-}
imports n.
• A signature merge m depends on a local signature m (if it exists).
• A signature merge m depends on a include p, if the (renamed) include
requires m.
**Elaboration.** Take a Backpack file, construct this graph, and topsort
it into a DAG of SCCs. SCCs with a single node are compileable as before.
SCCs with multiple nodes will have to be managed with some mutual
recursion mechanism; see refinements for more thoughts on this.
**Refinements:**
1. **Can a signature depend on a (home) module?** Imports of this kind
require a retypecheck loop. Consider this situation:
{{{
unit p where
signature H where
data T
module M where
import H
data S = S T
unit q where
include p
module Q where
import M
signature H where
import Q
data T = T S
}}}
Here, signature H in q depends on Q. When we typecheck `Q`, we bring
`M.S` into the type environment with a `TyThing` that describes the
constructor as accepting an abstract type `T`. However, when we
subsequently typecheck the local signature `H`, we must refine all
`TyThing`s of `T` with the true description (e.g. constructor
information). So you'll need to retypecheck `Q` (and `M`) in order to
make sure the `TyThing` is correct.
2. **Can an include depend on a (home) module?** If the module has no
(transitive) dependency on signatures, this is fine. However, it's easy to
have a circular dependency. Consider:
{{{
unit p where
signature A -- imports nothing
signature B -- imports nothing
module M
unit q where
include p
module B where
import A
...
}}}
`B` depends on `p` for `p/A.hsig`; however, `p` depends on `B` because
this module is filling a requirement. However, if we were to include the
internal graph of `p` into `q`, the resulting graph would not have an
cycles; so this is one possibility of how to untangle this situation.
However, if there's still a cycle (e.g. `A` imports `B`), then you will
need at least a retypecheck loop, and maybe `hs-boot` style compilation.
We're not going to implement this for now.
3. **Can we deal with include-include dependency cycles?** Yes! Just use
the Backpack paper's strategy for creating a recursive unit key and
compile the two packages `hs-boot` style. But I'm not planning on
implementing this yet.
4. **Can we deal with signature-signature dependency cycles?** Ordered
Backpack would have supported this:
{{{
unit a-sig where
signature A where
data T
unit ab-sig where
include a-sig
signature B where
import A
data S = S T
signature A where
import B
data T = T S
}}}
In our model, `ab-sig` has a cycle. However, I believe any such cycle
can be broken by creating sufficiently many units:
{{{
unit a-sig where
signature B where
data T
signature A where
data S = S T
unit b-sig where
signature A where
data S
signature B where
data T = T S
unit ab-sig where
include a-sig
include b-sig
}}}
In principle, GHC could automatically break import cycles by replacing
an import with an import of a reduced signature that simply has abstract
type definitions. See #10681. (I'm not sure this is possible for all
language features.) This technique would also work for normal modules,
assuming that every function is explicitly annotated with a type.
--
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10680#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list