[GHC] #10680: Make Backpack order-independent (again)

GHC ghc-devs at haskell.org
Mon Jul 27 18:14:04 UTC 2015


#10680: Make Backpack order-independent (again)
-------------------------------------+-------------------------------------
        Reporter:  ezyang            |                   Owner:  ezyang
            Type:  task              |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Package system    |                 Version:  7.11
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------
Description changed by ezyang:

Old description:

> When we moved to the new `bkp` file format, we also went back to the a
> format which is order-dependent: that is to say, the order in which you
> put the declarations matters.  So if you write:
>
> {{{
> unit p where
>   module A where
>     import B
>   module B where
>     ...
> }}}
>
> this fails to type-check, GHC complaining that `B` is not in scope.  I
> did this, in part because it's what the Backpack paper described, and
> because it was "simpler" to implement.
>
> I think we should move back to an order-independent scheme, for the
> following reasons:
>
> 1. Haskell users are used to not needing pay particularly close attention
> to the ordering of their modules, and forcing people to linearize their
> module descriptions would be spectacularly disruptive with large amounts
> of modules.  So un-ordered modules are "more natural.
>
> 2. Order-independence imposes some constraints on how expressive programs
> are (with order-dependent Backpack, you can do some pretty tricky things
> by ordering things certain ways); this could simplify some aspects of
> compiler implementation and make Backpack easier to explain.
>
> 3. A particular case of (2): it seems a lot simpler UX-wise to let a user
> assume that if you import a module `M` in a unit, it doesn't matter where
> you import it: you always get the same set of identifiers brought into
> scope. Thus, the incremental results of signatures should not be visible,
> c.f. #10679
>
> The main idea is that only the surface-syntax is un-ordered: the internal
> representation of units is a DAG which we work out in an elaboration
> phase, not altogether unsimilar from what `GhcMake` computes.
>
> Here are the details:
>
> **The intermediate representation.** We translate into an intermediate
> representation which consists of a directed graph between modules,
> signatures and includes.  Edges in the graph indicate a "depends on"
> relation:
>
> 1. `include p` depends on `include q` if, for some module name `H`, `p`
> requires `H` and `q` provides `H`.
> 2. A module/signature `M` depends on `include p` if `M` imports a module
> provided or required by `p`.
> 3. A module/signature `M` depends on a module/signature `S` if `M`
> imports `S`.
> 4. An `include p` depends on a module `M` if `p` requires a module named
> `M`. (This rule is included for completeness; we are going to disallow it
> shortly.)
>
> We impose one restriction: a signature cannot depend on a home module.
> See below for how to eliminate this restriction.
>
> Rule (2) is worth remarking upon: if a module imports a signature, it
> depends-on every `include` which requires that signature, as well as the
> relevant home signature.  This could easily result in a cycle; see (2)
> for how to break these cycles.  The consequence of this, however, is that
> we can factor the graph to introduce the node for the "merge of
> signatures", which depends on each signature and include which requires
> it; we can use this opportunity to build and write out the merged
> interface file for the unit.
>
> **Elaboration.** Take a Backpack file, construct this graph, and topsort
> it into a DAG of SCCs. SCCs with a single node are compileable as before.
> SCCs with multiple nodes will have to be managed with some mutual
> recursion mechanism; see refinements for more thoughts on this.
>
> **Refinements:**
>
> 1. **Can a signature depend on a (home) module?** Imports of this kind
> require a retypecheck loop.  Consider this situation:
> {{{
> unit p where
>   signature H where
>     data T
>   module M where
>     import H
>     data S = S T
> unit q where
>   include p
>   module Q where
>     import M
>   signature H where
>     import Q
>     data T = T S
> }}}
>    Here, signature H in q depends on Q.  When we typecheck `Q`, we bring
> `M.S` into the type environment with a `TyThing` that describes the
> constructor as accepting an abstract type `T`.  However, when we
> subsequently typecheck the local signature `H`, we must refine all
> `TyThing`s of `T` with the true description (e.g. constructor
> information).  So you'll need to retypecheck `Q` (and `M`) in order to
> make sure the `TyThing` is correct.
>
> 2. **Can an include depend on a (home) module?** If the module has no
> (transitive) dependency on signatures, this is fine. However, it's easy
> to have a circular dependency.  Consider:
> {{{
> unit p where
>   signature A
>   signature B
>   module M
> unit q where
>   include p
>   module B where
>     import A
>     ...
> }}}
>    `B` depends on `p` for `p/A.hsig`; however, `p` depends on `B` because
> this module is filling a requirement. Fortunately, we can untangle this
> knot without any mutual recursion nonsense (and the attendant efficiency
> loss): `A` is just an export list, we can compute it from the abstractly
> type-checked version of `p` without instantiating `B`.
>
> 3. **Can we deal with include-include dependency cycles?** Yes! Just use
> the Backpack paper's strategy for creating a recursive unit key and
> compile the two packages `hs-boot` style. But I'm not planning on
> implementing this yet.
>
> 4. **Can we deal with signature-signature dependency cycles?** Ordered
> Backpack would have supported this:
> {{{
> unit a-sig where
>   signature A where
>     data T
> unit ab-sig where
>   include a-sig
>   signature B where
>     import A
>     data S = S T
>   signature A where
>     import B
>     data T = T S
> }}}
>    In our model, `ab-sig` has a cycle.  However, I believe any such cycle
> can be broken by creating sufficiently many units:
> {{{
> unit a-sig where
>   signature B where
>     data T
>   signature A where
>     data S = S T
> unit b-sig where
>   signature A where
>     data S
>   signature B where
>     data T = T S
> unit ab-sig where
>   include a-sig
>   include b-sig
> }}}
>    In principle, GHC could automatically break import cycles by replacing
> an import with an import of a reduced signature that simply has abstract
> type definitions. (I'm not sure this is possible for all language
> features.) This technique would also work for normal modules, assuming
> that every function is explicitly annotated with a type.

New description:

 When we moved to the new `bkp` file format, we also went back to the a
 format which is order-dependent: that is to say, the order in which you
 put the declarations matters.  So if you write:

 {{{
 unit p where
   module A where
     import B
   module B where
     ...
 }}}

 this fails to type-check, GHC complaining that `B` is not in scope.  I did
 this, in part because it's what the Backpack paper described, and because
 it was "simpler" to implement.

 I think we should move back to an order-independent scheme, for the
 following reasons:

 1. Haskell users are used to not needing pay particularly close attention
 to the ordering of their modules, and forcing people to linearize their
 module descriptions would be spectacularly disruptive with large amounts
 of modules.  So un-ordered modules are "more natural for a traditional
 Haskell user.

 2. Order-independence imposes some constraints on how expressive programs
 are (with order-dependent Backpack, you can do some pretty tricky things
 by ordering things certain ways); this could simplify some aspects of
 compiler implementation and make Backpack easier to explain.

 3. A particular case of (2): it seems a lot simpler UX-wise to let a user
 assume that if you import a module `M` in a unit, it doesn't matter where
 you import it: you always get the same set of identifiers brought into
 scope. Thus, the incremental results of signatures should not be visible,
 c.f. #10679

 The main idea is that only the surface-syntax is un-ordered: the internal
 representation of units is a DAG which we work out in an elaboration
 phase, not altogether unsimilar from what `GhcMake` computes.  An
 important auxiliary idea is that `import A` where `A` is backed by some
 signatures depends on EVERY signature in scope.

 Here are the details:

 **The intermediate representation.** We translate into an intermediate
 representation which consists of a directed graph between modules,
 signatures and includes.  Edges in the graph indicate a "depends on"
 relation:

 1. `include p` depends on `include q` if, for some module name `H`, `p`
 requires `H` and `q` provides `H`.
 2. A module/signature `M` depends on `include p` if `M` imports a module
 provided or required by `p`.
 3. A module/signature `M` depends on a module/signature `S` if `M` imports
 `S`.
 4. An `include p` depends on a module `M` if `p` requires a module named
 `M`.

 We impose one restriction: a signature cannot depend on a home module.
 (But see below for how to eliminate this restriction.)

 Rule (2) is worth remarking upon: if a module imports a signature, it
 depends-on every `include` which requires that signature, as well as the
 relevant home signature.  This could easily result in a cycle; see
 refinement 2 for how to break these cycles.  The consequence of this,
 however, is that we can factor the graph to introduce the node for the
 "merge of signatures", which depends on each signature and include which
 requires it; we can use this opportunity to build and write out the merged
 interface file for the unit which is desirable from an efficiency
 perspective.

 **Elaboration.** Take a Backpack file, construct this graph, and topsort
 it into a DAG of SCCs. SCCs with a single node are compileable as before.
 SCCs with multiple nodes will have to be managed with some mutual
 recursion mechanism; see refinements for more thoughts on this.

 **Refinements:**

 1. **Can a signature depend on a (home) module?** Imports of this kind
 require a retypecheck loop.  Consider this situation:
 {{{
 unit p where
   signature H where
     data T
   module M where
     import H
     data S = S T
 unit q where
   include p
   module Q where
     import M
   signature H where
     import Q
     data T = T S
 }}}
    Here, signature H in q depends on Q.  When we typecheck `Q`, we bring
 `M.S` into the type environment with a `TyThing` that describes the
 constructor as accepting an abstract type `T`.  However, when we
 subsequently typecheck the local signature `H`, we must refine all
 `TyThing`s of `T` with the true description (e.g. constructor
 information).  So you'll need to retypecheck `Q` (and `M`) in order to
 make sure the `TyThing` is correct.

 2. **Can an include depend on a (home) module?** If the module has no
 (transitive) dependency on signatures, this is fine. However, it's easy to
 have a circular dependency.  Consider:
 {{{
 unit p where
   signature A -- imports nothing
   signature B -- imports nothing
   module M
 unit q where
   include p
   module B where
     import A
     ...
 }}}
    `B` depends on `p` for `p/A.hsig`; however, `p` depends on `B` because
 this module is filling a requirement.  However, if we were to include the
 internal graph of `p` into `q`, the resulting graph would not have an
 cycles; so this is one possibility of how to untangle this situation.
 However, if there's still a cycle (e.g. `A` imports `B`), then you will
 need at least a retypecheck loop, and maybe `hs-boot` style compilation.
 We're not going to implement this for now.

 3. **Can we deal with include-include dependency cycles?** Yes! Just use
 the Backpack paper's strategy for creating a recursive unit key and
 compile the two packages `hs-boot` style. But I'm not planning on
 implementing this yet.

 4. **Can we deal with signature-signature dependency cycles?** Ordered
 Backpack would have supported this:
 {{{
 unit a-sig where
   signature A where
     data T
 unit ab-sig where
   include a-sig
   signature B where
     import A
     data S = S T
   signature A where
     import B
     data T = T S
 }}}
    In our model, `ab-sig` has a cycle.  However, I believe any such cycle
 can be broken by creating sufficiently many units:
 {{{
 unit a-sig where
   signature B where
     data T
   signature A where
     data S = S T
 unit b-sig where
   signature A where
     data S
   signature B where
     data T = T S
 unit ab-sig where
   include a-sig
   include b-sig
 }}}
    In principle, GHC could automatically break import cycles by replacing
 an import with an import of a reduced signature that simply has abstract
 type definitions. See #10681. (I'm not sure this is possible for all
 language features.) This technique would also work for normal modules,
 assuming that every function is explicitly annotated with a type.

--

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10680#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list