[GHC] #10679: Generalize hi-boot/hi for signatures, to manage intermediate merged interfaces

GHC ghc-devs at haskell.org
Fri Jul 24 01:18:02 UTC 2015


#10679: Generalize hi-boot/hi for signatures, to manage intermediate merged
interfaces
-------------------------------------+-------------------------------------
              Reporter:  ezyang      |             Owner:  ezyang
                  Type:  task        |            Status:  new
              Priority:  normal      |         Milestone:
             Component:  Package     |           Version:  7.11
  system                             |
              Keywords:              |  Operating System:  Unknown/Multiple
          Architecture:              |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
             Test Case:              |        Blocked By:
              Blocking:              |   Related Tickets:
Differential Revisions:              |
-------------------------------------+-------------------------------------
 == The problem ==

 In some situations, we need to output multiple interface
 files for what is morally the same module name.

 === Example 1: Merging external and home signatures ===

 {{{
 unit a-sig where
     signature A
 unit p where
     include a-sig
     signature A
 }}}

 Compiling `p/A.hsig` produces an interface file which contains just
 the definitions declared in `p`.  However, someone including `p`
 should see the merge of the interface of `p/A.hsig` AND `a-sig/A.hsig`
 (which was included.)

 === Example 2: Merging two home signatures ===

 {{{
 unit p where
     signature A
     signature B where
         import A
         ...
     signature A where
         import B
         ...
 }}}

 What should we do if a signature is specified multiple times in the same
 unit?  The compilation of each produces a distinct interface, and the
 public interface we want to expose is the merge of the two. (And by the
 way, what's the source file name of `A`, if we are not using the inline
 syntax?)

 === Example 3: Merging a signature and a module ===

 {{{
 unit p where
     signature A
     module B where
         import A
         ...
     module A where
         import B
         ...
 }}}

 `A` and `B` are mutually recursive, and we want to use a signature file to
 break the gap.  The signature produces an interface file, only to be
 overwritten when we actually define the module proper.

 But wait! We have a solution for this example already: the first interface
 file for `A` is not saved to `A.hi`, but `A.hi-boot`...

 == The proposal ==

 I want to take the `A.hi-boot` versus `A.hi` distinction and
 generalize it: we should be able to name intermediate interface
 files A.1.hi, A.2.hi, ... and finally A.hi (which
 is publically visible outside the unit.)  This naming convention applies
 to Haskell files too.

 === User-visible consequences ===

 Every signature file is numbered, and every import of a signature file
 refers to a specific number.  This number is unique among all other
 modules in a unit which share the same name.  For backwards
 compatibility, some number/file name extensions are treated specially:

 1. `.hs` files compile to `.hi` (implicitly numbered 0)

 2. `.hs-boot` files compile to `.hi-boot` (implicitly numbered 1)

 3. `.hsig` files compile to `.hi-boot` (implicitly numbered 1)

 4. `.n.hsig` files compile to `.n.hi-boot` (numbered n, where n is greater
 than 1)

 **Flex point:** We could give `.hsig` files their own file extension
 for interface files; just would require some more work to distinguish
 between `hs-boot` and `hsig` as well as record the numbering.

 To import, the `{-# SOURCE n #-}` pragma can be used (with `{-# SOURCE
 #-}`
 being equivalent `{-# SOURCE 1 #-}`.)

 Inline Backpack files can omit numbering, since we can figure it out
 based on the ordering of declarations (numbering in REVERSE order
 of occurrence).  Example 2 can be numbered as follows:

 {{{
     signature {-# SOURCE 2 #-} A
     signature {-# SOURCE 1 #-} B where
         import {-# SOURCE 2 #-} A
         ...
     signature {-# SOURCE 1 #-} A where
         import {-# SOURCE 1 #-} B
         ...
 }}}

 === Internal consequences ===

 In many places in the code today, we record a boolean indicating if
 we depended on the boot interface `hi-boot` or the normal interface `hi`.
 We now replace this marker with an integer which records the numbering.
 The primary affected components are dependency recording in interfaces,
 interface loading code in GHC, and the implementation of `--make`.

 === Interaction with signature merging ===

 Unlike `hs-boot` files, `hsig` files can be included from external
 units, in which case the semantics are that all signatures in scope
 are merged together.  The key rule is that we **generate an hi
 file for each partial merge**; this means that whenever we want
 to typecheck a module, there is exactly one interface file per
 module we import. Consider this example:

 {{{
 unit a-sig where
     signature A
 unit a-sig2 where
     signature A
 unit p where
     include a-sig
     module B
     include a-sig2
     module C
     signature A
     module D
 }}}

 When compiling this, we generate four interface files for `A`:

 {{{
 unit p where
     include a-sig
     -- Produces A.3.hi-boot (a-sig)
     module B -- uses A.3.hi-boot
     include a-sig2
     -- Produces A.2.hi-boot (a-sig + a-sig2)
     module C -- uses A.2.hi-boot
     signature A
     -- Produces A.hi-boot (everything)
     module D -- uses A.hi-boot
     -- At the end, A.hi-boot copied to A.hi to be publically visible
 }}}

 == Can we do anything simpler? ==

 There are a few barriers to doing something simpler:

 1. We can avoid generating extra interface files if we instead merge them
 on-the-fly when we use them.  However, this forces later instances of GHC
 to do repeated work remerging interface files, so it seems desirable from
 a performance perspective to merge before writing.  Another scheme is that
 we could merge on use for signatures in the home package, and then write
 out a unified file at the very end, trading off performance for less
 written interface files.

 2. The Backpack language is defined in a way that allows modules,
 signatures and includes to be ordered in a semantically meaningful way.
 For example:
 {{{
 unit q where
     signature M
     signature A where
         f :: Int -> Int
     ...
 unit p where
     signature A where
         data T
     module M where
         import A -- should get T but not f
         ...
     include q -- fill in M
     module S where
         import A -- should get T and f
 }}}
    This means that even within a unit, the interface of a signature file
 may differ.  We could rule this out, but we would have to work out how to
 explain this limitation to users.  (For example, we could solve the
 example above by saying that units which define modules do not bring their
 signatures into scope for a package which imports them; but this is a
 pretty ad hoc rule!  And you still have to deal with repeated signatures,
 or a signature importing a module importing a signature.  There are a lot
 of cases.)

 3. This problem cannot be avoided at all if you are truly doing recursive
 modules, since you need the intermediate interface file to do compilation
 at all prior to getting the real implementation.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10679>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list