[GHC] #10679: Generalize hi-boot/hi for signatures, to manage intermediate merged interfaces
GHC
ghc-devs at haskell.org
Fri Jul 24 01:18:02 UTC 2015
#10679: Generalize hi-boot/hi for signatures, to manage intermediate merged
interfaces
-------------------------------------+-------------------------------------
Reporter: ezyang | Owner: ezyang
Type: task | Status: new
Priority: normal | Milestone:
Component: Package | Version: 7.11
system |
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple |
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Revisions: |
-------------------------------------+-------------------------------------
== The problem ==
In some situations, we need to output multiple interface
files for what is morally the same module name.
=== Example 1: Merging external and home signatures ===
{{{
unit a-sig where
signature A
unit p where
include a-sig
signature A
}}}
Compiling `p/A.hsig` produces an interface file which contains just
the definitions declared in `p`. However, someone including `p`
should see the merge of the interface of `p/A.hsig` AND `a-sig/A.hsig`
(which was included.)
=== Example 2: Merging two home signatures ===
{{{
unit p where
signature A
signature B where
import A
...
signature A where
import B
...
}}}
What should we do if a signature is specified multiple times in the same
unit? The compilation of each produces a distinct interface, and the
public interface we want to expose is the merge of the two. (And by the
way, what's the source file name of `A`, if we are not using the inline
syntax?)
=== Example 3: Merging a signature and a module ===
{{{
unit p where
signature A
module B where
import A
...
module A where
import B
...
}}}
`A` and `B` are mutually recursive, and we want to use a signature file to
break the gap. The signature produces an interface file, only to be
overwritten when we actually define the module proper.
But wait! We have a solution for this example already: the first interface
file for `A` is not saved to `A.hi`, but `A.hi-boot`...
== The proposal ==
I want to take the `A.hi-boot` versus `A.hi` distinction and
generalize it: we should be able to name intermediate interface
files A.1.hi, A.2.hi, ... and finally A.hi (which
is publically visible outside the unit.) This naming convention applies
to Haskell files too.
=== User-visible consequences ===
Every signature file is numbered, and every import of a signature file
refers to a specific number. This number is unique among all other
modules in a unit which share the same name. For backwards
compatibility, some number/file name extensions are treated specially:
1. `.hs` files compile to `.hi` (implicitly numbered 0)
2. `.hs-boot` files compile to `.hi-boot` (implicitly numbered 1)
3. `.hsig` files compile to `.hi-boot` (implicitly numbered 1)
4. `.n.hsig` files compile to `.n.hi-boot` (numbered n, where n is greater
than 1)
**Flex point:** We could give `.hsig` files their own file extension
for interface files; just would require some more work to distinguish
between `hs-boot` and `hsig` as well as record the numbering.
To import, the `{-# SOURCE n #-}` pragma can be used (with `{-# SOURCE
#-}`
being equivalent `{-# SOURCE 1 #-}`.)
Inline Backpack files can omit numbering, since we can figure it out
based on the ordering of declarations (numbering in REVERSE order
of occurrence). Example 2 can be numbered as follows:
{{{
signature {-# SOURCE 2 #-} A
signature {-# SOURCE 1 #-} B where
import {-# SOURCE 2 #-} A
...
signature {-# SOURCE 1 #-} A where
import {-# SOURCE 1 #-} B
...
}}}
=== Internal consequences ===
In many places in the code today, we record a boolean indicating if
we depended on the boot interface `hi-boot` or the normal interface `hi`.
We now replace this marker with an integer which records the numbering.
The primary affected components are dependency recording in interfaces,
interface loading code in GHC, and the implementation of `--make`.
=== Interaction with signature merging ===
Unlike `hs-boot` files, `hsig` files can be included from external
units, in which case the semantics are that all signatures in scope
are merged together. The key rule is that we **generate an hi
file for each partial merge**; this means that whenever we want
to typecheck a module, there is exactly one interface file per
module we import. Consider this example:
{{{
unit a-sig where
signature A
unit a-sig2 where
signature A
unit p where
include a-sig
module B
include a-sig2
module C
signature A
module D
}}}
When compiling this, we generate four interface files for `A`:
{{{
unit p where
include a-sig
-- Produces A.3.hi-boot (a-sig)
module B -- uses A.3.hi-boot
include a-sig2
-- Produces A.2.hi-boot (a-sig + a-sig2)
module C -- uses A.2.hi-boot
signature A
-- Produces A.hi-boot (everything)
module D -- uses A.hi-boot
-- At the end, A.hi-boot copied to A.hi to be publically visible
}}}
== Can we do anything simpler? ==
There are a few barriers to doing something simpler:
1. We can avoid generating extra interface files if we instead merge them
on-the-fly when we use them. However, this forces later instances of GHC
to do repeated work remerging interface files, so it seems desirable from
a performance perspective to merge before writing. Another scheme is that
we could merge on use for signatures in the home package, and then write
out a unified file at the very end, trading off performance for less
written interface files.
2. The Backpack language is defined in a way that allows modules,
signatures and includes to be ordered in a semantically meaningful way.
For example:
{{{
unit q where
signature M
signature A where
f :: Int -> Int
...
unit p where
signature A where
data T
module M where
import A -- should get T but not f
...
include q -- fill in M
module S where
import A -- should get T and f
}}}
This means that even within a unit, the interface of a signature file
may differ. We could rule this out, but we would have to work out how to
explain this limitation to users. (For example, we could solve the
example above by saying that units which define modules do not bring their
signatures into scope for a package which imports them; but this is a
pretty ad hoc rule! And you still have to deal with repeated signatures,
or a signature importing a module importing a signature. There are a lot
of cases.)
3. This problem cannot be avoided at all if you are truly doing recursive
modules, since you need the intermediate interface file to do compilation
at all prior to getting the real implementation.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10679>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list