Mutually-recursive/cyclic module imports
Isaac Dupree
isaacdupree at charter.net
Fri Aug 15 09:27:16 EDT 2008
Haskell-98 specifies that module import cycles work
automatically with cross-module type inference.
It has some weird interactions with defaulting and the
monomorphism restriction. In Haskell-prime we're planning
on removing artificial monomorphism, but defaulting will
still be necessary (and can still be set differently per
module).
Only JHC fully implements the recursive module imports of
Haskell-98.
GHC and NYhc each have their own proprietary "boot-files"
with slightly odd semantics to allow this to work (albeit
the syntax is simple enough)
Hugs doesn't support it at all.
I propose we simplify things and lay down some rules,
without having to invent explicit module-interface
signatures. Then I wouldn't complain(:-)) that GHC doesn't
have reasonable support for cyclic modules [1][2].
(Compiler writers will have to give feedback how plausible
this is :-) -- I think GHC and NYhc "should" be able to
adapt their boot-interface-file mechanisms to the scheme I'm
proposing..
(This is really more of a sketch than a complete proposal at
this stage.)
In particular, I propose an amount of annotation in a module
that *shall* make it compile. Compilers are free to accept
code for other reasons (e.g. .hs-boot files, or some
official module interfaces). These first proposals are
clean-ups that reflect how ridiculous people think the
current standard's module interface semantics are compared
to most languages. Also they make cross-module type
inference unnecessary, eliminating the defaulting problem.
namespace level: Haskell98 says that what a module exports
is determined by the smallest fix-point of what is possible.
I can't see a practical use for this behavior, which is
easily confusing. I think that exports that depend on the
result of a fix-point should be rejected. It can be useful
in module A to import a few types/functions explicitly from
a module B that then goes on to export the whole of module A
though.
type level: Inside any given SCC (loop) of modules, any
function imported from another member of the SCC normally
shall have an explicit type signature in the module that
exports it. (This doesn't seem a great burden, since
type-signature for top-level functions/values are considered
good practice anyway. Can anyone think of a use-case where
cross-module type inference would be particularly useful?)
Exception: imports may be given the {-# SOURCE #-} pragma.
This fulfills two purposes:
(1) It is a hint to a compiler that compiles modules
separately that the current module should be compiled before
the module being imported with {-# SOURCE #-}. Obviously,
this can make optimization worse, since it's likely that
SOURCE-imported functions won't be strictness-analyzed or
inlined or anything; but that's the .hs-boot situation
already. (And in principle even a compiler that likes
separate compilation could break individual functions down
into dependency order to compile them, adding another
tradeoff point...)
(2) If SOURCE pragmas "break the loop", then only functions
that are actually imported with SOURCE must be given type
signatures, even if module B then goes on to import module A
wholesale: example:
module A where {import {-#SOURCE#-} B (bf); ...}
module B (module A, module B) where {import A; bf :: ...; ...}
Since defining data types in logical places is an important
use of cyclic imports, I propose not to require any extra
annotation for them; the compiler will have to chase them
down and understand them in loops (how else to do it?).
However, there are some particular things to keep in mind
regarding potential recompilation:
(with a bit of a GHC bias)
Changing any orphan instances in an SCC will force the whole
thing to recompile (but what pluckiness, putting orphan
instances *there*!)
If a data type or newtype is imported without its
constructors, then the RHS changing doesn't really force a
recompile.
I imagine this could work in GHC by, for each SOURCE import,
storing the MD5 of the imported interface. Then when
checking if you seriously have to recompile module A, you
don't have to if none of those MD5s have changed and none of
the non-SOURCE-imported modules' interface MD5s have either.
In module cycles that aren't explicitly broken by SOURCEs,
GHC (or any compiler) should just insert an implicit SOURCE
for *all* cyclic imports (and possibly emit a warning)
(unless the compiler wants to guess which SOURCES are better
for optimization?). Presumably compilers that can do
separate as well as non-separate compilation could take an
optimization flag that tells them to compile cycles together
as one piece rather than obeying the SOURCES for
recompilation efficiency.
so what does the compiler have to look at in a
SOURCE-imported modules?
In the case of the proposed SOURCE imports without hs-boot
files, GHC would move from calculating one interface(md5)
per module (or two interfaces in the case of .hs-boots), to
one-per-import. I think this is, in principle, an
advantage, although it does require more re-scanning when
files are changed (only lexer/parser/renamer/module-chaser
work). For example, I've found myself adding to .hs-boot
files for the purpose of one module that SOURCE-imports the
.hs-boot, which forces the recompile of another module that
happens to depend on the .hs-boot too. To replicate the
current GHC .hs-boot behavior (in which the
hash-recalculation is shared among SOURCE-importers), one
could replace a X.hs-boot file with an X_boot.hs file that
contains:
module X_boot (module X) where
import {-# SOURCE #-} X (list of things exported
by the old .hs-boot file)
, and in other modules, replace
import {-# SOURCE #-} X (....)
with
import X_boot (....)
Taking .hs-boot docs as a guide [2], the compiler must look
in SOURCE-imported modules for:
- if an import list is given explicitly, `B (....)` not `B
hiding (....)` or `B`, the export list only needs to be
*checked* to make sure it exports the requested things, not
remembered. Exception: data or class imported with
`Name(..)` must remember exactly which constructors/members
were exported. It's recommended to specify exactly what
you're importing.
- function type signatures
- imports of functions, types, etc. If it's imported from
outside the SCC, it doesn't need a type signature/whatever.
If it's defined somewhere within the SCC, it generally does
need a type signature.
- fixity declarations, which only have to be imported in
conjunction with the corresponding
functions/constructors/whatever
- data type / newtype declarations. When no constructor is
imported, only the *kind* of the data type needs to be
recorded, which might have to involve inference on the RHS
(possibly involving more import chasing) if there aren't
explicit kind annotations for *every* type parameter.
- type synonym declarations. The whole thing has to be
imported, including RHS.
- classes. Including superclasses, class-method signatures,
and default methods? Is there some way that GHC manages to
allow not declaring all of these in .hs-boots?
- instances, whether generated by 'deriving', 'deriving
instance', or ordinary 'instance'; everything before the
"where" clause of 'instance's is relevant. But an instance
is only relevant if it's orphan, or if goes with a data or
class that's also being imported.
- the compiler-specific RULES pragmas probably follow
similar mandates as above for instances and for the
functions referenced in the RULE.
[1] my official "complaint":
http://hackage.haskell.org/trac/ghc/ticket/1409
[2] the GHC .hs-boot docs:
http://www.haskell.org/ghc/docs/latest/html/users_guide/separate-compilation.html#mutual-recursion
More information about the Haskell-prime
mailing list