Mutually-recursive/cyclic module imports

Fri Aug 15 09:27:16 EDT 2008

Haskell-98 specifies that module import cycles work 
automatically with cross-module type inference.

This cross-module inference has some weird interactions with 
defaulting and the monomorphism restriction.  In 
Haskell-prime we're planning on removing artificial 
monomorphism, but defaulting will still be necessary (and 
can still be set differently per module).

Only JHC fully implements the recursive module imports of 
Haskell-98.  GHC and NYhc each have their own proprietary 
"boot-files" with slightly odd semantics to allow this to 
work (although the syntax is simple enough).  Hugs doesn't 
support it at all.

I propose we simplify things and lay down some rules, 
without having to invent explicit module-interface 
signatures.  Then I wouldn't complain (:-)) that GHC doesn't 
have reasonable support for cyclic modules [1][2]. 
(Compiler writers will have to give feedback on how 
plausible this is :-) -- I think GHC and NYhc "should" be 
able to adapt their boot-interface-file mechanisms to the 
scheme I'm proposing.)

(This is really more of a sketch than a complete proposal at 
this stage.)

In particular, I propose an amount of annotation in a module 
that *shall* make it compile.  Compilers are free to accept 
code for other reasons (e.g. .hs-boot files, or some 
official module interfaces).  These first proposals are 
clean-ups that reflect how ridiculous people think the 
current standard's module interface semantics are compared 
to most languages.  Also they make cross-module type 
inference unnecessary, eliminating the defaulting problem.

namespace level: Haskell98 says that what a module exports 
is determined by the smallest fix-point of what is possible. 
I can't see a practical use for this behavior, which is 
easily confusing.  I think that exports that depend on the 
result of a fix-point should be rejected.  It can still be 
useful, though, for module A to import a few types/functions 
explicitly from a module B that then goes on to export the 
whole of module A.

type level: Inside any given SCC (loop) of modules, any 
function imported from another member of the SCC normally 
shall have an explicit type signature in the module that 
exports it.  (This doesn't seem a great burden, since 
type signatures for top-level functions/values are considered 
good practice anyway.  Can anyone think of a use-case where 
cross-module type inference would be particularly useful?)
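
As a rough illustration of the type-level rule (a sketch 
only, not how any existing compiler does it): the code below 
uses Data.Graph from the containers package to find the SCCs 
of an import graph and to list the import edges that stay 
inside a cycle.  Those are the imports whose functions would 
need explicit signatures under this proposal.  The names 
ModuleName, ImportGraph, importCycles and intraCycleImports 
are made up for this example.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Hypothetical model: a module name paired with the modules it imports.
type ModuleName = String
type ImportGraph = [(ModuleName, [ModuleName])]

-- Groups of mutually recursive modules; acyclic modules are dropped.
importCycles :: ImportGraph -> [[ModuleName]]
importCycles g =
  [ms | CyclicSCC ms <- stronglyConnComp [(m, m, is) | (m, is) <- g]]

-- Import edges (importer, imported) whose two ends lie in the same cycle.
-- Under the proposed rule, functions crossing such an edge need an
-- explicit type signature in the exporting module.
intraCycleImports :: ImportGraph -> [(ModuleName, ModuleName)]
intraCycleImports g =
  [ (m, i)
  | ms <- importCycles g
  , (m, is) <- g
  , m `elem` ms
  , i <- is
  , i `elem` ms
  ]
```

For a graph where A and B import each other and C imports A, 
only the A/B edges are reported; C's import of A is outside 
the loop and so carries no extra obligation.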

Exception: imports may be given the {-# SOURCE #-} pragma. 
This fulfills two purposes:
(1) It is a hint to a compiler that compiles modules 
separately that the current module should be compiled before 
the module being imported with {-# SOURCE #-}.  Obviously, 
this can make optimization worse, since it's likely that 
SOURCE-imported functions won't be strictness-analyzed or 
inlined or anything; but that's the .hs-boot situation 
already.  (And in principle even a compiler that likes 
separate compilation could break individual functions down 
into dependency order to compile them, adding another 
tradeoff point...)
(2) If SOURCE pragmas "break the loop", then only functions 
that are actually imported with SOURCE must be given type 
signatures, even if module B then goes on to import module A 
wholesale.  Example:
module A where {import {-#SOURCE#-} B (bf); ...}
module B (module A, module B) where {import A; bf :: ...; ...}
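
To make "breaking the loop" concrete, here is a small sketch 
(my own modelling, not any compiler's actual algorithm): drop 
the SOURCE imports from the dependency graph and see whether 
what remains can be ordered.  The Import record and 
compileOrder function are hypothetical names; 
stronglyConnComp is the real function from Data.Graph in the 
containers package, which returns components with 
dependencies first.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Hypothetical model of one import declaration.
data Import = Import
  { importedModule :: String
  , isSource       :: Bool   -- True for import {-# SOURCE #-} ...
  }

-- Either the cycles that no SOURCE pragma breaks (Left), or an order in
-- which the modules can be compiled separately (Right).  SOURCE imports
-- are ignored for ordering, so the importing module can come first.
compileOrder :: [(String, [Import])] -> Either [[String]] [String]
compileOrder mods =
  case [ms | CyclicSCC ms <- sccs] of
    []     -> Right [m | AcyclicSCC m <- sccs]
    cycles -> Left cycles
  where
    sccs =
      stronglyConnComp
        [ (m, m, [importedModule i | i <- is, not (isSource i)])
        | (m, is) <- mods
        ]
```

With the A/B example above (A SOURCE-imports B, B imports A 
normally) this gives Right ["A","B"]; if the SOURCE pragma is 
removed it gives a Left containing the one unbroken cycle.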

Since defining data types in logical places is an important 
use of cyclic imports, I propose not to require any extra 
annotation for them; the compiler will have to chase them 
down and understand them in loops (how else to do it?).
However, there are some particular things to keep in mind 
regarding potential recompilation:
(with a bit of a GHC bias)
Changing any orphan instances in an SCC will force the whole 
thing to recompile (but what pluckiness, putting orphan 
instances *there*!)
If a data type or newtype is imported without its 
constructors, then the RHS changing doesn't really force a 
recompile.
I imagine this could work in GHC by, for each SOURCE import, 
storing the MD5 of the imported interface.  Then when 
checking whether you really have to recompile module A, you 
don't have to if none of those MD5s have changed and none of 
the non-SOURCE-imported modules' interface MD5s have either.

In module cycles that aren't explicitly broken by SOURCEs, 
GHC (or any compiler) should just insert an implicit SOURCE 
for *all* cyclic imports (and possibly emit a warning), 
unless the compiler wants to guess which SOURCEs are better 
for optimization.  Presumably compilers that can do 
separate as well as non-separate compilation could take an 
optimization flag that tells them to compile cycles together 
as one piece rather than obeying the SOURCEs for 
recompilation efficiency.

So what does the compiler have to look at in a 
SOURCE-imported module?

In the case of the proposed SOURCE imports without hs-boot 
files, GHC would move from calculating one interface (MD5) 
per module (or two interfaces in the case of .hs-boots), to 
one-per-import.  I think this is, in principle, an 
advantage, although it does require more re-scanning when 
files are changed (only lexer/parser/renamer/module-chaser 
work).  For example, I've found myself adding to .hs-boot 
files for the purpose of one module that SOURCE-imports the 
.hs-boot, which forces the recompile of another module that 
happens to depend on the .hs-boot too.  To replicate the 
current GHC .hs-boot behavior (in which the 
hash-recalculation is shared among SOURCE-importers), one 
could replace an X.hs-boot file with an X_boot.hs file that 
contains:
         module X_boot (module X) where
         import {-# SOURCE #-} X (list of things exported
                                by the old .hs-boot file)
and in other modules, replace
         import {-# SOURCE #-} X (....)
with
         import X_boot (....)

Taking .hs-boot docs as a guide [2], the compiler must look 
in SOURCE-imported modules for:

- if an import list is given explicitly, `B (....)` not `B 
hiding (....)` or `B`, the export list only needs to be 
*checked* to make sure it exports the requested things, not 
remembered. Exception: data or class imported with 
`Name(..)` must remember exactly which constructors/members 
were exported.  It's recommended to specify exactly what 
you're importing.
- function type signatures
- imports of functions, types, etc. If it's imported from 
outside the SCC, it doesn't need a type signature/whatever. 
If it's defined somewhere within the SCC, it generally does 
need a type signature.
- fixity declarations, which only have to be imported in 
conjunction with the corresponding 
functions/constructors/whatever
- data type / newtype declarations.  When no constructor is 
imported, only the *kind* of the data type needs to be 
recorded, which might have to involve inference on the RHS 
(possibly involving more import chasing) if there aren't 
explicit kind annotations for *every* type parameter.
- type synonym declarations.  The whole thing has to be 
imported, including RHS.
- classes.  Including superclasses, class-method signatures, 
and default methods?  Is there some way that GHC manages to 
allow not declaring all of these in .hs-boots?
- instances, whether generated by 'deriving', 'deriving 
instance', or ordinary 'instance'; everything before the 
"where" clause of 'instance's is relevant.  But an instance 
is only relevant if it's orphan, or if it goes with a data or 
class that's also being imported.
- the compiler-specific RULES pragmas probably follow 
similar mandates as above for instances and for the 
functions referenced in the RULE.
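
The first item in the list above (how much of the export 
list has to be remembered) can be written down as a small 
function.  This is just my reading of that rule as a sketch; 
all the type and function names here are invented for 
illustration.

```haskell
-- One entry of an explicit import list.
data ImportItem
  = Plain String    -- e.g. f, or T without constructors
  | WithAll String  -- e.g. T(..) or C(..)
  deriving (Eq, Show)

-- The three shapes of import named in the list above.
data ImportSpec
  = Explicit [ImportItem]  -- B (....)
  | Hiding [String]        -- B hiding (....)
  | Everything             -- B
  deriving (Eq, Show)

-- How much of the imported module's export list must be kept around.
data ExportListNeed
  = CheckOnly                  -- just verify the requested names are exported
  | RememberMembersOf [String] -- also record the exported constructors/members
  | RememberWholeExportList    -- the importer sees whatever is exported
  deriving (Eq, Show)

exportListNeed :: ImportSpec -> ExportListNeed
exportListNeed (Explicit items) =
  case [n | WithAll n <- items] of
    [] -> CheckOnly
    ns -> RememberMembersOf ns
exportListNeed _ = RememberWholeExportList
```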

[1] my official "complaint": 
http://hackage.haskell.org/trac/ghc/ticket/1409
[2] the GHC .hs-boot docs: 
http://www.haskell.org/ghc/docs/latest/html/users_guide/separate-compilation.html#mutual-recursion