Revamping the module hierarchy

Fri Jun 19 19:01:43 EDT 2009

Ben Franksen wrote:
> If I want to import a module I have to decide on /one/ module name. Since I
> cannot know at which point in the hierarchy users might have exposed
> modules from other packages, I must chose the default 'root' point. So,
> this will not help library authors who want to e.g. import the 'same'
> module from either mtl or transformers.

No, that misses a big point in the proposal. Yes, every compiled module 
must have only one "name", but that name does not need to be the same 
"name" that is used in Haskell code. This is what it means to separate 
provenance from reference.

To make this more concrete, consider the installed package:

     libfoo.cabal:
         ...
         Build-depends:   base (>= 3.0 && < 4.0) at Base
         exposed-modules: Data.Foo
         ...

     Data/Foo.hs:
         {-# LANGUAGE NoImplicitPrelude #-}
         package Data.Foo where
         import Base.Prelude
         ...

And consider the client package we are compiling:

     libbar.cabal:
         ...
         Build-depends:   base (>= 3.0 && < 4.0) at Elsewhere,
                          libfoo at Foo
         exposed-modules: Control.Bar
         ...

     Control/Bar.hs:
         {-# LANGUAGE NoImplicitPrelude #-}
         package Control.Bar where
         import Elsewhere.Prelude
         import Foo.Data.Foo
         ...

Still with me? Now, when we compiled base-3.5.0 we compiled the 
base-3.5.0:Prelude module. Once we've compiled it we need to give it 
some globally unique name so that we know to refer to exactly some 
byte-offset into some file located on some sector of some disk. What 
this name actually looks like is irrelevant. We could call the compiled 
module "base-3.5.0:Prelude" or we could call it "0xDEADBEEF". If we 
wanted to avoid name-/versionspace clashing up at the package layer, 
then we may prefer something like the latter; but for this discussion 
I'll stick with the former for simplicity.

So before we compile libbar, we have the following compiled modules 
available:

     base-3.5.0:Prelude
     libfoo-0.0.0:Data.Foo

The linking/reference process can be considered like a dialogue between 
the source code and the compiler (or between the compiler and the 
package-manager, if you prefer. The dialogue for compiling 
libbar-42:Control.Bar will look something like this:

     Code: set LANGUAGE NoImplicitPrelude
     GHC:  okay.
     Code: call me libbar-42:Control.Bar
     GHC:  righto, libbar-42:Control.Bar
     Code: I need something called Elsewhere.Prelude
     GHC:  okay, just a sec.

     GHC: hey pkg!
     PKG: j0, wassup dawg
     GHC: I need something called Elsewhere.Prelude
     PKG: I have that at 0xDEADBEEF
     GHC: what?
     PKG: Oh, I mean I have that at base-3.5.0:Prelude
     GHC: thanks.

     /GHC memorizes Elsewhere.Prelude = base-3.5.0:Prelude

     GHC:  hey libbar-42:Control.Bar, you still there?
     Code: yeah
     GHC:  I found Elsewhere.Prelude
     Code: thanks
     ...
     Code: I need to get the type of Elsewhere.Prelude.curry
     GHC:  okay, just a sec.

     GHC: hey pkg, what's the type of base-3.5.0:Prelude.curry ?
     PKG: base-3.5.0:Prelude.curry :: ((a, b) -> c) -> a -> b -> c

     GHC: hey libbar-42:Control.Bar,
          Elsewhere.Prelude.curry :: ((a, b) -> c) -> a -> b -> c

     ...

     /Code leaves #haskell
     /GHC forgets module mappings
     /GHC waits for Code to join #haskell

Naturally GHC needs to be in on the joke and needs to be aware of both 
"names" for the same compiled module. But this is no different than what 
we already have. The module names used in Haskell code do not refer to 
the version of the module they need, and again they shouldn't have to. 
When the code asks for Prelude, it's up to GHC and PKG to determine 
which version of the Prelude should be linked to the code.

The only thing that changes in this proposal is that PKG can have a more 
sophisticated way of mapping Haskell module names into compiled module 
object files.

> IMO it makes much more sense to let client packages decide from where in the
> module hierarchy they want to import modules from another package, rather
> than forcing users to decide this globally per installation.

Right. For each package (or compilation unit), the user/client 
constructs a map from the compiled module object files to the Haskell 
module names. The namespace that each package sees is only a 
fabrication, because the Haskell module names are rewritten into 
compiled module object names in the Core code GHC produces. So every 
package can make up their own independent mapping.

> Thus, grafting should not be done when exposing packages, but rather when
> actually using them. Your examples above would become
> 
>   ghc -package libfoo-0.0.0 at Zot ...
> 
> resp.
> 
>   ghc -package libfoo-0.0.0:Control.Bar at Quux
>
> This would also better play with the way cabal does things: cabal currently
> ignores hidden/expoosed status of packages; instead it hides everything and
> then explicitly 'imports' exact versions using the -package option. With a
> few tweaks to the cabal file syntax, we could easily declare package 'mount
> points' (even for subtrees) when declaring the dependent packages and this
> would be tranformed to the ghc command line syntax above.

Six of one... :)

As far as the proposal goes, the only important bit is that the names 
that Code uses are different than the names GHC/PKG use. Whether the 
namespace mapping is done by ghc-pkg, ghc, ghci, or whatever doesn't 
really matter since they're all on the same side of the fence. At that 
point it's just delegation of responsibility.

The reason I was singling out ghc-pkg as the PKG is because (so far as I 
know) that's its current purpose. When Cabal runs, it needs to sanitize 
the namespace mapping. It does this by first hiding all packages, and 
then exposing only the ones the *.cabal file indicates are necessary 
(apparently via flags to ghc rather than calls to ghc-pkg). Right now, 
all packages are exposed at the same root in the module namespace; the 
extension is just to say that packages (and subtrees of packages) can be 
exposed wherever we want. After Cabal is done, it restores whatever 
mapping was in place before it started sanitizing things.

 From what (little) I know of how Cabal works under the covers, it makes 
sense to me that the "exposure" step is the right place to do grafting. 
If the map from Haskell module names to the compiled modules in exposed 
packages is already separate from the exposure process, then of course 
grafting should be done wherever that mapping is kept.

If the real purpose of ghc-pkg is to give a system-default module 
namespace for ghc/ghci when commandline flags are not set, then sure 
ghc/ghci will need new flags. Of course ghc-pkg will also need new flags 
since it too is constructing a module namespace.

-- 
Live well,
~wren