Revamping the module hierarchy
wren ng thornton
wren at community.haskell.org
Fri Jun 19 19:01:43 EDT 2009
Ben Franksen wrote:
> If I want to import a module I have to decide on /one/ module name. Since I
> cannot know at which point in the hierarchy users might have exposed
> modules from other packages, I must chose the default 'root' point. So,
> this will not help library authors who want to e.g. import the 'same'
> module from either mtl or transformers.
No, that misses a big point in the proposal. Yes, every compiled module
must have only one "name", but that name does not need to be the same
"name" that is used in Haskell code. This is what it means to separate
provenance from reference.
To make this more concrete, consider the installed package:
libfoo.cabal:
...
Build-depends: base (>= 3.0 && < 4.0) at Base
exposed-modules: Data.Foo
...
Data/Foo.hs:
{-# LANGUAGE NoImplicitPrelude #-}
package Data.Foo where
import Base.Prelude
...
And consider the client package we are compiling:
libbar.cabal:
...
Build-depends: base (>= 3.0 && < 4.0) at Elsewhere,
libfoo at Foo
exposed-modules: Control.Bar
...
Control/Bar.hs:
{-# LANGUAGE NoImplicitPrelude #-}
package Control.Bar where
import Elsewhere.Prelude
import Foo.Data.Foo
...
Still with me? Now, when we compiled base-3.5.0 we compiled the
base-3.5.0:Prelude module. Once we've compiled it we need to give it
some globally unique name so that we know to refer to exactly some
byte-offset into some file located on some sector of some disk. What
this name actually looks like is irrelevant. We could call the compiled
module "base-3.5.0:Prelude" or we could call it "0xDEADBEEF". If we
wanted to avoid name-/versionspace clashing up at the package layer,
then we may prefer something like the latter; but for this discussion
I'll stick with the former for simplicity.
So before we compile libbar, we have the following compiled modules
available:
base-3.5.0:Prelude
libfoo-0.0.0:Data.Foo
The linking/reference process can be considered like a dialogue between
the source code and the compiler (or between the compiler and the
package-manager, if you prefer. The dialogue for compiling
libbar-42:Control.Bar will look something like this:
Code: set LANGUAGE NoImplicitPrelude
GHC: okay.
Code: call me libbar-42:Control.Bar
GHC: righto, libbar-42:Control.Bar
Code: I need something called Elsewhere.Prelude
GHC: okay, just a sec.
GHC: hey pkg!
PKG: j0, wassup dawg
GHC: I need something called Elsewhere.Prelude
PKG: I have that at 0xDEADBEEF
GHC: what?
PKG: Oh, I mean I have that at base-3.5.0:Prelude
GHC: thanks.
/GHC memorizes Elsewhere.Prelude = base-3.5.0:Prelude
GHC: hey libbar-42:Control.Bar, you still there?
Code: yeah
GHC: I found Elsewhere.Prelude
Code: thanks
...
Code: I need to get the type of Elsewhere.Prelude.curry
GHC: okay, just a sec.
GHC: hey pkg, what's the type of base-3.5.0:Prelude.curry ?
PKG: base-3.5.0:Prelude.curry :: ((a, b) -> c) -> a -> b -> c
GHC: hey libbar-42:Control.Bar,
Elsewhere.Prelude.curry :: ((a, b) -> c) -> a -> b -> c
...
/Code leaves #haskell
/GHC forgets module mappings
/GHC waits for Code to join #haskell
Naturally GHC needs to be in on the joke and needs to be aware of both
"names" for the same compiled module. But this is no different than what
we already have. The module names used in Haskell code do not refer to
the version of the module they need, and again they shouldn't have to.
When the code asks for Prelude, it's up to GHC and PKG to determine
which version of the Prelude should be linked to the code.
The only thing that changes in this proposal is that PKG can have a more
sophisticated way of mapping Haskell module names into compiled module
object files.
> IMO it makes much more sense to let client packages decide from where in the
> module hierarchy they want to import modules from another package, rather
> than forcing users to decide this globally per installation.
Right. For each package (or compilation unit), the user/client
constructs a map from the compiled module object files to the Haskell
module names. The namespace that each package sees is only a
fabrication, because the Haskell module names are rewritten into
compiled module object names in the Core code GHC produces. So every
package can make up their own independent mapping.
> Thus, grafting should not be done when exposing packages, but rather when
> actually using them. Your examples above would become
>
> ghc -package libfoo-0.0.0 at Zot ...
>
> resp.
>
> ghc -package libfoo-0.0.0:Control.Bar at Quux
>
> This would also better play with the way cabal does things: cabal currently
> ignores hidden/expoosed status of packages; instead it hides everything and
> then explicitly 'imports' exact versions using the -package option. With a
> few tweaks to the cabal file syntax, we could easily declare package 'mount
> points' (even for subtrees) when declaring the dependent packages and this
> would be tranformed to the ghc command line syntax above.
Six of one... :)
As far as the proposal goes, the only important bit is that the names
that Code uses are different than the names GHC/PKG use. Whether the
namespace mapping is done by ghc-pkg, ghc, ghci, or whatever doesn't
really matter since they're all on the same side of the fence. At that
point it's just delegation of responsibility.
The reason I was singling out ghc-pkg as the PKG is because (so far as I
know) that's its current purpose. When Cabal runs, it needs to sanitize
the namespace mapping. It does this by first hiding all packages, and
then exposing only the ones the *.cabal file indicates are necessary
(apparently via flags to ghc rather than calls to ghc-pkg). Right now,
all packages are exposed at the same root in the module namespace; the
extension is just to say that packages (and subtrees of packages) can be
exposed wherever we want. After Cabal is done, it restores whatever
mapping was in place before it started sanitizing things.
From what (little) I know of how Cabal works under the covers, it makes
sense to me that the "exposure" step is the right place to do grafting.
If the map from Haskell module names to the compiled modules in exposed
packages is already separate from the exposure process, then of course
grafting should be done wherever that mapping is kept.
If the real purpose of ghc-pkg is to give a system-default module
namespace for ghc/ghci when commandline flags are not set, then sure
ghc/ghci will need new flags. Of course ghc-pkg will also need new flags
since it too is constructing a module namespace.
--
Live well,
~wren
More information about the Libraries
mailing list