rfc: package mounting

Thu Jun 23 05:14:00 EDT 2005

Hi all,

It looks like there's been a bit of recent discussion regarding module
and package namespaces. There is a certain possible design feature
that I don't think has been mentioned yet, that I think would be very
helpful, so I thought I should at least bring it up.

What I want is to be able to build a module namespace for a program
out of packages in much the same way that filesystem namespaces are
built, namely with mounting operations, rather than just by "union" or
"overlay" operations as in the status quo. In other words I would like
to be able to specify along with the "-package" option a "mount point"
for that package in the module namespace. One possible option syntax
might be e.g. "-package my-graphics-lib -package-base
Graphics.UI.MyGraphicsLib". (Also, for backward compatibility and
convenience, packages should probably be able to specify a default
"mount point", to allow existing compiler command-line syntax to be
used.)

The idea is that with such a feature, library packages could get rid
of the common module path prefixes which currently must be specified
in every module in the library (such as "Graphics.UI.MyGraphicsLib"
above). These prefixes would instead be specified once by each user of
the library package (unless the default was desired), perhaps after
the package import option on the compiler command line. Modules would
have simple unqualified names within the library, like "Button" or
"Window" which, if the package mount point were specified as say
"Graphics.UI.MyGraphicsLib" in a certain compiler invocation, would be
mapped to "Graphics.UI.MyGraphicsLib.Button" and
"Graphics.UI.MyGraphicsLib.Window" respectively for code compiled by
that invocation. But they could just as easily be mapped to
"MGL.Button" etc. in a different invocation in a different project if
a different mount point were preferred or were necessary to eliminate
a namespace collision.
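
To make the intended mapping concrete, here is a minimal sketch in
Haskell of how a compiler might compute the mounted name of a module.
It is only an illustration: the names ModuleName, parseName, render
and mountAt are hypothetical, and nothing like them exists in any
current compiler.

```haskell
import Data.List (intercalate)

-- A hierarchical module name split into components,
-- e.g. ["Graphics", "UI", "MyGraphicsLib", "Button"].
type ModuleName = [String]

-- Split a dotted name into components; the empty string
-- stands for the root of the module namespace.
parseName :: String -> ModuleName
parseName "" = []
parseName s  = case break (== '.') s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : parseName rest

-- Join components back into a dotted name.
render :: ModuleName -> String
render = intercalate "."

-- Hypothetical: the name under which a package-local module
-- becomes visible when its package is mounted at a mount point.
mountAt :: String -> String -> String
mountAt mountPoint local = render (parseName mountPoint ++ parseName local)
```

With these definitions, mountAt "Graphics.UI.MyGraphicsLib" "Button"
gives "Graphics.UI.MyGraphicsLib.Button", while mountAt "MGL" "Button"
gives "MGL.Button". The library source mentions only "Button" in both
cases.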

There would be many benefits to being able to do things this way.

First, developers would be able to move shared code across libraries
without having to worry about the need to make widespread trivial
changes to reflect the new module names. I could copy a 'Debug' or
'Util' module into my library from another library, and not have to go
through the code to update the module hierarchy base location -
furthermore I could incorporate new upstream changes easily without
having to repeat this menial fixing-up procedure each time. While it's
true that new version control systems like 'darcs' are meant to handle
search-and-replace style changes effectively, I think that as far as
this issue goes, a VC-based solution would be less elegant and less
usable than what I am proposing.

Second, this would decouple some aspects of the design process that in
my opinion shouldn't be coupled. I would be able to start writing a
library before deciding on a name, for instance - currently I at least
have to stick in a dummy name as the module namespace base to avoid
potential conflicts with other library imports while testing. But
under this proposal I could just concentrate on building interior,
bottom-up functionality first - at the end of the process a certain
set of the package modules would be marked for external visibility,
would comprise the exterior interface, and would suggest to me a
fitting package name. Setting this name would only involve touching
the cabal file rather than every single source file in my library.
This would also make it easier to merge and split packages.

Third, it would encourage the use of lightweight modules, by reducing
the maintenance overhead of each module. Currently modules are the
only way (correct me if I'm wrong) to partition the top-level
namespace of a program. This is OK, except that, especially in
libraries, each module carries a certain amount of administrative
paperwork: it has to know the name of the library that contains it,
because that, or some form of it, has to be part of the module name.
Importers of the module have to specify this information too, and as
argued above there is a little work involved in touching up these
references when code moves between libraries or when the library name
changes. As a result I think people end up putting more code in the
same module at times when multiple modules would have been more
suitable.

Fourth, I think there would be psychological benefits. I think it's a
bit patronizing to the programmer that he has to pretend to remind
himself "you are in the following package" at the top of each file. I
think people can easily enough keep track of that amount of state.
It's as if the building code required me to put a sign with the
current city and country in each room of my house. These are bits of
context that I can easily call to mind if necessary, but which I would
sometimes like to temporarily forget about. I believe programming is
somewhat the same. We've come a long way from languages like C where
one has to decide whether to precede each symbol in a library with an
otherwise-meaningless identifier like "Py_", or risk namespace
collisions with other libraries - but I don't think we're at the end
of the road yet. It's true that if a module occurs in a package then
that is its package, but often the package name doesn't do anything to
suggest what the module functionality is - maybe the module exists
only because other package modules depend on it, or maybe the package
provides an assortment of otherwise unrelated functionality in its
modules. In other words, the package name may describe something that
the module functionality is only *applicable* to, or it may just be a
catchy name. Being able to leave out this not-quite-relevant piece of
information would make package-distributed code more conceptually
streamlined, easier to quote out of context in e.g. papers, etc.
Haskell is really a beautiful language - it's very dense, and I think
one of its great advantages is that it lets the programmer eliminate
from code almost everything except what is directly relevant to its
functionality - this proposal would take the language further in that
direction.

There is a strong tradition in science to put things in taxonomies, in
static hierarchies, and people have tried to do this with collections
of code libraries too, perhaps in imitation of scientists. One thing
to note, however, is that things in the natural world, such as the
genes of biological organisms, change a lot more slowly than man-made
code does. Science is different from engineering. A related reason
that language designers may be drawn to requiring users to participate
in universal classifications is that doing so projects an artificial
aura of stability and organization onto the evolving code situation.
But these designers, in shading themselves from progress, also stifle
it. They create a central administrative hoop which couples all
packages and impacts the scalability of the collective development
effort.

What I'm proposing would be a big departure from the practice of
languages like Perl and Java that demand such a global module
hierarchy. I've been told that the Haskell community is trying to make
it so that two packages can have modules of the same name, as long as
they aren't imported in the same compilation unit. My proposal would
go further by (1) removing the latter restriction and (2) allowing
the package code to be completely ignorant of any "mount point".
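
As a rough illustration of point (1), the sketch below resolves a
fully qualified module name against a table of mounted packages. Two
packages that both contain a module called "Button" can coexist as
long as they are mounted at different points. Everything here (the
Mount record, resolve, and the package names "gui-a" and "gui-b") is
hypothetical and describes no existing tool.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

type ModuleName = [String]

-- Hypothetical record of one mounted package.
data Mount = Mount
  { mountPackage :: String        -- package name
  , mountPoint   :: ModuleName    -- where the package is mounted
  , exposed      :: [ModuleName]  -- package-local names of its modules
  }

-- All (package, local module) pairs that a fully qualified name
-- could refer to. More than one result means the chosen mount
-- points leave the name ambiguous.
resolve :: [Mount] -> ModuleName -> [(String, ModuleName)]
resolve mounts name = mapMaybe try mounts
  where
    try m = do
      local <- stripPrefix (mountPoint m) name
      if local `elem` exposed m
        then Just (mountPackage m, local)
        else Nothing

-- Two made-up packages that both define a module named Button.
exampleMounts :: [Mount]
exampleMounts =
  [ Mount "gui-a" ["A"] [["Button"], ["Window"]]
  , Mount "gui-b" ["B"] [["Button"]]
  ]
```

Here resolve exampleMounts ["A", "Button"] yields only the module from
"gui-a", and resolve exampleMounts ["B", "Button"] only the one from
"gui-b". Had both packages been mounted at the same point, the lookup
would return two candidates, which is the collision that a different
choice of mount point is meant to remove.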

By the way, if you look at some aspects of operating system interfaces
I think you'll see that often the choice I'm suggesting has already
been made. For instance, you don't have to specify your current
working directory with every command you execute, and furthermore the
same command can easily be used in different working directories
without modification; you can install binaries at different locations
in the filesystem; you can mount filesystems at different
mount-points, etc.

One further thing: there have been proposals to simplify the importing
of collections of modules from a certain point in the namespace, etc.
I hope it is realized that they are independent of my proposal. I
don't think they would be very useful in implementing it (any such
solution would be far from optimal), and vice versa. Modules and
packages are quite distinct constructs: modules are needed for
namespace partitioning, and packages are needed to delineate
administrative boundaries and sources of change. Both are necessary
and both deserve special consideration in the ongoing design of
Haskell.

I will not be surprised if it seems strange to people that I attach
such importance to what is likely seen as an unimportant detail of the
language, but I do, and I hope that people will consider my
suggestion.

Also, I haven't said anything about implementation. I realize that
this would probably require some modification to the linker. I hope
I'm correct in assuming that the modifications will be relatively easy
to make, provided it turns out of course that this feature is really
something that people want.

Frederik

P.S. Thanks to John Meacham for a useful discussion.

-- 
http://ofb.net/~frederik/