Proposal: module namespaces.

Mon, 26 Feb 2001 17:59:30 +0000

This is an announcement of a new mailing list, and a proposal for
three things:

  *  An extended mechanism for module namespaces in Haskell.
  *  A "standard" namespace for new libraries, common across all systems.
  *  A social process for adding new libraries to the "standard" set.

A formatted version of this proposal appears on the web at
        http://www.cs.york.ac.uk/fp/libraries/

The new mailing list is for the discussion of these proposals.
Please subscribe if you are interested.  Follow-ups set accordingly.

Mailing list details
--------------------
                      libraries@haskell.org

The purpose of this new list is to:
  (a) discuss an extension to Haskell to provide a richer module namespace,
  (b) discuss how to partition this namespace and populate it with libraries,
  (c) discuss how to provide a consistent set of libraries for all compilers,
      and the setting up of a common library repository.

To subscribe:  http://haskell.org/mailman/listinfo/libraries/

Introduction
------------
Everyone agrees that Haskell needs good, useful, libraries: lots of
them, well-specified, well-implemented, well-documented.  A problem
is that the current "Standard Libraries" defined by the Haskell'98
Report number only about a dozen.  But there are actually many more
libraries out there: some are in GHC's hslibs collection, others are
linked from haskell.org, even more are used only by their original
author and have no public distribution.

What is more, there is no Haskell Committee.  There is no-one
to decide which candidate libraries are worthy to be added to the
"Standard" set.  This stifles the possible distribution of great
libraries, because no-one knows how to get /my/ library "accepted".

Furthermore, the existing libraries that people distribute from
their own websites often run into problems when used alongside other
people's libraries.  A library usually consists of several modules,
but often the constituent modules have simple names that can easily
clash with modules from another library package.  This leads people
to ad hoc solutions such as prefixing all their modules with a
cryptic identifier e.g.

        HsParse
        XmlParse
        HOGLParse
        THIHParse

Just counting the libraries currently available from GHC's hslibs, and
haskell.org's links, there are currently over 200 separate modules in
semi-"standard" use.  As more libraries are written, the possibility
of clashes can only increase.

Related to this problem, although not identical, is the difficulty
of finding a library that provides exactly the functionality you need
to help you write a specific application program.  How do you go
about searching through 200+ modules for interesting-looking datatypes
and signatures, starting only from the module names?

My View
-------
My view is that many of these problems are rooted in Haskell's
restriction to a flat module namespace.  If we can address that issue
adequately, then I believe that many of the difficulties surrounding
the provision of good libraries for Haskell will simply fall away.

Proposal 1
----------
Introduce nested namespaces for modules.  The key concept here is to
map the module namespace into a hierarchical directory-like structure.
I propose using the dot as a separator, analogous to Java's usage
for namespaces.

So for instance, the four example module names above using cryptic
prefixes could perhaps be more clearly named

    Haskell.Language.Parse
    Text.Xml.Parse
    Graphics.Drawing.HOpenGL.ConfigFile.Parse
    TypeSystem.Parse

Naming proceeds from the most general category on the left, through
more specific subdivisions towards the right.

For most compilers and interpreters, this extended module namespace
maps directly to a directory/file structure in which the modules
are stored.  Storing unrelated modules in separate directories (and
related modules in the same directory) is a useful and common practice
when engineering large systems.

(But note that, just as Haskell'98 does not *insist* that modules live
in files of the same name, this proposal does not insist on it either.
However, we expect most tools to use the close correspondence to
their advantage.)
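
As a minimal sketch of the usual mapping from module names to files,
assuming "/" as the directory separator and ".hs" as the source suffix
(both assumptions, and both function names are hypothetical rather than
part of the proposal or of any compiler):

```haskell
-- Hypothetical helpers: the proposal does not prescribe these functions.
-- Assumes '/' as the directory separator and ".hs" as the source suffix.

-- Split a dotted module name into its components,
-- e.g. "Text.Xml.Parse" becomes ["Text", "Xml", "Parse"].
components :: String -> [String]
components name = case break (== '.') name of
  (c, [])       -> [c]
  (c, _ : rest) -> c : components rest

-- Join the components with '/' and append the source suffix.
-- 'components' never returns an empty list, so 'foldr1' is safe here.
moduleToPath :: String -> FilePath
moduleToPath name =
  foldr1 (\c p -> c ++ "/" ++ p) (components name) ++ ".hs"
```

Under these assumptions, moduleToPath "Text.Xml.Parse" gives
"Text/Xml/Parse.hs", relative to whatever root directory the tool
searches from.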

There are several issues arising from the particular proposal here.

  * This is a surface change to the module naming convention.  It
    does not introduce nested /definition/ of modules.

  * The syntax I propose (a dot separator) is familiar from other
    languages such as Java, but could in principle be something else,
    for instance a prime ' or underscore _ or centred dot · or
    something different again.

  * Of the choices of separator, dot requires a change to the Haskell'98
    lexical syntax, allowing
            modid -> qconid
    where currently the syntax is
            modid ->  conid

  * The use of qualified imports becomes more verbose: for instance
            import qualified XmlParse
                      ... XmlParse.element f ...
    becomes
            import qualified Text.Xml.Parse
                      ... Text.Xml.Parse.element f ...
    However, I propose that every import have an implicit "as"
    clause to use as an abbreviation, so in
            import qualified Text.Xml.Parse   [ as Parse ]
    the clause "as Parse" would be implicit, unless overridden by the 
    programmer with her own "as" clause.  The implicit "as" clause
    always uses the final subdivision of the module name.  So for
    instance, either the fully-qualified or abbreviated-qualified names
            Text.Xml.Parse.element
            Parse.element
    would be accepted and have the same referent, but a partial
    qualification like
            Xml.Parse.element
    would not be accepted.

  * Another consequence of using the dot as the module namespace
    separator is that it steals one extremely rare construction from
    Haskell'98:
            A.B.C.D
    in Haskell'98 means the composition of constructor D from module C,
    with constructor B from module A:
            (.)  A.B  C.D
    No-one so far thinks this is any great loss, and if you really
    want to say the latter, you still can by simply inserting spaces:
            A.B . C.D
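
The implicit "as" rule from the list above can be modelled in a few
lines.  This is only a sketch of the rule as proposed, not of any
compiler's implementation, and every name in it is hypothetical:

```haskell
-- Hypothetical model of the proposed implicit "as" rule.
-- None of these names come from the proposal or from any compiler.

-- Split a dotted module name into its components.
components :: String -> [String]
components name = case break (== '.') name of
  (c, [])       -> [c]
  (c, _ : rest) -> c : components rest

-- The implicit abbreviation is the final subdivision of the module name.
implicitAlias :: String -> String
implicitAlias = last . components

-- Qualifiers accepted for an import that has no explicit "as" clause:
-- the fully-qualified name and the implicit abbreviation.
acceptedQualifiers :: String -> [String]
acceptedQualifiers name
  | alias == name = [name]            -- flat name: nothing to abbreviate
  | otherwise     = [name, alias]
  where
    alias = implicitAlias name

-- Would this qualifier be accepted for the given imported module?
isAccepted :: String -> String -> Bool
isAccepted modName qualifier = qualifier `elem` acceptedQualifiers modName
```

For the module Text.Xml.Parse this model accepts the qualifiers
"Text.Xml.Parse" and "Parse", and rejects the partial qualification
"Xml.Parse".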

Further down this document, I give more motivation and a rationale for
this proposal of nested namespaces.  But first, two other proposals
which rest on the first one.

Proposal 2
----------
Adopt a standardised namespace layout to help those looking for or
writing libraries, and a "Std" namespace prefix for genuinely
standard libraries.  (These are two different things.)

The hslibs collection of modules is a great starting place for
finding common libraries that could become standards.  I propose
that we adopt a "standardised" namespace hierarchy, based on the
current hslibs layout, into which Haskell programmers can plug their
own libraries relatively easily (whether they intend to release them or
not).  The aim is to make it clear where to place a new module, and
where to search for a possible existing module.

For instance, in ASCII art, here is a small part of a suggested tree.

    + Data + Structures + Trees + AVL
    |      |            |       + RedBlack
    |      |            |
    |      |            + Queue + Bankers
    |      |                    + FIFO
    |      + Encoding + Binary
    |                 + MD5
    |
    + Graphics + UI + Gtk + Widget
    |          |    |     + Pane
    |          |    |     + Text
    |          |    | 
    |          |    + FranTk
    |          |
    |          + Drawing + HOpenGL + ....
    |          |         + Vector
    |          |
    |          + Format + Jpeg
    |                   + PPM
    + Haskell + ....
    |

A fuller proposed layout appears on the web at
    http://www.cs.york.ac.uk/fp/libraries/layout.html

In addition to a standardised hierarchy layout, I propose a truly
Standard-with-a-capital-S namespace.  A separate discussion is needed
on what exactly would constitute "Standard" quality, but by analogy
with Java where everything beginning "java." is sanctioned by Sun,
I propose that every module name beginning "Std." is in some sense
sanctioned by the whole Haskell community.

So for instance, an experimental, or not-quite-complete, library
could be called

    Text.Xml

but only a guaranteed-to-be-stable, complete, library could be called

    Std.Text.Xml

The implication of the Std. namespace is that all such "standard"
libraries will be distributed with all Haskell systems.  In other
words, you can rely on a standard library always being there, and
always having the same interface on all systems.

Proposal 3
----------
Develop a process by which candidate libraries can be proposed to
enter the Std namespace.

Since Haskell'98 is fixed, and there is no longer a Haskell Committee,
there is no official body capable of deciding new standards for
libraries.  However, we do have a Haskell community which will use
or not use libraries, depending on their quality.  So libraries will
become standards by a de-facto process, rather than de-jure.

Apart from the Haskell compiler implementers, we wanted a means to
encourage the whole community to be involved in recognising de facto
"standard" libraries.  The mailing list 'libraries@haskell.org'
is one contribution.  We hope this will work on the same model as
the FFI mailing list, which has been pretty successful at allowing a
community of designers and implementers to explore their FFI needs and
solidify a design that is common across at least three Haskell systems.

On top of this discussion however, some final decisions will have to
be made on which libraries achieve entry to the "Std." namespace.  The
Haskell implementers have collectively proposed a ruling troika, one
representing each of the three main Haskell systems (Hugs, ghc, nhc98).
These are Simon Marlow, representing ghc, and current keeper of the
hslibs collection;  Malcolm Wallace, representing nhc98; and Andy Gill,
representing Hugs users.

Some obvious criteria for entry to the "Std." namespace would be:

   * The interface is stable and unlikely to change significantly;
   * The library is written in pure Haskell'98.  This criterion
     is likely to be the most contentious, so perhaps a better
     idea would be that ...
   * ... an implementation exists for at least the three Haskell
     systems Hugs, ghc, and nhc98;
   * The library is already in current use, so bugs in its coding and
     design have been ironed out;
   * The Haskell community recognises it as solving a common task,
     or encapsulating a common programming idiom.

These suggested criteria need some discussion and improvement.

After the initial period of deciding what belongs in the "Std."
namespace, I would expect any further candidate libraries that
are proposed for standardisation to spend some time in another
part of the namespace hierarchy whilst they gain stability and
common acceptance, before being moved to "Std.".

Rationale and Motivation for Proposal 1 (nested namespaces)
-----------------------------------------------------------

Scenario 1
----------
Imagine you have just written a new library of, say, Pretty-printing
combinators.  You want to release it to the Haskell public.  So what
do you call it?

    module Pretty	-- already taken (several times)
    module UU_Pretty	-- also taken
    module PrettyLib	-- already exists as well

Ok, so lacking any further inspiration, you end up deciding to call it

    module MyPretty	-- !

Surely there must be a better solution.  Of course there is - namespaces.
Let's classify libraries that do similar jobs together:

    module Text.PrettyPrinter.Hughes	-- the original Hughes design
    module Text.PrettyPrinter.HughesPJ	-- later modified by Simon PJ
    module Text.PrettyPrinter.UU	-- the Utrecht design
    module Text.PrettyPrinter.Chitil	-- Olaf's new design

These are exactly the same Pretty libs as before, but named more
sensibly.  It is still clear that each is a pretty-printing library,
but it is also clear that they are different.

Incidentally, have you ever tried to write your own module called
Pretty?  You may have discovered with GHC (which has a Pretty already
in the hslibs collection), that you get strange errors.  This is
because sometimes the compiler can be confused into reading one
Pretty.hi interface file (i.e. yours), yet linking the other Pretty.o
object file (i.e. from hslibs), ending in a core dump.  With proper
module namespaces, this confusion should never happen again.

Scenario 2
----------
You are writing a complex library that has a couple of layers
of abstraction.  For some users, you want to expose just a small
high-level set of types and functions.  Other users will need
more detailed access to lower-level stuff.

With namespaces, you can use the directory-like structure to make these
kinds of access explicit.  For instance, imagine a socket library:

    module Network.Socket

It exports an /abstract/ type Socket for ordinary users - they only
need to know its name.  More advanced hackers however can play with
the details of the type, because you also have:

    module Network.Socket.Types

which exports the Socket type non-abstractly i.e. Socket(..).  And of
course this abstraction is easy for the library-writer to manage,
because the implementation of the more abstract layer simply imports
and re-exports a careful selection of the more detailed layers.

Don't forget that, in terms of the actual filesystem layout, it is
perfectly OK to have e.g.

    file  Network/Socket.hs
    dir   Network/Socket
    file  Network/Socket/Types.hs

Scenario 3
----------
You are managing a software engineering project.  Several people
are working more-or-less independently on different sections of the
program.  To avoid mistakes with files, you give each one a separate
directory to place their code in.  But in Haskell'98 this is not
enough to ensure that they invent module names that do not clash with
other developers' modules.  So you insist that everyone also uses a
prefix-naming scheme for each appropriate sub-task.

For instance, here is a sketch of the layout of the Galois Connection
team's entry in the ICFP 2000 programming contest:

    dir  CSG			-- constructive solid geometry
    file CSG/CSG.hs
    file CSG/CSGConstruct.hs
    file CSG/CSGGeometry.hs
    file CSG/CSGInterval.hs
    dir  Fran			-- Fran-style animation
    file Fran/FranLite.hs
    file Fran/FranCSG.hs
    dir  GML			-- interpreter for little language
    file GML/GMLData.hs
    file GML/GMLParse.hs
    file GML/GMLPrimitives.hs

So now the problem is that to actually build the software, you need
to write a Makefile that descends into these directories.  Or maybe
you use 'hmake' like so:

    hmake examples/chess.hs -ICSG -IFran -IGML -IRayTrace -package text

Note how many sub-directories you must remember to add to the
command line (this applies equally for compiler options in Makefiles).
Note also the inconsistency between compiling and linking /my/ modules,
against using and linking a "standard" hslibs module from package text.

Isn't there a simpler way?  Yes.  Namespaces.  Prefix naming is no
longer needed inside directories, because the directory name is /part/
of the module name:

    file CSG.hs			-- re-exports everything from the CSG dir
    dir  CSG
    file CSG/Construct.hs
    file CSG/Geometry.hs
    file CSG/Interval.hs
    dir  Fran
    file Fran/Lite.hs
    file Fran/CSG.hs		-- does not conflict with top-level CSG.hs
    dir  GML
    file GML/Data.hs
    file GML/Parse.hs
    file GML/Primitives.hs

And now, the commandline to 'hmake' (or compiler options in a Makefile)
becomes simply:

    hmake examples/chess.hs -I.

You only need to specify the root of the module tree (-I.), and all
modules in all subdirectories can be found via their full namespace
path as used in the source files.  Note also that, whereas previously
we needed to specify a package for whatever hslibs modules were
used, now the compiler/hmake already knows the root of the installed
hslibs tree and can use the same mechanism to find and link "standard"
modules as for user modules.
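
To illustrate how the directory name becomes part of the module name,
here is a small sketch that recovers a module name from a file path
given relative to the root of the module tree.  It assumes "/" as the
separator and a ".hs" suffix, and the function name is hypothetical;
it is not how hmake or any compiler is known to do it:

```haskell
-- Hypothetical helper: derive the module name from a source file path
-- that is relative to the root of the module tree (the -I directory).
-- Assumes '/' as the directory separator and ".hs" as the suffix.
pathToModule :: FilePath -> String
pathToModule path = map slashToDot (dropSuffix path)
  where
    -- Each directory separator becomes a namespace separator.
    slashToDot c = if c == '/' then '.' else c
    -- Remove a trailing ".hs" if there is one; otherwise keep the path.
    dropSuffix p
      | hasHsSuffix p = take (length p - 3) p
      | otherwise     = p
    hasHsSuffix p = drop (length p - 3) p == ".hs"
```

With the layout above, "CSG/Construct.hs" yields CSG.Construct and
"Fran/CSG.hs" yields Fran.CSG, which is distinct from the top-level
module CSG in "CSG.hs".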

From this example it should be clear that the use of module namespaces
is of benefit to ordinary programs that may never become public,
quite aside from any benefits we expect to derive in managing
publicly-distributed library code.

What now?
---------
Ok, so that's my proposal.  The implementers of some of the main
Haskell systems have seen a presentation of these ideas, and seemed to
like them.  Namespaces are already implemented in nhc98 (v1.02) and
hmake (v2.02) if you want to play with them.  I expect some discussion
to refine this proposal on the 'libraries@haskell.org' list, to
which everyone interested is invited.

Once we have nailed down the precise design, we need to get matching
implementations in all systems.  I have rashly volunteered to implement
the lexical/parsing/module-search changes in any Haskell system that
no-one else volunteers for (probably ghc, Hugs, possibly hbc).

But after that we will still have many more decisions to take about
individual libraries, precise naming, build systems, and so on, not
to mention actually writing the libraries.  Get involved.  Contribute.

Regards,
    Malcolm