Packages

Simon Marlow simonmar@microsoft.com
Tue Sep 9 14:08:27 EDT 2003


Following discussion on and off this list, we've re-written the
proposal for changes to the packages/libraries story.

I've included the proposal as plain text below so that it can be quoted
easily, but if you would prefer to read it in HTML there's a version
here:

   http://www.haskell.org/~simonmar/packages.html

The main differences relative to the previous proposal are:

   - Packages, with unique package names, are given more emphasis.

   - Grafting, in particular grafting a library in multiple places, is
     given less emphasis in this new proposal.  Multiple grafting
     isn't essential: the two motivating examples we had previously
     (versioning and identifying APIs by GUID) are both served by
     having unique package names.

Motivation
----------

This proposal describes an implementation-independent mechanism,
called "packages", that allows a library or other group of Haskell
modules to be wrapped up as a single unit.  Here is why we need this
mechanism:

- We want to lower the barrier to shipping a new Haskell library.
  At present a library author must, for each module M in the library,
  find a place for M in the single global module hierarchy.  Either
  the author makes the library inconvenient to use (by using
  deeply-nested module names) or else risks clashing with "popular"
  sites in the tree.

- An author is likely to produce multiple versions of a library.
  If these live in different parts of the global module name space,
  one has to change every importing module to switch to the new version.
  If they re-use the same names as the previous version, it's hard to
  know which version is required, and impossible to build a program that
  simultaneously needs two different versions.  For example, perhaps
  your program uses version N of an API, but you import a library
  which depends on version N-1 of the same API.

- We want some support for abbreviating module names in source
  code (to avoid very long module names), and for moving
  a sub-hierarchy of source modules around in the global hierarchy
  without modifying the source code directly.

- We want to be able to uniquely identify a library API, for the
  purposes of expressing source code dependencies, and for the
  purposes of being able to automatically install dependencies.
  This would make it possible to automate the business of installing
  the necessary support packages for a given package.


Packages
--------
In this proposal, a "package" is the unit of distribution.  A package
defines a sub-tree of modules; eg.  GTK, GTK.Window, GTK.Button, ...
However, crucially, the package does not define absolute module names,
but instead can be grafted into the module hierarchy at different
sites, without recompilation (see "grafting" below).

Every package has a "package identifier".  A package identifier is a
string, eg. "gtkhs".  It is the intention that package identifiers are
globally unique, but we don't intend to enforce this in any rigorous
way.  There will probably be a web page which maintains a list of
package identifiers, and where one can register a new one.  Things
will go badly wrong if you try to use two packages with the same
identifier.

A "package name" is defined as a pair of a package identifier and a
version number.  For example, "gtkhs-0.4".  A package name uniquely
identifies an API: that is a set of modules, and the interfaces to
those modules.  The package web interface might well link to the
documentation for each package API, as well as the place where the
package can be obtained.  Note that because we have a way to uniquely
identify an API, GUIDs are not required.
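
As an illustration of the identifier/name distinction above, here is a
minimal sketch in Haskell.  The type and function names (PackageId,
Version, PackageName, renderName) are hypothetical and not part of the
proposal, and representing a version as a list of numbers is an
assumption.

```haskell
import Data.List (intercalate)

-- Hypothetical types for illustration; none of these names come from
-- the proposal itself.

-- A package identifier, e.g. "gtkhs".  Intended to be globally unique.
newtype PackageId = PackageId String
  deriving (Eq, Ord, Show)

-- A version as a list of numeric components, e.g. [0,4] for "0.4".
-- (Assumption: the proposal does not fix a representation.)
newtype Version = Version [Int]
  deriving (Eq, Ord, Show)

-- A package name pairs an identifier with a version, and so names
-- exactly one API.
data PackageName = PackageName PackageId Version
  deriving (Eq, Ord, Show)

-- Render a package name in the "identifier-version" form used in the
-- text, e.g. "gtkhs-0.4".
renderName :: PackageName -> String
renderName (PackageName (PackageId ident) (Version parts)) =
  ident ++ "-" ++ intercalate "." (map show parts)
```

Loaded into GHCi, renderName (PackageName (PackageId "gtkhs") (Version [0,4]))
gives "gtkhs-0.4".  The derived Ord instance compares versions
component by component, so Version [0,5] < Version [1,7].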

A package takes two forms.  A "source package" consists of

  Meta-data that describes the package
    - The package identifier
    - Package major and minor version
    - A default grafting location for this package
    - Dependencies, expressed as a set of triples
         (package identifier, version range, grafting location)
    - Etc (e.g. documentation, installation materials)
  Payload
    - Haskell modules (source, object, interface, analysis results)
    - Associated C header files or other support code

A "binary package" is the same, except that

  - The Haskell modules and other source materials are in compiled
    (object code) form.
  - It records which compiler was used, and which version
    of that compiler.
  - The dependencies are expressed as a set of package names only.
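
The two forms of meta-data could be modelled roughly as follows.  This
is only a sketch: every record and field name is hypothetical, the
inclusive lower/upper bound form of a version range is an assumption
(the proposal does not define one), and resolveDep, which picks the
latest installed version inside the range, is just one plausible way
that a source dependency might become a binary one.

```haskell
-- Hypothetical types; the proposal does not prescribe any of these.
type PackageId = String
type Version   = [Int]      -- e.g. [0,4] for "0.4"
type GraftSite = String     -- e.g. "Graphics.UI"

-- Assumption: a version range is an inclusive (lowest, highest) pair.
data VersionRange = VersionRange { lowest :: Version, highest :: Version }
  deriving (Eq, Show)

-- A source dependency: (package identifier, version range, grafting location).
data SourceDep = SourceDep
  { depId    :: PackageId
  , depRange :: VersionRange
  , depGraft :: GraftSite
  } deriving (Eq, Show)

data SourceMeta = SourceMeta
  { srcId           :: PackageId
  , srcVersion      :: Version
  , srcDefaultGraft :: GraftSite
  , srcDeps         :: [SourceDep]
  } deriving (Eq, Show)

-- A binary package also records the compiler, and depends on exact
-- package names (identifier plus version) only.
data BinaryMeta = BinaryMeta
  { binId           :: PackageId
  , binVersion      :: Version
  , binDefaultGraft :: GraftSite
  , binCompiler     :: (String, Version)
  , binDeps         :: [(PackageId, Version)]
  } deriving (Eq, Show)

inRange :: VersionRange -> Version -> Bool
inRange r v = lowest r <= v && v <= highest r

-- Assumed policy: pick the latest installed version inside the range.
resolveDep :: [(PackageId, Version)] -> SourceDep -> Maybe (PackageId, Version)
resolveDep installed dep =
  case [ v | (i, v) <- installed, i == depId dep, inRange (depRange dep) v ] of
    [] -> Nothing
    vs -> Just (depId dep, maximum vs)
```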

The existence of packages offers new opportunities for encapsulation.
For example, the meta-data for a package could expose some, but not
all, of the modules in the package, giving the package author the
chance to securely hide internal modules.

Grafting
--------
The modules in a package form a sub-hierarchy.  This sub-hierarchy can
be mapped into the global module hierarchy at any point when the
package is used; this operation is called "grafting".  For example, if
we have modules

     Gtk
     Gtk.Window
     Gtk.Button

in the package "gtkhs-0.4", and this package is grafted onto
"Graphics.UI", then these modules would be available to a user of the
"gtkhs-0.4" package as

    Graphics.UI.Gtk
    Graphics.UI.Gtk.Window
    Graphics.UI.Gtk.Button

Note that this provides a simple way to abbreviate module names in
source code, as well as providing a way to easily move an entire
sub-hierarchy of modules around in the global hierarchy without
changing every source file.
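
Grafting amounts to prefixing each relative module name with the
grafting location.  A minimal sketch, with hypothetical function names;
treating an empty location as "graft at the root" is an assumption, not
something the proposal specifies:

```haskell
-- Map one relative module name to its absolute name at a grafting site.
-- Assumption: an empty site means the package is grafted at the root.
graft :: String -> String -> String
graft ""   relative = relative
graft site relative = site ++ "." ++ relative

-- Graft every module of a package at the same site.
graftAll :: String -> [String] -> [String]
graftAll site = map (graft site)
```

For the example above, graftAll "Graphics.UI" ["Gtk", "Gtk.Window", "Gtk.Button"]
yields ["Graphics.UI.Gtk", "Graphics.UI.Gtk.Window", "Graphics.UI.Gtk.Button"].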


Installing a package
--------------------
Installing a package is the action a client takes to make a new
package known to a particular Haskell implementation.

A package comes with a default grafting location.  Installing
the package makes it available at that grafting location, without
the need for any command-line flags.  GHC calls such packages "auto
packages", and we will follow that terminology here.

At most one version of any given package can be an auto package, and
(by convention) it is always the latest installed version. That is,
when installing a package, that package only becomes an auto package
(available without flags) if its version is later than any other
installed version of that package.
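
The "latest installed version wins" convention can be sketched as
below.  The names are hypothetical, and the installed-package database
is modelled as a plain list of (identifier, version) pairs purely for
illustration.

```haskell
type PackageId = String
type Version   = [Int]   -- e.g. [1,7] for "1.7"; compared component-wise

-- The version of a package that is the auto package, if any version of
-- it is installed: by convention, the latest one.
autoVersion :: PackageId -> [(PackageId, Version)] -> Maybe Version
autoVersion ident installed =
  case [ v | (i, v) <- installed, i == ident ] of
    [] -> Nothing
    vs -> Just (maximum vs)

-- Is this particular installed package available without flags?
isAuto :: (PackageId, Version) -> [(PackageId, Version)] -> Bool
isAuto (ident, v) installed = autoVersion ident installed == Just v
```

With gtkhs-0.4 and gtkhs-1.7 both installed, only gtkhs-1.7 is an auto
package; installing gtkhs-0.5 afterwards would not change that.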

It should also be possible to install a package at a site different
from its default grafting location.  Existing package managers such as
RPM don't have a way to specify a grafting location, but the
Haskell library infrastructure (currently in development) would no
doubt have a way to change the grafting location if used directly.


Specifying Grafting Locations at compile time
---------------------------------------------
Each Haskell implementation should provide a means for specifying
packages and grafting locations when compiling Haskell source code.
One possibility for GHC is to extend the command-line syntax for
-package, eg.:

   ghc -package gtkhs-0.5:Graphics.UI

In that case, the command-line choice for a particular package
should override (replace) the install-time choice for that package.

For example, if gtkhs-1.7 is installed so that it is available by
default, then the command above would *remove* gtkhs-1.7 from the module
name space, and instead graft in gtkhs-0.5.  Why? Because both
have the same package identifier, "gtkhs".  In short, any one compilation
should see at most one version of each package.
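
To make the override rule concrete, here is a sketch that parses an
argument of the suggested form "gtkhs-0.5:Graphics.UI" and applies it
to the set of visible packages.  This is not GHC's implementation: the
syntax is only the possibility floated above, and Graft, parseFlag and
applyFlag are hypothetical names.

```haskell
import Data.Char (isDigit)

type PackageId = String
type Version   = [Int]

-- One visible package: identifier, version, and where it is grafted.
data Graft = Graft { gId :: PackageId, gVersion :: Version, gSite :: String }
  deriving (Eq, Show)

splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitOn c rest

-- Parse "0.5" into [0,5]; reject empty or non-numeric components.
parseVersion :: String -> Maybe Version
parseVersion str
  | all valid parts = Just (map read parts)
  | otherwise       = Nothing
  where
    parts   = splitOn '.' str
    valid p = not (null p) && all isDigit p

-- Parse "identifier-version:Site".  The version is whatever follows
-- the last '-' before the ':'.
parseFlag :: String -> Maybe Graft
parseFlag arg =
  case break (== ':') arg of
    (name, ':' : site) | not (null site) ->
      case break (== '-') (reverse name) of
        (revVer, '-' : revId) | not (null revId) ->
          Graft (reverse revId) <$> parseVersion (reverse revVer) <*> pure site
        _ -> Nothing
    _ -> Nothing

-- A command-line choice replaces any visible package that has the same
-- identifier, so at most one version of a package is ever visible.
applyFlag :: Graft -> [Graft] -> [Graft]
applyFlag g visible = g : filter ((/= gId g) . gId) visible
```

Applying the parsed flag for gtkhs-0.5 to a set containing gtkhs-1.7
drops gtkhs-1.7 and leaves packages with other identifiers untouched.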

Overlapping Packages
--------------------
If two packages are grafted in such a way that they both define
the same absolute module in the module hierarchy, then it is an error
to import that module.  (This is akin to the error that is reported
if two import statements in a Haskell program bind the same name.)

For example, if "gtkhs-0.4" defines a module "GTK.Misc",
and "graph-1.8" defines the module "Misc", and one says
    ghc -package gtkhs-0.4:Graphics.UI \
        -package graph-1.8:Graphics.UI.GTK
then the import declaration
     import Graphics.UI.GTK.Misc
would be an error, because it is defined by both packages.
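
The rule can be sketched as a lookup that fails only when an imported
module has more than one provider; overlap by itself is not reported.
All names below are hypothetical, and a grafted package is modelled as
a (package name, grafting location, relative modules) triple for
illustration only.

```haskell
-- (package name, grafting location, relative module names)
type Grafted = (String, String, [String])

-- Every absolute module a grafted package provides, tagged with the
-- package that provides it.
absoluteModules :: Grafted -> [(String, String)]
absoluteModules (pkg, site, mods) = [ (site ++ "." ++ m, pkg) | m <- mods ]

-- All packages that define the given absolute module.
providers :: [Grafted] -> String -> [String]
providers pkgs modName =
  [ pkg | (m, pkg) <- concatMap absoluteModules pkgs, m == modName ]

-- Resolve an import: an error if no package, or more than one package,
-- defines the module.
resolveImport :: [Grafted] -> String -> Either String String
resolveImport pkgs modName =
  case providers pkgs modName of
    []    -> Left ("module not found: " ++ modName)
    [pkg] -> Right pkg
    many  -> Left ("ambiguous module " ++ modName
                   ++ ", defined by: " ++ unwords many)
```

With gtkhs-0.4 grafted at Graphics.UI and graph-1.8 grafted at
Graphics.UI.GTK as above, resolving Graphics.UI.GTK.Misc gives a Left
(an error), while a module defined by only one of the two packages
still resolves normally.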


Shipping a new library
----------------------

Joe H. Programmer just wrote a small library and wants to share it
with the world.  What does he have to do?  Under our proposed scheme,
it would go something like this:

  - Make up a package identifier, and register it using the web
    interface at haskell.org, to avoid anyone else using the same
    identifier.

  - Decide what the default grafting location for the library should
    be.  There will be some hierarchy layout guidelines on haskell.org
    for library writers to follow - these won't be set in stone,
    though.  The worst that can happen is that your package will
    overlap with another common one, and will end up getting turned
    off by default when installed.

  - Package up your library using the Haskell library infrastructure,
    and share it.


Implementation
--------------

Here's what we have to do for GHC:

 1. An entity in a Haskell program was previously uniquely identified
    by its (module name, identifier) pair, where the module name is
    the module in which the entity is defined.  This now becomes a
    triple: (package name, relative module name, identifier).

 2. Extend the package spec syntax to include grafting locations, and
    lists of overlapping packages.

 3. Extend the -package flag syntax to allow specifying a new grafting
    location.

 4. Change the searching semantics to take into account grafting
    locations.

 5. Implement the "version overriding" semantics, and error checking
    to do with visibility of overlapping packages.

(1) is quite a fundamental change, but (2-5) are all quite
straightforward.
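
As a rough picture of change (1), an entity's identity might be
modelled as below.  This is a hypothetical data type for illustration,
not GHC's internal representation.

```haskell
-- An entity is identified by the package that defines it, the module
-- name relative to that package, and the identifier itself.
data Entity = Entity
  { entPackage :: String   -- package name, e.g. "gtkhs-0.4"
  , entModule  :: String   -- relative module name, e.g. "Gtk.Button"
  , entName    :: String   -- identifier within that module
  } deriving (Eq, Ord, Show)

-- Two entities are the same only if all three components agree, so the
-- same module and identifier from different packages stay distinct.
sameEntity :: Entity -> Entity -> Bool
sameEntity = (==)
```

Because the module name is relative to the package, the identity of an
entity does not depend on where the package happens to be grafted.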

I think a similar strategy would work for Hugs & NHC, although Hugs at
least will need to also acquire support for packages.


