[Haskell-cafe] What's in a name?

Fri Aug 15 19:24:39 EDT 2008

Sean Leather wrote:
> That doesn't work if you want to use two packages that have modules 
> sharing the same hierarchical name, and this is a definite possibility 
> given my statements above. Of course, having the ability to import 
> modules from specific packages [1] would fix this, but only as long as 
> the package names are also unique.
> 
> Personally, I like the Java package naming scheme recommendation. It 
> scales better, because each package name uses the organization or URI to 
> uniquely identify a subset.

Personally, I have major qualms with the Java package naming scheme. In 
particular, using domain names sets the barrier to entry much too high 
for casual developers (e.g. most of the Haskell user base). Yes, DNs are 
cheap and plentiful, but this basically requires a lifetime lease of the 
DN in question and the migration path is covered in brambles. The 
alternative is simply to lie and make up a DN, in which case this 
degenerates into the exact same resource quandary as doing nothing (but 
with high overhead in guilt or registration paperwork).

The way CPAN is set up is much more egalitarian, though mired in a bit 
much administrivia for casual developers.

The orthogonality of package names to module names is something I 
consider very much a feature, and not a bug. The only other packaging 
system I've seen to offer this is Monticello for Squeak/Smalltalk, and 
I've missed it ever since. Making packages orthogonal allows 
developers to create drop-in replacement packages that offer the same 
module services as another package, without needing to alter any code 
that uses the old package (save relinking/recompiling). This is the same 
advantage as allowing different modules to offer the same functions 
(e.g. having Data.ByteString as a drop-in for the [ ]-portions of the 
Prelude), but lifted up to the next tier.
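
To make the module-level half of that analogy concrete, here is a minimal 
sketch, assuming the bytestring package is installed. The function 
`shout` and the sample string are made up for illustration. Everything 
list-like goes through one qualified import, and the body uses only 
names that Data.ByteString.Char8 shares with the list functions:

```haskell
module Main where

import           Data.Char (toUpper)
-- Every list-like operation below goes through the qualifier S, so the
-- provider of those operations is chosen on this one line.
import qualified Data.ByteString.Char8 as S

-- Hypothetical helper: uppercase the text and normalise its spacing.
-- words, unwords and map have the same names and roles as the list
-- versions in the Prelude.
shout :: S.ByteString -> S.ByteString
shout = S.unwords . S.words . S.map toUpper

main :: IO ()
main = do
  let msg = S.pack "drop-in   replacement"
  S.putStrLn (shout msg)
  print (S.length (shout msg))
```

Only the boundary (`S.pack`, `S.putStrLn`) is specific to ByteString; 
the body of `shout` reads the same as it would over String. Orthogonal 
packages give the same freedom one level up, where the thing being 
swapped is the provider of a whole module.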

The question then is two-fold. First is the question of how to minimize 
the problems of ambiguity and how to resolve conflicts when they arise. 
Second is the question of whether this is really the job of Haskell, 
the language itself, or whether it is more appropriately dealt with by 
the build tools, e.g. Cabal. I'll deal a bit more with the latter question.

(( For readers who don't want to slog through the rest of this post, the 
conclusion is that I feel an agile packaging system is an imperative, as 
discussed above. The trick is finding a way to be agile without creating 
a maintenance and conflict nightmare. But given the imperative: baby, 
bathwater, etc. ))

I do like your (Sean Leather's) patch for being able to specify package 
names in source code, though I'd think something like Core's 
"package:module.module.module" syntax would be better if it gets adopted 
into Haskell'. I do however think that specifying the package should be 
optional, with conflicts to be resolved by commandline flags or via 
Cabal. Without this we lose the ability to have drop-in replacement 
packages, which in turn greatly complicates migration paths. The 
community is still young, but forks do happen and we would do best to 
allow for forwards compatibility whenever possible.

This approach also gives the same sort of split control as the various 
{-# FOO #-} pragmas give. As an ad-hoc GHC solution, adding a new PACKAGE 
pragma would be better than just using a string there. In theory we can 
already do this with OPTIONS_GHC, though that pragma seems not to 
respect the -package option. Of course, the new pragma should be 
position restricted to make it obvious which imports it applies to, 
rather than being assumed to apply to the whole file (i.e. by putting it 
where you put the strings).
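
For reference, a minimal sketch of the string form with the package 
left optional, assuming a GHC that accepts the PackageImports 
extension. Whether this matches the patch under discussion in every 
detail is an assumption, and the choice of modules is only an example:

```haskell
{-# LANGUAGE PackageImports #-}
module Main where

-- The string pins the package this module must come from.
import "base" Data.List (sort)
-- No string: the providing package is left to the build configuration
-- (command-line flags or Cabal), so a drop-in replacement can still be
-- swapped in without touching this file.
import Data.Char (toUpper)

main :: IO ()
main = putStrLn (map toUpper (sort "cabal"))
```

The second import is the case I care about keeping: source code that 
names only the module, with the package decided elsewhere.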

One issue with this and Java's scheme of just concatenating package 
names onto module names is that they offer no provisions for specifying 
version restrictions. For a PACKAGE pragma we could design it to deal with 
this too, since the modules themselves don't have versions. Of course 
this starts getting into hairy issues which Cabal was designed to 
resolve, so porting it back to the compiler seems misguided.

Perhaps a simpler option, for a Haskell' world, would be to give modules 
versions and give the import syntax some way of specifying the version 
to use. Sticking with something like the current packaging system, 
packages would just specify the module versions they provide, and those 
versions need not be related to the version of the package itself. This 
has the benefit of being able to release and maintain legacy packages, 
once the world has forked or moved on to a new major version.

As an addendum to this, it could be helpful if "package" names (i.e. 
alphanumeric sequences) were a part of the module version specification. 
This way a package hfoo-legacy could continue to provide the hfoo-1.24 
versions of modules, and it would be the package that forked off rather 
than forcing the new hfoo package to rename itself to break ties from 
the legacy code.
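
A rough model of that idea, as a sketch only: nothing below is an 
existing Cabal or GHC facility. The types `ModuleVersion` and `Package`, 
the function `providers`, the module name Data.Foo, and the 2.0 version 
of the current hfoo are all hypothetical. It assumes a base recent 
enough to export `makeVersion` from Data.Version:

```haskell
module Main where

import Data.Version (Version, makeVersion)

-- A module version as proposed: a lineage name plus a version number,
-- independent of the version of whichever package provides it.
data ModuleVersion = ModuleVersion
  { lineage :: String
  , number  :: Version
  } deriving (Eq, Show)

-- Hypothetical package description: a name and the module versions
-- the package provides.
data Package = Package
  { pkgName  :: String
  , provides :: [(String, ModuleVersion)]
  } deriving Show

-- Which packages can satisfy an import of modName from lineage lin
-- whose module version passes the predicate ok?
providers :: String -> String -> (Version -> Bool) -> [Package] -> [String]
providers modName lin ok pkgs =
  [ pkgName p
  | p <- pkgs
  , (m, mv) <- provides p
  , m == modName
  , lineage mv == lin
  , ok (number mv)
  ]

hfoo, hfooLegacy :: Package
hfoo =
  Package "hfoo" [("Data.Foo", ModuleVersion "hfoo" (makeVersion [2, 0]))]
hfooLegacy =
  Package "hfoo-legacy" [("Data.Foo", ModuleVersion "hfoo" (makeVersion [1, 24]))]

main :: IO ()
main = do
  let db = [hfoo, hfooLegacy]
  -- Old code asking for a pre-2 Data.Foo is served by the legacy package.
  print (providers "Data.Foo" "hfoo" (< makeVersion [2]) db)
  -- New code asking for 2 or later is served by hfoo itself.
  print (providers "Data.Foo" "hfoo" (>= makeVersion [2]) db)
```

The point of the sketch is that the import names a module lineage and a 
version range, while the package names (hfoo, hfoo-legacy) stay free to 
change on either side of a fork.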

Another ability that the package/module system lacks right now is a good 
way for annotating deprecations. Java has this, but again they do it 
wrong. Whenever something is specified as deprecated it needs to provide 
a migration path to non-deprecated code. Simply saying "you fail" is an 
insufficient error message.
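
As a small illustration of a deprecation that names its migration 
path, here is a sketch using GHC's DEPRECATED pragma, whose message is 
free-form text. The functions `parseConfig` and `parseConfigEither` are 
hypothetical. Note that this covers individual declarations only; it 
says nothing about deprecating at the package level:

```haskell
module Main where

-- The message tells callers where to go, not just that they are wrong.
{-# DEPRECATED parseConfig "Use parseConfigEither instead; it reports why parsing failed" #-}
parseConfig :: String -> Maybe Int
parseConfig = either (const Nothing) Just . parseConfigEither

-- The replacement the deprecation message points to.
parseConfigEither :: String -> Either String Int
parseConfigEither s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("not a number: " ++ show s)

main :: IO ()
main = print (parseConfigEither "42")
```

GHC shows the message wherever another module imports and uses 
`parseConfig`, so the migration path arrives together with the warning.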

This proposal doesn't solve the resource allocation issue. That issue 
will always be around so long as we assume nodes in the dependency graph 
have unique names. And that assumption is a very useful expedient so 
we're unlikely to abandon it any time soon (though maybe we should). But 
I think giving modules explicit name-version annotations is a better 
path forward than adding more bureaucracy to the module hierarchy. I 
think the suggested best practices for naming modules should be refined 
since they're starting to get out of date with all the code on Hackage. 
In particular there's a lot of conflict about (1) where to put new 
interesting Num data types (Data.Number.*, Data.*, Numeric.*, ...?); (2) 
where to put testing and diagnostic tools (Debug.*, Test.*); and (3) 
where to put modules for the core operation of application projects. But 
beyond providing better guidance, I don't think we should have a central 
body issuing leases for the module namespace. Especially because we 
already have a packaging system which is orthogonal to the module system.

One of the reasons I love Haskell so much is because it is so extremely 
agile. I've been an active open-source developer for many years, and of 
all the languages I've used Haskell has by far the easiest system for 
communal public release of code. Perl's community is also very nice 
though it's gotten to be large enough that they do really need the 
bureaucracy they have. All the same it means less of my Perl code has 
made it into the wild than I would have liked. As for C and Java, the 
only stuff of mine that's managed to eke out into the public are whole 
projects, never any of the many small building blocks it takes to make 
something run and to make people able to bang out a program in a few 
hours because all the dirty work is already done and available in a 
large public repository.

-- 
Live well,
~wren