[Haskell-cafe] Correspondence between libraries and modules
wren ng thornton
wren at freegeek.org
Wed Apr 25 05:44:28 CEST 2012
On 4/23/12 11:39 AM, Gregg Lebovitz wrote:
> On 04/23/2012 12:03 AM, wren ng thornton wrote:
>> However, until better technical support is implemented (not just for
>> GHC, but also jhc, UHC,...) it's best to follow social practice.
>
> Wren, I am new to Haskell and not aware of all of the conventions. Is
> there a place where I can find information on these social practices?
> Are they documented some place?
Not that I know of, though they're fairly standard for any open-source
programming community. E.g., when it comes to module names: familiarize
yourself with what's out there; try to fit in with the patterns you
see[1]; don't intentionally clash, steal namespaces[2], or squat on
valuable territory[3]; be reasonable and conscientious when interacting
with people.
[1] e.g., the use of Data.* for data structures which are
predominantly/universally treated as such, vs the use of Control.* for
things which are often thought of as control structures (monads, etc).
The use of Foo.Bar.Strict and Foo.Bar.Lazy when you provide both strict
and lazy versions of some whole API, usually with Foo.Bar re-exporting
whichever one seems the sensible default. The use of Foo.Bar.Class to
resolve circular import issues when defining a class and a bunch of
datatypes with instances. Etc.
[2] I mean things like if some package is providing a bunch of Foo.Bar.*
modules, and it's the only one doing so, then you should try to get in
touch with the maintainer before you start publishing your own Foo.Bar.*
modules--- in order to collaborate, to send patches up-stream, or just
to let them know what's going on.
[3] Witness an unintentional breach of this myself a while back. When I
was hacking up the exact-combinatorics package for my own use, I put
things in Math.Combinatorics.* since that's a reasonable place and
wasn't in use; but I didn't think of that fact when I decided to publish
the code. When pointed out, I promptly moved everything to
Math.Combinatorics.Exact.* since that project is only interested in
exact combinatorics and I have no intention of codifying all of
combinatoric theory; hence using Math.Combinatorics.* would be squatting
on very valuable names.
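As a minimal sketch of the Strict/Lazy convention described in [1], the containers package can serve as one existing instance (it is chosen here purely for illustration and is not named in the discussion above): it provides Data.Map.Strict and Data.Map.Lazy, and Data.Map re-exports the lazy API as the default. The binding `counts` is a hypothetical name for this example.

```haskell
module Main where

import qualified Data.Map        as M   -- default entry point (re-exports the lazy API)
import qualified Data.Map.Lazy   as ML  -- explicitly lazy variant
import qualified Data.Map.Strict as MS  -- explicitly strict variant

-- All three modules share a single Map type, so a value built through
-- one API can be handed to functions from another.
counts :: M.Map String Int
counts = MS.insertWith (+) "a" 1 (ML.fromList [("a", 1), ("b", 2)])

main :: IO ()
main = print (M.toList counts)
```

A client who does not care about strictness imports Foo.Bar (here Data.Map) and gets the sensible default; one who does care picks the variant explicitly.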
>> However, centralization is prone to bottlenecks and systemic failure.
>> As such, while it would be nice to ensure that a given module is
>> provided by only one package, there is no mechanism in place to
>> enforce this (except at compile time for the code that links the
>> conflicting modules together).
>
> From someone new to the community, it seems that yes centralization has
> its issues, but it also seems that practices could be put in place that
> minimize the bottlenecks and systemic failures.
>
> Unless I greatly misunderstand the challenges, there seem to be a lot of
> ways to approach this problem and none of them are new. We all use
> systems that are composed of many modules neatly combined into complete
> systems. Linux distributions do this well. So does Java. Maybe we should
> borrow from their experiences and think about how we put packages
> together and what mechanisms we need to resolve inter-package dependencies.
Java attempts to resolve the issue by imposing universal authority (use
reverse urls for the first part of your package name). Many Java
developers flagrantly ignore that claim to authority. Sun/Oracle has no
interest in actually policing these violations, and there's no central
repository for leveraging social pressure to do it. Moreover,
open-source developers who do not have a commercial/institutional
affiliation are specifically placed in a tough spot, and are elided from
public discourse because of that fact, which is extremely problematic on
too many levels to get into here. Furthermore, many developers
---especially among open-source and academic authors--- have an inherent
distrust for ambient authority like this.
To pick another similar namespacing issue, consider the problem of
Google Code. In Google Code there's a single namespace for projects, and
the Google team spends a lot of effort on maintaining that namespace and
resolving conflicts. (I know folks who've worked in the lab next door to
that team. So, yes, they do spend a lot of work on it.) Whereas if you
consider BitBucket or GitHub, each user is given a separate project
namespace, and therefore the only thing that has to be maintained is the
user namespace--- which has to be done anyways in order to deal with
logins. The models of Google Code, SourceForge, and Java all assume that
projects and repositories are scarce resources. Back in the day that may
have been true (or may not), but today it is clearly false. Repos are
cheap and everyone has a dozen side projects.
If you look at the case of Perl and CPAN, there's the same old story:
universal authority. Contrary to Java, CPAN does very much actively
police (or rather, vet) the namespace. However, this extreme level of
policing requires a great deal of work and serves to drive away a great
many developers from publishing their code on CPAN.
I'm not as familiar with the innards of how various Linux distros manage
things, but they're also tasked with the additional burden of needing to
pull in stuff from places like CPAN, Hackage, etc. Because of that,
their namespace situation seems quite different from that of Hackage or
CPAN on their own. I do know that Debian at least (and presumably the
others as well) devotes a great deal of manpower to all this.
So we have (1) the Java model where there are rules that no one follows;
(2) the Google Code, CPAN, and Linux distro model of devoting a great
deal of community resources to maintaining the rules; and (3) the
BitBucket, GitHub, Hackage model of having few institutionalized rules
and leaving it to social factors. The first option buys us nothing over
the last, excepting a false sense of security and the ability to
alienate private open-source developers.
The second option does arguably give us something, but it's extremely
expensive. I don't know if you've been involved in the administrative
side of that, but if not then it is far more expensive than you realize.
I've worked with CPAN, and many of the folks on this list do packaging
for Debian, Arch, and other Linux distros, so we're familiar with what
it means to ask for a universal authority. The Perl and Linux distro
communities are *huge* and so they can actually afford the cost of
setting up this authority, but even they run into limitations of scale.
Considering how much difficulty we've had getting someone to officially
take over Hackage so that we can finally get to using hackage2, it's
fair to say that Haskell has nowhere near a large enough community to
sustain the kind of work it would take to police the namespace.
There is no technical solution to this problem, at least not any used by
the communities you cite. The only solutions on offer require a great
deal of human effort, which is always a social/political/economic
matter. The only technical avenues I see are ways of making the problem
less problematic, such as GitHub and BitBucket distinguishing the user
namespace from each user's project namespace, such as the
-XPackageImports extension (which is essentially the same as
GitHub/BitBucket), or such as various ideas about using tree-grafting to
rearrange the module namespace on a per-project basis thereby allowing
clients to resolve the conflicts rather than requiring a global
solution. I'm quite interested in that last one, though I don't have any
time for it in the foreseeable future.
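For reference, a minimal sketch of what the PackageImports extension looks like in GHC. The package and module used here (containers, Data.Map) are only illustrative choices, and `pairs` is a hypothetical name; the extension matters in practice when two installed packages expose a module with the same name, which this small example does not reproduce.

```haskell
{-# LANGUAGE PackageImports #-}
module Main where

-- The string literal names the package that the module must come from,
-- so the import stays unambiguous even if another package exposed a
-- module with the same name.
import qualified "containers" Data.Map as Map

pairs :: [(Int, String)]
pairs = Map.toList (Map.fromList [(2, "two"), (1, "one")])

main :: IO ()
main = print pairs
```

The package qualifier plays the role of the user namespace on GitHub/BitBucket: the module name only needs to be unique within the package named in the import.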
--
Live well,
~wren