[Haskell-cafe] Correspondence between libraries and modules

Wed Apr 25 05:44:28 CEST 2012

On 4/23/12 11:39 AM, Gregg Lebovitz wrote:
> On 04/23/2012 12:03 AM, wren ng thornton wrote:
>> However, until better technical support is implemented (not just for
>> GHC, but also jhc, UHC,...) it's best to follow social practice.
>
> Wren, I am new to Haskell and not aware of all of the conventions. Is
> there a place where I can find information on these social practices?
> Are they documented some place?

Not that I know of, though they're fairly standard for any open-source 
programming community. E.g., when it comes to module names: familiarize 
yourself with what's out there; try to fit in with the patterns you 
see[1]; don't intentionally clash, steal namespaces[2], or squat on 
valuable territory[3]; be reasonable and conscientious when interacting 
with people.

[1] e.g., the use of Data.* for data structures which are 
predominantly/universally treated as such, vs the use of Control.* for 
things which are often thought of as control structures (monads, etc). 
The use of Foo.Bar.Strict and Foo.Bar.Lazy when you provide both strict 
and lazy versions of some whole API, usually with Foo.Bar re-exporting 
whichever one seems the sensible default. The use of Foo.Bar.Class to 
resolve circular import issues when defining a class and a bunch of 
datatypes with instances. Etc.
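
As one concrete instance of the Strict/Lazy convention, the containers
package ships Data.Map.Strict and Data.Map.Lazy, with Data.Map
re-exporting the lazy API. The following is a minimal sketch, assuming a
GHC installation where containers is visible; the key "k" and the
printed message are made up for illustration. It only shows what a
client sees when picking one variant or the other.

```haskell
module Main (main) where

import Control.Exception (SomeException, evaluate, try)
import qualified Data.Map.Lazy as Lazy
import qualified Data.Map.Strict as Strict

main :: IO ()
main = do
  -- Lazy API: the stored value is never forced, so 'undefined' is harmless.
  print (Lazy.size (Lazy.insert "k" (undefined :: Int) Lazy.empty))

  -- Strict API: insert evaluates the value before storing it, so forcing
  -- the resulting map raises the exception hidden in 'undefined'.
  r <- try (evaluate (Strict.size (Strict.insert "k" (undefined :: Int) Strict.empty)))
         :: IO (Either SomeException Int)
  putStrLn (either (const "strict insert forced the value") show r)
```

Both modules export the same function names over the same Map type,
which is what lets the unsuffixed module pick a default by re-export.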

[2] I mean things like this: if some package provides a bunch of 
Foo.Bar.* modules, and it's the only one doing so, then you should try 
to get in touch with the maintainer before you start publishing your own 
Foo.Bar.* modules--- in order to collaborate, to send patches up-stream, 
or just to let them know what's going on.

[3] I unintentionally breached this one myself a while back. When I 
was hacking up the exact-combinatorics package for my own use, I put 
things in Math.Combinatorics.* since that's a reasonable place and 
wasn't in use; but I didn't reconsider that choice when I decided to 
publish the code. When this was pointed out, I promptly moved 
everything to Math.Combinatorics.Exact.* since that project is only 
interested in exact combinatorics and I have no intention of codifying 
all of combinatoric theory; hence using Math.Combinatorics.* would be 
squatting on very valuable names.

>> However, centralization is prone to bottlenecks and systemic failure.
>> As such, while it would be nice to ensure that a given module is
>> provided by only one package, there is no mechanism in place to
>> enforce this (except at compile time for the code that links the
>> conflicting modules together).
>
>  From someone new to the community, it seems that yes centralization has
> its issues, but it also seems that practices could be put in place that
> minimize the bottlenecks and systemic failures.
>
> Unless I greatly misunderstand the challenges, there seem to be a lot of
> ways to approach this problem and none of them are new. We all use
> systems that are composed of many modules neatly combined into complete
> systems. Linux distributions do this well. So does Java. Maybe we should
> borrow from their experiences and think about how we put packages
> together and what mechanisms we need to resolve inter-package dependencies.

Java attempts to resolve the issue by imposing universal authority (use 
your reversed domain name as the first part of your package name). Many 
Java developers flagrantly ignore that claim to authority. Sun/Oracle has no 
interest in actually policing these violations, and there's no central 
repository for leveraging social pressure to do it. Moreover, 
open-source developers who do not have a commercial/institutional 
affiliation are specifically placed in a tough spot, and are elided from 
public discourse because of that fact, which is extremely problematic on 
too many levels to get into here. Furthermore, many developers 
---especially among open-source and academic authors--- have an inherent 
distrust for ambient authority like this.

To pick another similar namespacing issue, consider the problem of 
Google Code. In Google Code there's a single namespace for projects, and 
the Google team spends a lot of effort on maintaining that namespace and 
resolving conflicts. (I know folks who've worked in the lab next door to 
that team. So, yes, they do spend a lot of work on it.) Whereas if you 
consider BitBucket or GitHub, each user is given a separate project 
namespace, and therefore the only thing that has to be maintained is the 
user namespace--- which has to be done anyways in order to deal with 
logins. The models of Google Code, SourceForge, and Java all assume that 
projects and repositories are scarce resources. Back in the day that may 
have been true (or may not), but today it is clearly false. Repos are 
cheap and everyone has a dozen side projects.
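
The difference between the two namespace models can be sketched as a
choice of key. This is an illustration only: the type and function names
(User, Project, registerFlat, registerScoped) are hypothetical and do
not describe how any of these sites are implemented.

```haskell
module Main (main) where

import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical names, for illustration only.
type User = String
type Project = String

-- Flat model: one global project namespace. A second claim on the same
-- name is a conflict that somebody has to resolve.
registerFlat :: Project -> User -> Map.Map Project User
             -> Either String (Map.Map Project User)
registerFlat p u reg
  | p `Map.member` reg = Left ("name already taken: " ++ p)
  | otherwise          = Right (Map.insert p u reg)

-- Per-user model: the key is (user, project), so only user names need
-- to be globally unique.
registerScoped :: User -> Project -> Set.Set (User, Project)
               -> Set.Set (User, Project)
registerScoped u p = Set.insert (u, p)

main :: IO ()
main = do
  -- Two users claim "parser" in the flat namespace: the second is rejected.
  print (Map.toList <$> (registerFlat "parser" "alice" Map.empty
                           >>= registerFlat "parser" "bob"))
  -- The same two claims coexist once the key includes the user.
  print (Set.toList (registerScoped "bob" "parser"
                       (registerScoped "alice" "parser" Set.empty)))
```

In the flat model every collision needs an arbiter; in the scoped model
the collision never arises.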

If you look at the case of Perl and CPAN, there's the same old story: 
universal authority. Unlike Java, CPAN does very much actively 
police (or rather, vet) the namespace. However, this extreme level of 
policing requires a great deal of work and serves to drive away a great 
many developers from publishing their code on CPAN.

I'm not as familiar with the innards of how various Linux distros manage 
things, but they're also tasked with the additional burden of needing to 
pull in stuff from places like CPAN, Hackage, etc. Because of that, 
their namespace situation seems quite different from that of Hackage or 
CPAN on their own. I do know that Debian at least (and presumably the 
others as well) devotes a great deal of manpower to all this.

So we have (1) the Java model where there are rules that no one follows; 
(2) the Google Code, CPAN, and Linux distro model of devoting a great 
deal of community resources to maintaining the rules; and (3) the 
BitBucket, GitHub, Hackage model of having few institutionalized rules 
and leaving it to social factors. The first option buys us nothing over 
the last, excepting a false sense of security and the ability to 
alienate private open-source developers.

The second option does arguably give us something, but it's extremely 
expensive. I don't know if you've been involved in the administrative 
side of that, but if not then it is far more expensive than you realize. 
I've worked with CPAN, and many of the folks on this list do packaging 
for Debian, Arch, and other Linux distros, so we're familiar with what 
it means to ask for a universal authority. The Perl and Linux distro 
communities are *huge* and so they can actually afford the cost of 
setting up this authority, but even they run into limitations of scale. 
Considering how much difficulty we've had getting someone to officially 
take over Hackage so that we can finally get to using hackage2, it's 
fair to say that Haskell has nowhere near a large enough community to 
sustain the kind of work it would take to police the namespace.

There is no technical solution to this problem, at least not any used by 
the communities you cite. The only solutions on offer require a great 
deal of human effort, which is always a social/political/economic 
matter. The only technical avenues I see are ways of making the problem 
less problematic: GitHub and BitBucket distinguishing the user 
namespace from each user's project namespace; GHC's -XPackageImports 
extension, which lets an import name the package a module comes from 
(essentially the same idea as GitHub/BitBucket); or various ideas about 
using tree-grafting to rearrange the module namespace on a per-project 
basis, thereby allowing clients to resolve the conflicts rather than 
requiring a global solution. I'm quite interested in that last one, 
though I don't have any time for it in the foreseeable future.
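
For reference, here is a minimal sketch of what -XPackageImports looks
like in client code, assuming GHC with the containers package visible.
The table contents are made up. The string literal in each import names
the package that must supply the module, so two packages exporting a
module of the same name can be told apart at the import site.

```haskell
{-# LANGUAGE PackageImports #-}
module Main (main) where

-- The quoted name selects the providing package; GHC rejects the import
-- if that package does not expose the module.
import "containers" Data.Map (Map)
import qualified "containers" Data.Map as Map

table :: Map Int String
table = Map.fromList [(1, "one"), (2, "two")]

main :: IO ()
main = print (Map.toList table)
```

This resolves a clash only for the module doing the importing; it does
nothing to stop two packages from publishing the same module name.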

-- 
Live well,
~wren