[Haskell-cafe] Correspondence between libraries and modules
wren ng thornton
wren at freegeek.org
Wed Apr 25 06:57:45 CEST 2012
On 4/23/12 3:06 PM, Alvaro Gutierrez wrote:
> I see. The first thing that comes to mind is the notion of module
> granularity, which of course is subjective, so whether a single module or
> multiple ones should handle e.g. doubles and integrals is a good question;
> are there guidelines as to how those choices are made?
I'm not sure if there are any guidelines per se; that's more of a
general software engineering problem. If you browse around on Hackage
you'll get a fairly good idea what the norms are though. Everyone seems
to have settled on a common range of scope--- with notable exceptions
like the containers library with far too many functions per module, and
some of Ed Kmett's work on category theory which tends towards very few
declarations per module.
> At any rate, why do these modules, with sufficiently-different
> functionality, live in the same library -- is it that they share some
> common bits of implementation, or to ease the management of source code?
I contacted Don Stewart (the former maintainer) to see whether he
thought I should release the integral stuff on its own, or integrate it
into bytestring-lexing. We agreed that it made more sense to try to
build up a core library for lexing various common data types, rather
than having a bunch of little libraries. He'd just never had time to get
around to developing bytestring-lexing further; so I took over.
Eventually I plan to add rendering functions for floating point, and to
split up the parsers for different floating point formats[1], so that it
more closely resembles the integral stuff. But that won't be until this
fall or later, unless someone requests it sooner.
[1] Having an omni-parser can be helpful when you want to be liberal
about your input. But when you're writing parsers for a specified
format, usually they're not that liberal so we need to offer restricted
lexers in order to give code reuse.
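
To make the footnote concrete, here is a minimal sketch of restricted lexers alongside a liberal one built from them. It uses plain String and only base. All the names (lexDecimal, lexHexadecimal, lexAnyIntegral, lexBase) are made up for illustration and are not the bytestring-lexing API.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit)

-- Shared worker: consume a non-empty run of accepted digits and
-- return the value together with the unconsumed rest of the input.
lexBase :: Integer -> (Char -> Bool) -> String -> Maybe (Integer, String)
lexBase base ok input =
  case span ok input of
    ([], _)        -> Nothing
    (digits, rest) -> Just (foldl step 0 digits, rest)
  where
    step acc c = acc * base + toInteger (digitToInt c)

-- Restricted lexer: decimal digits only.
lexDecimal :: String -> Maybe (Integer, String)
lexDecimal = lexBase 10 isDigit

-- Restricted lexer: hexadecimal digits only, no "0x" prefix.
lexHexadecimal :: String -> Maybe (Integer, String)
lexHexadecimal = lexBase 16 isHexDigit

-- Liberal lexer: a "0x"/"0X" prefix selects hexadecimal,
-- anything else is treated as decimal.
lexAnyIntegral :: String -> Maybe (Integer, String)
lexAnyIntegral ('0' : x : rest)
  | x == 'x' || x == 'X' = lexHexadecimal rest
lexAnyIntegral input = lexDecimal input
```

A parser for a format that only permits decimal can call lexDecimal directly, while a forgiving front end can use lexAnyIntegral; both share the same worker.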
>> When dealing with FFI code, because of the impedance mismatch between
>> Haskell and imperative languages like C, it's clear that there's going to
>> be some massaging of the API beyond simply declaring FFI calls. As such,
>> clearly we'd like to have separate modules for doing the low-level binding
>> vs presenting a high-level API. Moreover, depending on what you're
>> interfacing with, you may be forced to have multiple low-level modules.
>
> Ah, that's a good use case. Is the lower-level module usually made "public"
> as well, or is it only an implementation detail?
Depends on the project. For ByteStrings, most of that is hidden away as
implementation details. For binding to C libraries, I think the current
advice is to offer the low-level interface so that if there's something
the high-level interface can't handle well, people have some easy recourse.
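
As a rough illustration of that split, the sketch below binds C's strlen and wraps it. In a real package the two halves would normally sit in separate modules, both of them exposed; they share one file here only to keep the example self-contained. The name byteLength is hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- Low-level layer: a direct binding that mirrors the C signature.
-- Callers must supply a valid NUL-terminated C string themselves.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- High-level layer: hides marshalling behind ordinary Haskell types.
-- The result counts bytes in the current locale's encoding, which
-- need not equal the number of characters in the String.
byteLength :: String -> IO Int
byteLength s = withCString s (fmap fromIntegral . c_strlen)
```

Someone who already holds a CString from another C call can skip the marshalling and use c_strlen directly, which is the "easy recourse" the low-level interface provides.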
>> On the other hand, the main purpose of packages or libraries is as unit of
>> distribution, code reuse, and separate compilation. Even with the Haskell
>> culture of making small libraries, most worthwhile units of
>> distribution/reuse/compilation tend to be larger than a single
>> namespace/concern. Thus, it makes sense to have more than one module per
>> package, because otherwise we'd need some higher level mechanism in order
>> to manage the collections of package-modules which should be considered a
>> single unit (i.e., clients will almost always want the whole bunch of them).
>
> This is the part that I'm trying to get a better sense of. I can see how in
> some cases, it makes sense for more than one module to form a unit, because
> they are tightly coupled semantically or implementation-wise -- so clients
> will indeed want the whole bunch. On the other hand, several libraries
> provide modules that are all over the place, in a way that doesn't form a
> "unit" of any kind (e.g. MissingH), and it's not clear that you would want
> any Network stuff when all you need is String utilities.
Yeah, MissingH and similar libraries are just grab-bags full of stuff.
Usually grab-bag libraries think of themselves as place-holders, with
the intention of breaking things out once there's something of a large
enough size to warrant being its own package. (Whether the breaking out
actually happens is another matter.) But to get the general sense of
things, you should ignore them.
Instead, consider one of the parsing libraries like uu-parsinglib,
attoparsec, parsec, frisby. There are lots of pieces to a parsing
framework, but it makes sense to distribute them together.
Or, consider one of the base libraries for iteratees, enumerators,
pipes, conduits, etc. Like parsing, these offer a whole framework. You
won't usually need 100% of it, but everyone needs a different 80%.
Or to mention some more of my own packages, consider stm-chans,
unification-fd, or unix-bytestring. In unification-fd, the stuff
outside of Control.Unification.* could be moved elsewhere, but the stuff
within there makes sense to be split up yet distributed together. For
stm-chans, because of the similarity in interfaces, use cases, etc., it'd
be peculiar to want to separate them into different packages. In
unix-bytestring I separated off the Iovec stuff (FFI implementation
details) from the main API, but clearly they must go together.
> But the way you describe it, it seems that despite centralization having
> those disadvantages, it is more or less the way the system works, socially
> (egos, bad form, etc.) and technically (because of the lack of compiler
> support)
There's a difference between centralization and communalization.
With centralization there's a central authority who makes all the rules
and (usually) enforces them. This is the benevolent dictator model
common in open-source. The problem is: what do you do if the dictator
goes missing (gets hit by a bus, is too busy this semester, etc)?
With communalization, there's no central authority that writes/enforces
the laws; instead, the community as a whole will come to agree on the
norms. This is the way societies often operate (i.e., societies as
cultures, rather than as governments). In virtue of the social
interaction, things come to be a particular way, but there isn't
necessarily any person or committee that decided it should be that way.
Moreover, in order to disrupt the norms it's not enough to dispose of a
dictator; you need some wide-scale way of disrupting the network of
social interaction. The problem here is that it can be very hard to
steer a community. If you've identified a problem, it's not clear how to
get it fixed (whereas a dictator could just issue a fiat).
In practice, every organization has a bit of both models; it's just a
question of how much of each, and in what contexts. The Haskell
community is more centralized when it comes to things like the Haskell
Report and the Haskell Platform, because you really need it there.
Whereas Hackage and the Cafe are more of your standard social community.
> except that it is ad-hoc instead of mechanically enforced. In
> other words, I don't see what the advantages of allowing ambiguity
> currently are.
If you mechanically enforce things then you will find clashes. That's
not the problem: clashes exist, you find them, whatever. The problem is:
now that you've found it, how are you going to resolve it?
You can't just make Hackage refuse packages which would cause a module
name conflict. If you try then you'll get angry developers who just
leave or who badmouth Haskell (or both), which does no good for anyone.
You have to have an escape hatch, some way for people to raise
legitimate issues such as "the conflictor hasn't been maintained in five
years and has no users", or "I wrote the old package and this new
package is meant to supersede it", etc. But now you need to have a group
of people who work on resolving those issues and making those
case-by-case decisions about how conflicts should be resolved.
Allowing clashes saves you from needing that group of people. If you
allow clashes, there are no developer complaints to be resolved. A lot
of resources are tied up in making those central authority groups, and
by not having such a central authority we free up those resources to be
used elsewhere.
In cases like Perl's CPAN and Linux distros, they have enough resources
that they can afford the overhead cost to create and maintain such
groups. In addition, they're large enough that the resources for that
group don't necessarily diminish the resources for other things. E.g.,
some members of the Linux developer community are no good at
programming, but they're great at social organization. If you have a
central authority group, they can contribute to that and thereby provide
resources; vs, if there's no such group, they're unlikely to offer
programming time or other resources instead.
Whereas for small communities: overhead costs are higher proportionally,
and small communities aren't able to gather as many resources to cover
them. In addition, the person who could offer social organization is
probably already offering other resources which she wouldn't be able to
offer if she moved over to helping the central authority; so you're
closer to a zero-sum game of needing to decide how to allocate your
scarce resources.
> Ah, interesting. So, perhaps I misunderstand, but this seems like an
> argument in favor of having uniquely-named modules (e.g. Foo.FD and
> Foo.TF) instead of overlapping ones, right?
Yeah, probably.
I mean, ideally I'd like to see GHC retooled so that both fundeps and
type families actually compile down to the same code, and one is just
sugar for the other (or both are sugar for some third thing). Then we'd
get rid of the real problem of there being multiple incompatible ways of
doing the same thing. Until then, it's probably better to just pick one
approach for each project, rather than trying to maintain parallel forks
for each approach. But if you're going to maintain parallel forks, then
it's probably best to not do the module punning thing.
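
For anyone who hasn't seen the two styles side by side, here is a small sketch of the same interface written both ways, with distinct names so that neither shadows the other. The class names are invented for illustration; in the Foo.FD / Foo.TF arrangement each half would live in its own uniquely named module.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}

-- What a hypothetical Foo.FD might export: the element type is a
-- second class parameter, determined by the container via a fundep.
class CollectionFD c e | c -> e where
  insertFD :: e -> c -> c
  toListFD :: c -> [e]

instance CollectionFD [a] a where
  insertFD = (:)
  toListFD = id

-- What a hypothetical Foo.TF might export: the element type is an
-- associated type family computed from the container type.
class CollectionTF c where
  type Elem c
  insertTF :: Elem c -> c -> c
  toListTF :: c -> [Elem c]

instance CollectionTF [a] where
  type Elem [a] = a
  insertTF = (:)
  toListTF = id
```

The two classes express the same idea, but an instance of one is not an instance of the other, so client code has to commit to one of them. That's the incompatibility that makes giving both versions the same module name confusing.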
--
Live well,
~wren