[Haskell-cafe] Correspondence between libraries and modules

Mon Apr 23 21:06:24 CEST 2012

Thanks for the write-up -- it's been very helpful!

On Mon, Apr 23, 2012 at 12:03 AM, wren ng thornton <wren at freegeek.org> wrote:

> Consider one of my own libraries (chosen randomly via Safari's url
> autocompletion):
>

>    http://hackage.haskell.org/package/bytestring-lexing
>
> When I inherited this package there were the Data.ByteString.Lex.Double
> and Data.ByteString.Lex.Lazy.Double modules, which were separated
> because they provide the same API but for strict vs lazy ByteStrings. Both
> of those modules are concerned with lexing floating point numbers. I
> inherited the package because I wanted to publicize some code I had for
> lexing integers in various formats. Since that's quite a different task
> than lexing floating point numbers, I put it in its own module:
> Data.ByteString.Lex.Integral.
>
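To make the strict/lazy split concrete, here is a minimal sketch (not bytestring-lexing's actual code) of the same integer-lexing function written once per ByteString flavour, using the readInt that bytestring's Char8 modules export. The names lexIntsStrict and lexIntsLazy are hypothetical. The two bodies are identical apart from the module qualifier, which is presumably why a package like this ends up with parallel modules exposing the same API.

```haskell
module Main (main) where

import Data.Char (isSpace)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical strict version: repeatedly skip whitespace and read a
-- decimal Int, stopping at the first thing that isn't a number.
lexIntsStrict :: S.ByteString -> [Int]
lexIntsStrict bs = case S.readInt (S.dropWhile isSpace bs) of
  Just (n, rest) -> n : lexIntsStrict rest
  Nothing        -> []

-- Hypothetical lazy version: same code, different ByteString type.
-- In a real package this would live in a separate ...Lazy... module.
lexIntsLazy :: L.ByteString -> [Int]
lexIntsLazy bs = case L.readInt (L.dropWhile isSpace bs) of
  Just (n, rest) -> n : lexIntsLazy rest
  Nothing        -> []

main :: IO ()
main = do
  print (lexIntsStrict (S.pack "1 22 -3 x"))
  print (lexIntsLazy (L.pack "1 22 -3 x"))
```

Running it prints [1,22,-3] twice.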

I see. The first thing that comes to mind is the notion of module
granularity, which of course is subjective, so whether a single module or
multiple ones should handle e.g. doubles and integrals is a good question;
are there guidelines as to how those choices are made?

At any rate, why do these modules, with sufficiently different
functionality, live in the same library? Is it because they share some
common bits of implementation, or to ease the management of source code?

> When dealing with FFI code, because of the impedance mismatch between
> Haskell and imperative languages like C, it's clear that there's going to
> be some massaging of the API beyond simply declaring FFI calls. As such,
> clearly we'd like to have separate modules for doing the low-level binding
> vs presenting a high-level API. Moreover, depending on what you're
> interfacing with, you may be forced to have multiple low-level modules.
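A minimal sketch of that two-layer split, assuming a hypothetical binding to C's abs(3): c_abs is the raw FFI import a low-level module would expose, and absChecked is the kind of Haskell-facing wrapper a high-level module would export instead. Both sit in one file here only so the example is self-contained; the range check stands in for the "massaging" described above.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CInt (..))

-- Low-level layer (hypothetically its own module): a direct binding
-- to C's abs(3) that exposes the C types unchanged.
foreign import ccall unsafe "stdlib.h abs"
  c_abs :: CInt -> CInt

-- High-level layer: takes a Haskell Int and rejects values that don't
-- fit in a C int. abs(INT_MIN) is undefined behaviour in C, so
-- minBound :: CInt is rejected as well.
absChecked :: Int -> Maybe Int
absChecked n
  | n <= fromIntegral (minBound :: CInt) = Nothing
  | n >  fromIntegral (maxBound :: CInt) = Nothing
  | otherwise = Just (fromIntegral (c_abs (fromIntegral n)))

main :: IO ()
main = do
  print (absChecked (-5))
  print (absChecked (fromIntegral (minBound :: CInt)))
```

Running it prints Just 5 and then Nothing.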

Ah, that's a good use case. Is the lower-level module usually made "public"
as well, or is it only an implementation detail?

> On the other hand, the main purpose of packages or libraries is as a unit of
> distribution, code reuse, and separate compilation. Even with the Haskell
> culture of making small libraries, most worthwhile units of
> distribution/reuse/compilation tend to be larger than a single
> namespace/concern. Thus, it makes sense to have more than one module per
> package, because otherwise we'd need some higher level mechanism in order
> to manage the collections of package-modules which should be considered a
> single unit (i.e., clients will almost always want the whole bunch of them).
>

This is the part that I'm trying to get a better sense of. I can see how in
some cases, it makes sense for more than one module to form a unit, because
they are tightly coupled semantically or implementation-wise -- so clients
will indeed want the whole bunch. On the other hand, several libraries
provide modules that are all over the place, in a way that doesn't form a
"unit" of any kind (e.g. MissingH), and it's not clear that you would want
any Network stuff when all you need is String utilities.

> However, centralization is prone to bottlenecks and systemic failure. As
> such, while it would be nice to ensure that a given module is provided by
> only one package, there is no mechanism in place to enforce this (except at
> compile time for the code that links the conflicting modules together).
> With few exceptions, it's considered bad form to knowingly use the same
> module name as is being used by another package. In part, it's bad form
> because egos are involved; but it's also bad form because there's poor
> technical support for resolving namespace collisions for module names. In
> GHC you can use -XPackageImports, which is workable but conflates issues of
> code with issues of provenance, which the Haskell Report intentionally
> keeps separate. However, until better technical support is implemented (not
> just for GHC, but also jhc, UHC,...) it's best to follow social practice.
>

But the way you describe it, it seems that despite centralization having
those disadvantages, it is more or less the way the system works, socially
(egos, bad form, etc.) and technically (because of the lack of compiler
support) -- except that it is ad-hoc instead of mechanically enforced. In
other words, I don't see what the advantages of allowing ambiguity
currently are.
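For reference, this is roughly what the -XPackageImports workaround looks like. It is a minimal sketch assuming GHC with the containers package available (it ships with GHC); the inventory map is made up for illustration. Note that the package name now appears in the source itself, which is the mixing of code and provenance the quote objects to.

```haskell
{-# LANGUAGE PackageImports #-}
module Main (main) where

-- The string literal names the package that must supply the module,
-- so this import stays unambiguous even if another installed package
-- also exposes a module called Data.Map.
import qualified "containers" Data.Map as Map

-- Illustrative data only.
inventory :: Map.Map String Int
inventory = Map.fromList [("apples", 3), ("pears", 5)]

main :: IO ()
main = print (Map.toList inventory)
```

Running it prints [("apples",3),("pears",5)].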

> Some people figured to solve the new issue by implementing it both ways in
> separate packages, but reusing the same module names. (Witness for example
> mtl-2 aka monads-fd, vs monads-tf.) In practice, that didn't work out so
> well. Part of the reason for failure is that although fundeps and TF/ATs
> are formally equivalent in theory, in practice the implementation of TF/ATs
> has (had?) been missing some necessary machinery, and consequently the
> TF/AT versions were not as powerful as the original fundep versions. Though
> the butterfly dependency issues certainly didn't help.

Ah, interesting. So, perhaps I misunderstand, but this seems like an
argument in favor of having uniquely-named modules (e.g. Foo.FD and
Foo.TF) instead of overlapping ones, right?
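To illustrate, here is a sketch in which the fundep and type-family formulations of a MonadState-like class coexist simply because they use distinct names (hypothetical MonadStateFD and MonadStateTF, in the spirit of Foo.FD and Foo.TF). It is not the mtl or monads-tf code; it defines its own tiny State monad so that it needs only base.

```haskell
{-# LANGUAGE FunctionalDependencies, FlexibleInstances, FlexibleContexts #-}
{-# LANGUAGE TypeFamilies, TypeOperators #-}
module Main (main) where

-- A tiny state monad so the example depends only on base.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s; (a, s2) = g s1 in (h a, s2)

instance Monad (State s) where
  State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

-- Fundep style (the monads-fd / mtl-2 approach): m determines s.
class Monad m => MonadStateFD s m | m -> s where
  getFD :: m s
  putFD :: s -> m ()

instance MonadStateFD s (State s) where
  getFD = State $ \s -> (s, s)
  putFD s = State $ const ((), s)

-- Type-family style (the monads-tf approach): s is an associated type.
class Monad m => MonadStateTF m where
  type StateOf m
  getTF :: m (StateOf m)
  putTF :: StateOf m -> m ()

instance MonadStateTF (State s) where
  type StateOf (State s) = s
  getTF = State $ \s -> (s, s)
  putTF s = State $ const ((), s)

-- The same operation written against each class.
incrFD :: MonadStateFD Int m => m ()
incrFD = getFD >>= putFD . (+ 1)

incrTF :: (MonadStateTF m, StateOf m ~ Int) => m ()
incrTF = getTF >>= putTF . (+ 1)

main :: IO ()
main = print (snd (runState (incrFD >> incrTF) 41))
```

Running it prints 43: one increment through each class. Because the names differ, client code can import both without any collision, which is the practical upside of distinct module names.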

Alvaro