[Haskell-cafe] Correspondence between libraries and modules

Mon Apr 23 06:03:51 CEST 2012

On 4/22/12 6:30 PM, Alvaro Gutierrez wrote:
> On Sun, Apr 22, 2012 at 4:45 PM, Brandon Allbery <allbery.b at gmail.com> wrote:
>> One reason:  modules serve multiple purposes; one of these is namespacing,
>> and in the case of interfaces to foreign libraries that may force a
>> division that would otherwise not exist.
>
> Interesting. Could you elaborate on what the other purposes are, and
> perhaps point to an instance of the foreign library case?

The main purpose of namespacing (IMO) is to separate concerns and make 
it easier to figure out how a project fits together. The primary goal of 
modules is to resolve namespacing issues.

Consider one of my own libraries (chosen randomly via Safari's URL 
autocompletion):

     http://hackage.haskell.org/package/bytestring-lexing

When I inherited this package there were the Data.ByteString.Lex.Double 
and Data.ByteString.Lex.Lazy.Double modules, which were separated 
because they provide the same API but for strict vs lazy ByteStrings. 
Both of those modules are concerned with lexing floating point numbers. 
I inherited the package because I wanted to publicize some code I had 
for lexing integers in various formats. Since that's quite a different 
task than lexing floating point numbers, I put it in its own module: 
Data.ByteString.Lex.Integral.
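
To illustrate the "same API, different module" split, here is a minimal
sketch. It deliberately uses the bytestring package's own Char8 modules
as a stand-in rather than bytestring-lexing itself, and the names
leadingIntStrict and leadingIntLazy are made up for the example.

```haskell
-- Sketch only: bytestring's Char8 modules stand in for the
-- strict/lazy module pair described above.
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Parse a leading decimal integer from a strict ByteString.
leadingIntStrict :: S.ByteString -> Maybe Int
leadingIntStrict = fmap fst . S.readInt

-- The same operation for lazy ByteStrings. Both modules export a
-- function called readInt; the module qualifier keeps them apart.
leadingIntLazy :: L.ByteString -> Maybe Int
leadingIntLazy = fmap fst . L.readInt
```

The two definitions differ only in the qualifier, which is the
namespacing job the module system is doing here.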

When dealing with FFI code, because of the impedance mismatch between 
Haskell and imperative languages like C, it's clear that there's going 
to be some massaging of the API beyond simply declaring FFI calls. As 
such, clearly we'd like to have separate modules for doing the low-level 
binding vs presenting a high-level API. Moreover, depending on what 
you're interfacing with, you may be forced to have multiple low-level 
modules. For example, if you use Google protocol buffers via the hprotoc 
package, then it will generate a separate module for each buffer type. 
That's fine, but usually it's not something you want to foist on your users.
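
As a rough sketch of the low-level vs high-level split, assuming a
binding to C's strlen: in a real package the two halves would likely
live in separate modules, with only the second exposed to users. The
names c_strlen and cStringLength are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- Low-level layer: a direct declaration of the C function, with C types
-- and raw pointers showing through.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- High-level layer: hides marshalling and C types behind a Haskell API.
-- withCString encodes using the current locale, so the result is a byte
-- count, not necessarily a character count.
cStringLength :: String -> IO Int
cStringLength s = withCString s $ \p -> fromIntegral <$> c_strlen p
```

Clients would only ever import the module holding cStringLength; the
raw binding stays an internal detail.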

On the other hand, the main purpose of packages or libraries is as a unit 
of distribution, code reuse, and separate compilation. Even with the 
Haskell culture of making small libraries, most worthwhile units of 
distribution/reuse/compilation tend to be larger than a single 
namespace/concern. Thus, it makes sense to have more than one module per 
package, because otherwise we'd need some higher level mechanism in 
order to manage the collections of package-modules which should be 
considered a single unit (i.e., clients will almost always want the 
whole bunch of them).

However, centralization is prone to bottlenecks and systemic failure. As 
such, while it would be nice to ensure that a given module is provided 
by only one package, there is no mechanism in place to enforce this 
(except at compile time for the code that links the conflicting modules 
together). With few exceptions, it's considered bad form to knowingly 
use the same module name as is being used by another package. In part, 
it's bad form because egos are involved; but it's also bad form because 
there's poor technical support for resolving namespace collisions for 
module names. In GHC you can use -XPackageImports, which is workable but 
conflates issues of code with issues of provenance, which the Haskell 
Report intentionally keeps separate. However, until better technical 
support is implemented (not just for GHC, but also jhc, UHC, ...) it's 
best to follow social practice.
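
For reference, a package-qualified import looks roughly like this. The
sketch assumes the containers package is installed (it ships with GHC);
wordCounts is just an illustrative function.

```haskell
{-# LANGUAGE PackageImports #-}
-- The string literal names the package that must supply Data.Map.
-- This is where provenance leaks into the source code.
import qualified "containers" Data.Map as Map

-- Count how often each word occurs.
wordCounts :: [String] -> Map.Map String Int
wordCounts ws = Map.fromListWith (+) [(w, 1) | w <- ws]
```

Note that the package name is now baked into the source file, so
switching providers means editing code rather than build configuration.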

> I'm confused as to how type families vs. fundeps play a role here -- as far
> as I can tell both are compiler extensions that do not provide modules.

Both TFs (or rather associated types) and fundeps aim to solve the same 
problem. Namely: when using multi-parameter type classes, it is often 
desirable to declare that one parameter is wholly defined by other 
parameters, either for semantic reasons or (more often) to help type 
inference. Since they both aim to solve the same problem, this raises a 
new problem: for some given type class, do I implement it with TF/ATs or 
with fundeps?
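
A small sketch of the same class written both ways may help. The
classes ContainerFD and ContainerTF and their methods are invented for
the example and do not come from any of the packages discussed here.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances, TypeFamilies #-}

-- Fundep style: the element type e is a second class parameter, and
-- "c -> e" declares that the container type determines it.
class ContainerFD c e | c -> e where
  emptyFD  :: c
  insertFD :: e -> c -> c

instance ContainerFD [a] a where
  emptyFD  = []
  insertFD = (:)

-- Associated-type style: the element type is a type-level function of
-- the container type, so the class has a single parameter.
class ContainerTF c where
  type Elem c
  emptyTF  :: c
  insertTF :: Elem c -> c -> c

instance ContainerTF [a] where
  type Elem [a] = a
  emptyTF  = []
  insertTF = (:)

-- In both versions the element type is recovered from the container
-- type, which is what lets these definitions type-check.
singletonFD :: ContainerFD c e => e -> c
singletonFD x = insertFD x emptyFD

singletonTF :: ContainerTF c => Elem c -> c
singletonTF x = insertTF x emptyTF
```

In GHCi, something like `singletonFD 'x' :: String` and
`singletonTF 'x' :: String` should both resolve to the list instance.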

Some people figured to solve the new issue by implementing it both ways 
in separate packages, but reusing the same module names. (Witness for 
example mtl-2 aka monads-fd, vs monads-tf.) In practice, that didn't 
work out so well. Part of the reason for failure is that although 
fundeps and TF/ATs are formally equivalent in theory, in practice the 
implementation of TF/ATs has (had?) been missing some necessary 
machinery, and consequently the TF/AT versions were not as powerful 
as the original fundep versions. Though the butterfly dependency issues 
certainly didn't help.

> I'm interested to see examples where two or more well-known yet unrelated
> modules clash under the same name; I can't imagine them coexisting in
> public very long -- wouldn't the confusion among users (e.g. when looking
> for documentation) be enough to either reconcile the modules or change one
> of the names?

That's not much of a problem in practice. There are lots of different 
books with a Chapter 1, but rarely is there any confusion about which 
one is meant. The same is true of module names in packages.

-- 
Live well,
~wren