[Haskell-cafe] Re: Can we do better than duplicate APIs? [was: Data.CompactString 0.3]

Wed Mar 28 15:31:22 EDT 2007

On Mar 28, 2007, at 2:44 PM, Benjamin Franksen wrote:

> Robert Dockins wrote:
>>> After taking a look at the Haddock docs, I was impressed by the amount
>>> of repetition in the APIs. Not only does Data.CompactString duplicate
>>> the whole Data.ByteString interface (~100 functions, adding some more
>>> for encoding and decoding), the whole interface is again repeated
>>> another four times, once for each supported encoding.
>>
>> I'd like to mention that as maintainer of Edison, I face similar
>> difficulties. The data structure interfaces have scores of functions and
>> there are about 20 different concrete implementations of various sorts.
>> Even minor interface changes require a lot of tedious editing to make
>> sure that everything stays in sync.
>
> But... you have the type of all functions nailed down in classes. Thus,
> even if a change in the API means a lot of tedious work adapting the
> concrete implementations, at least the compiler helps you to check that
> the implementations will conform to the interface (class);

This is true.

> and users have to
> consult only the API docs, and not every single function in all 20
> implementations. With ByteString and friends there is (yet) no common
> interface laid down anywhere. All the commonality is based on custom and
> good sense and the willingness and ability of the developers to make
> their interfaces compatible with those of others.
>
>>> One could use code
>>> generation or macro expansion to alleviate this, but IMO the necessity
>>> to use extra-language pre-processors points to a weakness in the
>>> language; it would be much less complicated and more satisfying to use
>>> a language feature that avoids the repetition instead of generating
>>> code to facilitate it.
>>
>> I've considered something like this for Edison.  Actually, I've
>> considered going even further and building the Edison concrete
>> implementations in a theorem prover to prove correctness and then
>> extracting the Haskell source.  Some sort of in-language or
>> extra-language support for mechanically producing the source files for
>> the full API from the optimized "core" API would be quite welcome.
>> Handling export lists,
>
> How so? I thought in Edison the API is a set of type classes. Doesn't
> that mean export lists can be empty (since instances are exported
> automatically)?

No.  Edison allows you to directly import the module and bypass the  
typeclass APIs if you wish.  Also, some implementations have special  
functions that are not part of the general API, and are only  
available via the module exports.
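
To make the two access paths concrete, here is a minimal sketch of the arrangement described above. The class `Stack`, the type `ListStack`, and all function names are hypothetical and much simpler than Edison's real interfaces; a single file stands in for what would normally be a class module plus one module per implementation.

```haskell
-- Hypothetical, simplified interface; NOT Edison's actual class.
class Stack s where
  emptyS :: s a
  push   :: a -> s a -> s a
  pop    :: s a -> Maybe (a, s a)

-- One concrete implementation. In a real library these monomorphic
-- functions would appear in the implementation module's export list,
-- so users can import them directly and bypass the class.
newtype ListStack a = ListStack [a] deriving (Eq, Show)

lsEmpty :: ListStack a
lsEmpty = ListStack []

lsPush :: a -> ListStack a -> ListStack a
lsPush x (ListStack xs) = ListStack (x : xs)

lsPop :: ListStack a -> Maybe (a, ListStack a)
lsPop (ListStack [])       = Nothing
lsPop (ListStack (x : xs)) = Just (x, ListStack xs)

-- A "special" operation that is not part of the general class API and
-- is therefore reachable only through the module's exports.
lsSize :: ListStack a -> Int
lsSize (ListStack xs) = length xs

-- The class instance simply reuses the monomorphic functions.
instance Stack ListStack where
  emptyS = lsEmpty
  push   = lsPush
  pop    = lsPop
```

Every function then has to be kept in sync in three places: the class, the instance, and the export list, which is the maintenance burden under discussion.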

One could make typeclasses the only way to access the main API, but I  
rather suspect there would be performance implications.  I get the  
impression that typeclass specialization is less advanced than  
intermodule inlining (could be wrong though).
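
For readers unfamiliar with the distinction, the sketch below shows what asking GHC for typeclass specialization looks like. The class `Collection` and the function `fromListC` are hypothetical names chosen for illustration; only the `SPECIALIZE` pragma itself is a real GHC feature, and whether it closes the performance gap in a given case is an assumption that would need measuring.

```haskell
-- Hypothetical class used only for illustration.
class Collection f where
  emptyC  :: f a
  insertC :: a -> f a -> f a
  toListC :: f a -> [a]

instance Collection [] where
  emptyC  = []
  insertC = (:)
  toListC = id

-- An overloaded function: without optimisation it receives a
-- Collection dictionary at run time and calls through it.
fromListC :: Collection f => [a] -> f a
fromListC = foldr insertC emptyC

-- Ask GHC to also compile a copy fixed to the list instance, so calls
-- at that type need no dictionary passing.
{-# SPECIALIZE fromListC :: [a] -> [a] #-}
```

A user who imports the concrete module's functions directly gets the monomorphic code without relying on the optimiser at all, which is the trade-off being weighed here.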

>> haddock comments,
>
> I thought all the documentation would be in the API classes, not in the
> concrete implementations.

It is now, but I've gotten complaints about that (which are at least  
semi-justified, I feel).  Also, the various implementations have  
different time bounds, which must be documented in the individual  
modules.  Ideally, I'd like to have the function documentation string  
and the time bounds on each function in each concrete  
implementation.  I've not done this because it's just too painful to  
maintain manually.
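
As a rough illustration of the ideal described above, the following sketch repeats the same documentation string on two implementations of one operation, each with its own time bound. The names `rconsList`, `Queue`, and `rconsQueue` are hypothetical and are not taken from Edison.

```haskell
-- | Add an element at the rear of the sequence.
--
--   Time: /O(n)/, where /n/ is the length of the list.
rconsList :: a -> [a] -> [a]
rconsList x xs = xs ++ [x]

-- A simple two-list queue: a front list plus a reversed rear list.
data Queue a = Queue [a] [a]

emptyQueue :: Queue a
emptyQueue = Queue [] []

-- | Add an element at the rear of the sequence.
--
--   Time: /O(1)/.
rconsQueue :: a -> Queue a -> Queue a
rconsQueue x (Queue front rear) = Queue front (x : rear)

-- | Convert to a list, front element first.
--
--   Time: /O(n)/.
toListQueue :: Queue a -> [a]
toListQueue (Queue front rear) = front ++ reverse rear
```

Multiplied by scores of functions and about 20 implementations, each wording change to a shared description has to be repeated by hand, which is why doing this manually is so painful.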

>> typeclass instances,
>> etc, are quite tedious.
>>
>> I have to admit, I'm not sure what an in-language mechanism for doing
>> something like this would look like.  Template Haskell is an option, I
>> suppose, but it's pretty hard to work with and highly non-portable.  It
>> also wouldn't produce Haddock-consumable source files.  ML-style
>> first-class modules might fit the bill, but I'm not sure anyone is
>> seriously interested in bolting that onto Haskell.
>
> As I explained to SPJ, I am less concerned with duplicated work when
> implementing concrete data structures than with the fact that there is
> still no (compiler-checkable) common interface for e.g. string-like
> thingies, apart from the convention to use similar names for similar
> features.

Fair enough.  I guess my point is that typeclasses (as per Edison)  
are only a partial solution to this problem, even if you can stretch  
them sufficiently (with, e.g., MPTC+fundeps+whatever other extension) to  
make them cover all your concrete implementations.
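
For concreteness, here is a minimal sketch of what "stretching" typeclasses with MPTC and fundeps might look like for string-like types. The class `StringLike` and its method names are hypothetical; no such shared interface exists in Data.ByteString. Only the `Data.ByteString` functions used in the instance are real.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical common interface: the container type s determines its
-- element type e through the functional dependency.
class StringLike s e | s -> e where
  emptyS  :: s
  consS   :: e -> s -> s
  unconsS :: s -> Maybe (e, s)
  lengthS :: s -> Int

instance StringLike [a] a where
  emptyS           = []
  consS            = (:)
  unconsS []       = Nothing
  unconsS (x : xs) = Just (x, xs)
  lengthS          = length

instance StringLike B.ByteString Word8 where
  emptyS  = B.empty
  consS   = B.cons
  unconsS = B.uncons
  lengthS = B.length

-- Code written once against the class works for every instance, and
-- the compiler checks that each instance provides the whole interface.
toElems :: StringLike s e => s -> [e]
toElems s = case unconsS s of
  Nothing        -> []
  Just (x, rest) -> x : toElems rest
```

This gives the compiler-checked interface, but it does nothing about export lists, per-implementation documentation, or implementation-specific functions, which is the sense in which the solution is only partial.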

> Cheers
> Ben

Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG