[Hs-Generics] A Lightweight Implementation of Generics and Dynamics (LIGD)

Fri Oct 20 04:40:31 EDT 2006

Hi there.

A few remarks.

> >Usage:
> >   - Library Writer: Responsible for the sums-of-products machinery,
> >     the representation(s) data type(s) to be defined in the  
> >library and
> >     defining the class and standard instances for "Representable".
> >     The library writer needs to be have a deep knowledge about  
> >generic
> >     programming.
> >   - Power User: Defines the boilerplate for new data types. Needs  
> >to be fairly
> >     familiar with the implementation.
> >   - User: defines generic functions by pattern matching on the  
> >structure of "Rep".
> >     This requires that he is comfortable with sums of products and  
> >isomorphisms.
> >     For convenience, the user should also provide a definition  
> >that uses the
> >     type class "Representable", which avoids the representations  
> >to be passed
> >     explicitly.
> >   - End User: If "Representable" is used, then the end can just  
> >use generic functions
> >     in the same way as any other ad-hoc (that is, type class  
> >overloaded) functions.
> >
> 
> I'm trying to get these roles straight in my head. I think we're  
> going to have a similar division for many of the library proposals.
> 
> A Generic library has several important components: the "framework"  
> or way of representing a particular type, instances of this framework  
> for the primitive types, helper functions for creating instances for  
> new types, and a base set of generic operations/support code to get  
> things started. The "Library writer" is responsible for all of these.
> 
> A user of the generic library may want to do two things:
>   (a) define new type-indexed functions
>   (b) ensure that their datatypes can be used with type-indexed  
> functions
> In Bruno's terminology,
>   - Power User: does both of these
>   - User: does (b) but not (a)
>   - End User: does neither, just calls existing type-indexed  
> functions on existing datatypes.

I think that this is a good summary.

> Note that we could imagine eliminating/automating (b) for some  
> platforms with Template Haskell.

Is this a statement specific to LIGD? Using Template Haskell would in
my opinion be a good argument for Template Haskell itself, but not
very practical, because I have the feeling that it is a relatively
unstable (in the sense of interface changes) extension of the Haskell
language. Also, it is definitely not going to be in Haskell', and it's
use wouldn't really be optional, because TH-generated code can (as far
as I know) so far not be exported and saved as a normal Haskell
module.

Using a tool like DrIFT would have the advantage that people not having
that tool available could still make full use of generic programs.
They'd just have more work.

> >Extensibility:
> >   - Data types are non-extensible, however we can try to combine the
> >     approach with type classes to get some form of extensibility  
> >(RepLib
> >     proposes one solution like this).
> >
> 
> I think I prefer the term "Specializability" (even though it is more  
> of a mouthful.) Generic
> functions can be specialized at particular datatypes to specific  
> operations.
> (LIGD does not support this behavior, but could be extended to do so  
> by pushing RepLib through the GADT translation.)

The phrase "specialization" makes sense if you consider generic functions
that in principle work on all data types, and now are trying to assign a
new, specific behaviour to the function for one data type. But what if we
have defined a function in the style of a generic programmign approach
such as LIGD that does not have a catch-all case and does only work for
a limited number of data types. That'd correspond to an ad-hoc overloaded
functions. You might still want to extend such functions to new data types.
And in this case, I think it's better to call the function "extensible" or
"open".

> >Performance considerations:
> >   - Sums of products based and therefore involving considerable
> >     performance penalties.
> >
> 
> Besides the runtime types and sums-of-products explosion, there is  
> also the issue of applying the identity function type coercions (if  
> they are not optimized away). Those at least would be eliminated by  
> using a GADT.

Isn't "explosion" a bit exaggerated, or am I missing something?
I think the main overhead of the sums-of-products representation lies
in the fact that it is different from the representation Haskell usually
has for a value. So, at least in principle, every value has to actually
be converted twice in order to apply a generic function, and optimizing
this is tricky, especially if we want to do it in Haskell, without applying
special knowledge about the application domain.

> Aside: should we be developing a benchmark suite if we are actually  
> going to be making decisions about performance? I don't want to guess  
> here. Given that we will be collecting versions of several functions  
> for each library, we should be able to run sort of "Library shootout".

Yes, I think that would be a very good idea.

> I also would like to know if there are some specific Haskell  
> extensions that could dramatically improve performance. These  
> extensions would not be portable, of course, but perhaps for specific  
> platforms we could support optimized execution.

Maybe RULES pragmas could help, but it would almost certainly also
imply writing the conversion functions in a specific style.

> >Expressibility:
> >   - Can do both producer and consumer functions.
> >   - Multiple representations needed for generic functions of  
> >different
> >     arities;
> >   - No local redefinitions.
> 
> I think we should separate expressibility of the generic functions  
> and expressibility wrt to what type information can index such  
> functions.
> 
> For the former we talk about producer/consumer functions (read/show),  
> functions of different arities (map,zip), functions defined by higher- 
> order types (map, count, reduce), functions requiring type-indexed  
> types (the zipper, bit-packed arrays, generalized tries).
> 
> For the latter, we talk about the ability to use the library with  
> types that contain
>   records
>   nested datatypes
>   higher-kinded types (what kinds?)
>   datatypes with existential components (both of kind type and  
> higher kinds)
>   datatypes with first-class polymorphic components (both of kind  
> type and higher kinds)
>   above with class constraints
>   GADTs (simple, and those that subsume existentials...)
>   datatypes with higher-kinded type arguments

Yes, that is a good idea as well. I will have another look at LIGD
to find out how that fits in these categories.

> What do you mean by "local redefinitions"?

Stefan has already given an answer. I think Bruno mentions this here
because "local redefinitions" are a bit like having functions of different
arities.

> 
> >
> >Boilerplate:
> >   - Isomorphisms between sums of products and data types
> >   - Instances of Representable
> >   - Smart constructors for representation of data types
> >
> >Helpful extra features:
> >   - GADTs (already implemented in GHC) allow more direct definitions
> >     of generic functions, since there is no need to apply the type
> >     coercions explicitly;
> 
> >   - A boilerplate generation mechanism. This would effectively mean
> >     that power users would not be necessary.
> 
> >   - Open data types and functions in order for extensibility.
> 
> 
> >   - Kind polymorphism, for eliminating the need for multiple
> >     representations;
> >
> 
> I don't think that kind polymorphism would solve this problem---the  
> multiple representations are needed for functions (like map) that  
> have different arities of kind-indexed types.
> 
> Kind polymorphism would help a little, but it wouldn't be completely  
> useful without some form of kind-indexed types. (The representations  
> of types with higher kinds is not parametric in those kinds, nor is  
> the behavior of type-indexed operations on those representations.)

I think that's true. Nevertheless, kind polymorphism would be a
necessary first step to do these things.

> >Discussion:
> >    An interesting approach which is quite lightweight and that is  
> >fairly easy to
> >use (specially if we would not need to  handle any of the  
> >boilerplate). It is a bit
> >outdated because with GADTs, the use can be slightly simplified.  
> >However, this also
> >takes us further away from Haskell 98 or even Haskell' (since GADTs  
> >will not be there).
> 
> I think the case for GADTs in Haskell' would be strengthened if there  
> were other Haskell compilers than GHC that implement GADTs. Does  
> anyone know of any?

I haven't heard of any plans. The next HCAR is going to happen soon,
so maybe I'll get a report from one of the implementation teams that
tells me they're working on it.

Cheers,
  Andres