[Haskell] class assosiated types, via GADTs or FDs

Thu Feb 17 01:03:43 EST 2005

Hi,

you mention three drawbacks FDs have against CATs
I believe that all three drawbacks can be addressed.

*** Short summary ***
(I'm using Manuel's numbering scheme, see his original email below)

Drawback #2: "FDs introduce clutter"

True, I think CATs are a great idea but it's also important
to note that FDs are strictly more powerful. I further claim that
CATs can be viewed as syntactic sugar for a special class of FDs.

Drawback #1: "CAT to FD encoding introduces (potentially)
            non-terminating instances"

I claim that there exists a translation from CAT to FD
such that FD instances generated are terminating.

Drawback #3: "Lack of abstraction"

There's well-typed translation scheme which retains abstraction.

Hence, I conclude that CATs can be translated to FDs
(retaining termination and abstraction) such that the
resulting FD program is accepted by GHC.

Here are more details.

********************************************
*** Drawback #2: "FDs introduce clutter" ***
********************************************

FDs are strictly more powerful.
We can express more complicated type relations such as

class Foo a b c | a c -> b, b c -> a

Anyway, I agree there are styles of programming where we'd like to
hide all "functionally defined" parameters. I claim that rather than 
developing CATs as a new type class extension we should view them
as syntactic sugar for (a special class of) FDs. 
So, let's see how to address drawbacks 1 and 3.

*****************************************************************
*** Drawback #1: "CAT to FD encoding introduces (potentially) ***
***            non-terminating instances"                     ***
*****************************************************************

Short summary: There exists a translation from CAT to FD
               such that FD instances generated are terminating.

In [2] you show how to encode the Array example.

Array CAT example:

class ArrayElem e where
  data Array e
  index :: Array e -> Int -> e

instance (ArrayElem a, ArrayElem b) =>
         ArrayElem (a,b) where
  data Array (a,b) = PairArray (Array a) (Array b)

I repeat your FD encoding:

class ArrayElem a b | a->b where
    index :: b -> Int -> a

instance (ArrayElem a a2, ArrayElem b b2) => ArrayElem (a,b) (a2,b2)

In [4] it is shown that such instances may lead to non-termination
(of type inference). Assume we encounter the constraint
ArrayElem (a,b) a, then the process of type improvement and context
reduction will not terminate.
E.g., 

    ArrayElem (a,b) a 
--> ArrayElem ((c,d),b) (c,d), a=(c,d)
--> ArrayElem (c,d) c, ArrayElem b d, a=(c,d)
--> ...

I claim that this is not the "proper" way to encode CATs via FDs.

Proper FD encoding:

class Array a b | a->b where
   index :: b->Int->a

-- Note that in the internal translation of CATs to System F,
-- a new data type is introduced for each instance.
-- Let's simply do that for the FD encoding as well.

data ArrayPair a b ar br = PairArray ar br
              ^^^^^

instance (Array a ar, Array b br) =>         (*)
         Array (a,b) (ArrayPair a b ar br)
                                ^^^^

The underlined (phantom) parameters seem somewhat redundant
but are the main trick to ensure termination.

MP Jones states in [3] the following "termination" condition:
(For simplicity, I only show the case for two-parameter
type classes)

Assume class F a b | a->b, then for each 
instance ... => F t1 t2 we have that fv(t2) subsetof fv(t1).

In [4] it is shown that this condition guarantees termination.
Note that instance (*) doesn't satisfy this condition.

For the proper CAT via FD encoding we need a new termination
condition.

Variation of termination condition:

Assume class F a b | a->b, then for each 
instance ... => F t1 t2 we have that fv(t1) subsetof fv(t2).
                                     ^^^^^^^^^^^^^^^^^^^^^^

Note that t1 and t2 have switched their roles.
Instances (*) satisfies this new condition.

The short story is that this is another condition which
guarantees termination. Let's reconsider the "critical"
example from above. We find that

    ArrayElem (a,b) a 
--> ArrayElem ((c,d),b) (c,d), a=ArrayPair a b (c,d)
--> fail!                      ^^^^^^^^^^^^^^^^^^^^^
                               occurs check fails

Obviously, I've only shown you one example.
My experience with CATs and FDs tells me this is
the way to go. It's fairly straightforward to
generalize the above encoding principle.
If somebody is interesting in fully formalizing
the translation from CAT to a decidable
fragment of FDs pl feel free to contact me.

****************************************
*** Drawback #3: Lack of abstraction ***
****************************************                         

I repeat the MapKey example.

 > Now suppose a user wants to define a function
 > 
 >   lookupInt :: MapKey Int fm => k -> fm v -> Maybe v
 >   lookupInt = lookup
 > 
 > then they will find that the compiler rejects this definition.  (The
 > reason lies in when the type checker uses Mark Jones' improvement rule
 > [3].)

Another interesting issue.
I suppose by compiler you refer to HUGS and GHC!? Indeed, the 
dictionary translation scheme employed in these systems fails
(I leave out the formal details, ask me if you're interested).
Actually, the above program is perfectly fine in terms of
the dictionary translation scheme defined in:

[6] Peter J. Stuckey and Martin Sulzmann. A Theory of Overloading.
    To appear in ACM Transactions on Programming Languages and Systems
    (TOPLAS).
    Preprint: http://www.comp.nus.edu.sg/~sulzmann/chr/download/theory-journal.ps.gz

Note that the scheme in [6] is based on an untyped target language.
I guess you imply that no typed translation of the above FD program
exists? 
Note that in [2] you also need to rule out certain class
declarations (see comment on pages10-11 in [2]) otherwise
your translation is not well-typed either.
Anyway, I claim there's a well-typed translation but
I will omit the details which I'll report in more details later.
The main point here is anyway that the "lack of abstraction" issue
will not occur if you use FDs as internal syntax for CATs.
This is sufficient
to achieve a well-typed translation of CATs to the FD fragment
available in GHC.

- Martin

Manuel M T Chakravarty writes:
 > On Tue, 2005-02-15 at 10:16 +0000, Keean Schupke wrote:
 > > Perhaps i'm being dumb, but I dont see the need for either GADTs or 
 > > class-associated-types to do this... I am pretty sure it can be done 
 > > using fundeps,
 > 
 > Yes, you can encode this example with functional dependencies (see
 > Section 3.4 of [1]), but functional dependencies have three drawbacks,
 > which are discussed in the associated types paper [2].  In case you
 > don't want to read the paper, the problems are the following.  Your
 > class declaration with functional dependencies becomes
 > 
 >   class MapKey k fm | k -> fm where
 >     lookup :: k -> fm v -> Maybe v
 >     empty :: fm v
 >       -- NB: The signature of empty doesn't mention k, which
 >       --     is already problematic, but you can fix this by
 >       --     adding it as a dummy argument to fm.
 > 
 > Drawback #1:
 > ------------
 > If you want to add an instance for keys of pair type, you need an
 > instance of the form
 > 
 >   data MapKeyPair fm1 fm2 v = ...
 >   class (MapKey k1 fm1, MapKey k2 fm2) => 
 >     MapKey (k1, k2) (MapKeyPair fm1 fm2) where
 >     ...
 > 
 > If you look into Section 6.1 of the functional dependencies paper [3],
 > you'll see that this definition is actually not admitted.  GHC allows it
 > nonetheless.  As a result, the type checker will on some programs, that
 > it ought to reject, simply not terminate - that's been pointed out by
 > [4].
 > 
 > Drawback #2:
 > ------------
 > There's been an interesting study about the support for a certain form
 > of generic programming in six different programming languages (four OO
 > languages and two FP languages) [5].  In that study the not quite
 > satisfactory support of associated types in Haskell via functional
 > dependencies is cited as the *only* shortcoming of Haskell in the
 > summary of Table 1 (Haskell gets the best results of all six languages
 > in that table, btw).  The main complaint cited by the authors is that
 > the representation type (fm in the example above) needs to be included
 > in the argument list of the class.  For larger examples, with more
 > associated types, such as the graph library studied in [5], this is
 > unwieldy.
 > 
 > Drawback #3:
 > ------------
 > The functional dependency encoding prevents you from ever making the
 > representation of your tries/maps (in the MapKey class example)
 > abstract.  It does not only prevent you from declaring it as an abstract
 > data type, but it actually forces users of a library built in this way
 > to understand the representation types and manually encode them in their
 > application code.  For example, assume we have an instance as follows
 > 
 >   instance MapKey Int (Data.IntMap v) where
 >     lookup k m = Data.IntMap.lookup k m 
 >     empty = Data.IntMap.empty
 > 
 > The full type of the class method lookup is
 > 
 >     lookup :: MapKey k fm => k -> fm v -> Maybe v
 > 
 > Now suppose a user wants to define a function
 > 
 >   lookupInt :: MapKey Int fm => k -> fm v -> Maybe v
 >   lookupInt = lookup
 > 
 > then they will find that the compiler rejects this definition.  (The
 > reason lies in when the type checker uses Mark Jones' improvement rule
 > [3].)
 > 
 > In fact, the only valid definition is
 > 
 >   lookupInt :: k -> Data.IntMap v -> Maybe v
 >   lookupInt = lookup
 > 
 > That's quite unsatisfactory, as a user of a such a tries library will
 > have to know how maps of Int keys are represented.  Moreover, if the
 > tries library ever changes that representation, all user code will have
 > to be changed.  This problem is aggravated by that representation types
 > for tries get quite complicated once you use more involved types as
 > keys.
 > 
 > This third drawback makes functional dependencies IMHO rather unsuitable
 > for encoding associated types in anything, but toy code.
 > 
 > Manuel
 > 
 > [1] Ralf Hinze, Johan Jeuring, and Andres Löh. Type-indexed data types. 
 >     Science of Computer Programming, MPC Special Issue, 51:117-151, 
 >     2004.
 >     http://www.informatik.uni-bonn.de/~ralf/publications/SCP2004.pdf
 > 
 > [2] Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and 
 >     Simon Marlow. Associated Types with Class.  POPL'05.
 >     http://www.cse.unsw.edu.au/~chak/papers/papers.html#assoc
 > 
 > [3] Mark P. Jones. Type Classes with Functional Dependencies.
 >     Proceedings of the 9th European Symposium on Programming Languages
 >     and Systems, LNCS 1782, 2000.
 >     http://www.cse.ogi.edu/~mpj/pubs/fundeps-esop2000.pdf
 > 
 > [4] Gregory J. Duck, Simon Peyton Jones, Peter J. Stuckey, and Martin 
 >     Sulzmann.  Sound and Decidable Type Inference for Functional 
 >     Dependencies.  ESOP'04.
 >     http://research.microsoft.com/Users/simonpj/Papers/fd-chr/
 > 
 > [5] Ronald Garcia, Jaakko Järvi, Andrew Lumsdaine, Jeremy G. Siek, and 
 >     Jeremiah Willcock.  A Comparative Study of Language Support for 
 >     Generic Programming. In Proceedings of the 2003 ACM SIGPLAN 
 >     conference on Object-oriented programming, systems, languages, and 
 >     applications (OOPSLA'03), 2003.
 >     http://www.osl.iu.edu/publications/pubs/2003/comparing_generic_programming03.pdf
 > 
 > 
 > _______________________________________________
 > Haskell mailing list
 > Haskell at haskell.org
 > http://www.haskell.org/mailman/listinfo/haskell