Abstract Collections vs. FunDeps

Fri Apr 2 11:21:05 EST 2004

hi,

On Fri, 26 Mar 2004, Robert Will wrote:
>
> Honest reply: I don't fully understand FunDeps and I don't want to learn

Well, in fact I couldn't resist the temptation to look it at, only to find
that they are really simpler than my earlier information suggested.  I
also made two small observations (a) that non-circular FunDeps with a
single variable on the left side, can be one-to-one emulated via
Constructor classes, and (b) parametric type classes correspond exactly to
FunDeps with only one constraint.  (Both of the special cases occur rather
often.)  Anyway it consolidates my opinion regarding the Abstract
Collections and FunDeps: no need to make a redesign.  (Full text at the
bottom.)

The only thing that bothers me at the moment, is that I couldn't find the
"many other applications of FunDeps" that are mentioned but not given in
Jones' paper.  Perhaps one should make a list.

On Fri, 26 Mar 2004, Dylan Thurston wrote:
>
> Also, they're vitally necessary for certain algorithms and for speed
> (since not all ways of making containers can contain all items).

If you mean restricting some Containers to certain element types (Patricia
trees and so on), this is perfectly possible without FunDeps.  If you
don't mean that, I don't understand it (even with my new knowledge of
FunDeps ;-).

Robert

--------------------
250 lines

In section 0, I explain FunDeps for people like me.  Section 1 shows that
a certain kind of FunDeps can always be expressed with Constructor
classes.  Section 2 compares those approaches and Section 3 makes the
relationship between FundDeps and Parametric Type Classes more precise.
(Although I don't consider this relevant in practice.)  Finally I draw
some conclusions for the Abstract Collections and further research.

I'm referring to FunDeps as described in
   Mark P. Jones: "Type Classes with Functional Dependencies"
   Proc. ESOP 2000, Berlin, Springer-Verlag LNCS 1782.

People wondering about why I need to formulate things different to
consider them correct, may read about mental universes in
   Stefan Kahrs, 1999: A formalist's perspective on mathematics
   http://www.cs.ukc.ac.uk/people/staff/smk/manifesto.ps

0. FunDeps are restrictions on the set of possible instance
   declarations.

We consider Jones' way to define a class for collection types:

class Collects e ce
   where
-- empty  :: ce              -- ambiguous
   single :: e  -> ce
-- (<+>)  :: ce -> ce -> ce  -- ambiguous

   add    :: ce -> e  -> ce
   first  :: ce -> e

And two examples: a) Intuitively the following binding should be
ill-typed, but instead it gets a type:

>  g :: (Collects String ce, Collects Int ce) => ce
>  g = (single "Jones") `add` 1

b) And the following should have type "Int" instead:

>  f :: (Collects Int ce, Collects e ce) => e
>  f = first (single 1)

Indeed, if we had some multi-collection type "MultiColl" with
>  instance Collects String MultiColl where ...
>  instance Collects Int    MultiColl where ...

Then the above types could be specialised to
g :: MultiColl
f :: String

Given that MultiColl is easily implementable in Haskell (as a tuple of
collections of String, Int, and possibly others; or as a Collections
of a sum type...), we must admit that the above polymorphic types are
indeed correct.

But what has made us assume the contrary?  Well, obviously we thought
that there would never be any type that could have both instances of
Collects.  More generally, we would assume that any type 't' will only
have one single instance of Collects: one single element type, for
each collection type.  If the compiler would know that, too, he could
simplify the above type expressions to yield an error for 'g' (since
no 'ce' can comoply to both constraints) and "Int" for 'f' (since 'b'
and 'Int' as element types of the same collection type must be the
same).

The relation to relational data bases:  A data base query like "give
me the name and address of the patient with social security number
(SSN) such_and_such," is ambigous in the general case: there might be
more than one patient with SSN such_and_such.  The solution is to
declare a functional dependency: "SSN -> (name, address)" which is
just a declaration that for each SSN there is at most one 'name' and
'address' in all entries where that SSN appears.  (There might be
different such entries, e.g. if the SSN is connected with multiple
insurance cases.)

In our problem, we can do just the same: In a class declaration "class
C t" (where 't' are the type variables of the class) a Functional
Dependency is written "a -> b" (where 'a' and 'b' are subsets of 't').
It's semantics (modulo polymorphic instances) is:

        A FunDep "a -> b" restricts the set of legal instance
	declarations such that for every pair of instances where the
	variables 'a' are replaced by the same types, 'b' must also be
	the same types.

As a consequence, if 'a' and 'b' are the only type variables of a
class, this means that there can be at most one instance declaration
for any substitution for 'a'.  Since instances can also contain type
variables (they can be polymorphic), the characterisation "same
substitution of types" is not general enough.  But we can "lift" the
above definition if we consider any polymorphic instance declaration
to represent the (possibly infinite) set of all its non-polymorphic
substitution instances.  Then, for example the declaration

> instance Collects x [y]

Stands in place for "instance Collects Int [Int]", "instance Collects
Int [Bool]" and so on.  This clearly violates the restriction.  In
section 6.1. of his paper Jones formulates the rectriction using
unification, so that polymorphic instances are covered, and in section
6.2. he derives simplification rules for the constraints, which makes
disappear ambiguous variables (because they are unified with others).

1. Left-Single FunDeps are Syntactic Sugar for Constructor Classes

Haskell distinguishes between types (like "Int", "a -> b") and type
constructors (like [], (->)).  Unlike in the value-world, where
functions and data constructors represent "normal" values, type
constructors are not normal types: they can be used as parameters to
other type constructors, but in a legal type, all constructors must be
completely applied.  AFAICS, in some earlier version of Haskell the
parameter(s) of a type class could only represent a type, no
partially- or unapplied type constructors.  Type classes that work
with the latter later came as an extension and are called "Constructor
Classes".

For simple FunDeps of the form "a -> b" where 'a' and 'b' are single
variables, not set of variables, we can do a simple transformation: in
the declaration "class C a b ... where" we replace 'a' with 't' (the
variable stands for an unary type constructor) and in the class
context (the part before the "=>") and the body (the declaration of
the member functions) we replace each 'a' with 't b'.  Then we can
leave out the FunDep and still have exactly the same semantics.  (Note
that having non-variable constructors in the class context is a
language extension, even a dangerous one, but this is not required if
'a' doesn't appear in the context, or if we can change the superclass
in the same way.)

This transformation can be done to eliminate all FunDeps that

 (1) have only one variable on the left hand side (but possibly more
     on the right hand side), and that

 (2) are not cyclic, i.e., there's no dependency chain like
     a -> b, b -> c, c -> .. a.

If we think of constructor classes as the "more basic" construct
(which is reasonable, since their semantic rules are not much
different from ordinary type classes), we can thus consider
left-single non-cyclic FunDeps as syntactic sugar for the use of
constructor classes.

2. Comparison

Here is a small example that compares the FunDep and the Constructor
Class version of a type:

>From the Abstract Collections:
(read (<:) as 'add_first', formerly known as 'cons')
>  (1<:) :: ( Sequence seq Int
>           ) => seq Int -> seq Int

using FunDeps this would be:
>  (1<:) :: ( Sequence seq Int
>           ) => seq -> seq

As you can see the first version is more informative, saying
explicitly "seq Int".  Of course, in the FunDep version we could
rename 'seq' to 'seq_Int' to make it more readable.  But still the
constructor version provides us automatically with that structuring:
every collection type consists of a collection constructor and an
element type, just like if we would use concrete types: "seq Int" vs
"Seq Int" (or "(WeightBalanced LeafTree) Int", which would be
"WeightBalanced (LeafTree Int)", with nullary collection types, by the
way -- AFAICS without having tried, Dessy's building set approach
would also work with nullary Collection classes with just this change
in parentheses).

The following example reveals a key point ('apply' being the function
formerly known as 'map'):
>  apply :: ( Collection coll a, Collection coll b
>           ) => (a -> b) -> coll a -> coll b

with FunDeps:
>  apply :: ( Collects a ca, Collection b cb
>           ) => (a -> b) -> ca -> cb

The first type expresses that the type constructor of 'apply''s
argument and result must be the same, while the nullary version can't
express that.  (This is not tooooo important in the abstract
collections, perhaps even a bit questionable, but (a) it helps us to
avoid intermediate ambigous types, (even with FundDeps 'cb' wouldn't
be unified with anything more concrete) and (b) it also helps make
simpler implementations, since 'apply' can be lifted to use the
representation of higher-order type constructors, such as
'WeightBalanced'.)

Short summary: Constructor classes give slightly better documentation
and are slightly more expressive.

3. Parametric Type Classes are Normalised FunDeps

Here is another small observation: in relational data bases we want to
transform our tables in such a way, that are as little as possible
dependencies (since they incur redundancy).  Thus we bring them into a
normal form, with only one functional constraint of the form "a -> b"
(here 'a' and 'b' denote sets (of columns) again).  The 'a' part is
called the "key" of an entry in the data base.

Classes with only one FunDep are necessarily in normal form.  I don't
want to consider transforming class declarations to normal form (no
idea what that would mean...), just this: if we use a parametric type
class of the form "class a \elem C b where" ('a' and 'b' sets of
variables) then this corresponds just to a FunDep in normal form.
Non-normal-form FunDeps can't be expressed with Parametric Type
Classes.

4. Conclusions

Two conclusions for further research:

 1. Everytime one advocates FunDeps, one should mention whether the
    application needs left-multiple FunDeps, or whether left-single
    FunDeps suffice.

 2. Everytime one advocates left-single FunDeps, one should compare
    the solution with Constructor Classes and one should consider all
    the different possibilities to encode the FunDeps with constructor
    classes.  Which one is the most simple, most intuitive?  Which one
    allows simpler formal reasoning, simpler implementation?

Three conclusions for the Abstract Collections:

 1. Since they are currently build on Constructor Classes, their
    current design can be expressed with left-single FunDeps only, I
    don't see any application for left-multiple FunDeps.  (ByMaps have
    to be examined yet!)

 2. Structures that only work on a certain element-type
    (e.g. Patricia- Trees) must nevertheless have a type- parameter.
    This may be counterintuitive and someone here claimed that it
    leads to that late detection of type errors, but this approach
    does at least appear very simple in my intuition: every Collection
    type is made up of an unary type constructor and an element type.
    The fact, that certain concrete data structures only work on
    certain element types can conveniently be expressed in the
    instance declaration:

>   instace Collection Patricia Int where ...

    Furthermore, we use newtypes anyway to protect the invariants of
    our structure implementations, so this is not a problem, either.

 3. If we should nevertheless decide some day, that FunDeps are The
    Better Thing, we know that the current design of the Collections
    has a straight-forward translation to FunDeps.  If a transition
    has to be made some day, this can happen semi-automatically
    without the need to rethink or redesign things.  Neither for
    implementors of data structures, nor for their users.