[Haskell] extensible records using associated types

Barney Hilken b.hilken at ntlworld.com
Mon Jun 19 18:40:11 EDT 2006


I'm sorry if this example has already been posted, but I couldn't  
find it by searching the archives. If it's new (and works properly),  
it provides more evidence that ATs are a good thing.


You can use associated type synonyms to implement polymorphic  
extensible records.
------------------------------------------------------------------------ 
---------

We will represent records as values of the form
	N1 x1 (N2 x2 ... (Nn xn Empty)...)
with corresponding type
	N1 a1 (N2 a2 ... (Nn an Empty)...)
where the field name N is defined by the declaration
	data N a r = N a r

For convenience, we define the type of field name constructors:

 >	type Constructor n = forall a. forall r. a -> r -> n a r

so that N :: Constructor N for each field name N.

Next we define a class (:<:) whose purpose is to tell the compiler in  
which order to store fields:

 >	class (m :: * -> * -> *) :<: (n :: * -> * -> *)

The idea is that if N and M are distinct field names which might  
appear the same record, then there must be an instance of exactly one  
of N :<: M and M :<: N.

The class Contains n r asserts that r is a record type with a field  
labelled n:

 >	class Contains n r where				
 >		type Project n r					
 >		type Delete n r						
 >		project :: Constructor n -> r -> Project n r
 >		delete :: Constructor n -> r -> Delete n r

There are two associated types: Project n r is the type of the n  
field, and Delete n r is the type of the rest of the record. The  
functions project and delete give the corresponding values.

The class Lacks n r asserts that r is a record type without a field  
labelled n:

 >	class Lacks n r where							
 >		type Extend n a r							
 >		extend :: Constructor n -> a -> r -> Extend n a r

There is one associated type: Extend n a r is the type of the record  
obtained by adding a field of type a labelled n. Note that according  
to the AT paper, the parameter a should come last, but this order is  
much more convenient. The function extend adds the new field to the  
record.

The class Disjoint r s asserts that r and s are record types with no  
fields in common:

 >	class Disjoint r s where		
 >		type Union r s				
 >		union :: r -> s -> Union r s

There is one associated type: Union r s is the type of the merged  
record, and union is the merge function.


Next we define the empty record:

 >	data Empty = Empty

It has no fields, so has no instances of Contains, but lots of  
instances of Lacks:

 >	instance Lacks n Empty where		
 >		type Extend n a Empty = n a Empty
 >		extend nn x Empty = nn x Empty	

It is also disjoint from everything:

 >	instance Disjoint Empty r where
 >		type Union Empty r = r		
 >		union Empty t = t			


Each field name must be defined as a data type together with  
instances of the classes Contains, Lacks and Disjoint as follows:

 >	data N a r = N a r

The type N a r certainly contains a field named N:

 >	instance Contains N (N a r) where
 >		type Project N (N a r) = a	
 >		type Delete N (N a r) = r	
 >		project N (N x t) = x		
 >		delete N (N x t) = t		

It also Contains all the fields of r, though you only need to check  
names which are below N:

 >	instance m :<: N, Contains m r => Contains m (N a r) where
 >		type Project m (N a r) = Project m r			
 >		type Delete m (N a r) = N a (Delete m r)	
 >		project mm (N x t) = project mm t			
 >		delete mm (N x t) = N x (delete mm t)	

Similarly, it Lacks all the fields which r Lacks, except for N  
itself. This is where we really need the ordering :<: to ensure that  
m is not equal to N. There are two cases, m :<: N:

 >	instance m :<: N, Lacks m r => Lacks m (N a r) where
 >		type Extend m b (N a r) = N a (Extend m b r)	
 >		extend mm y (N x t) = N x (extend mm y t)		

and N :<: m:

 >	instance N :<: m => Lacks m (N a r) where
 >		type Extend m b (N a r) = m b (N a r)
 >		extend mm y (N x t) = mm y (N x t)	

To ensure that N a r is disjoint from s, it is enough that r is  
disjoint from s and N is not in their union:

 >	instance Disjoint r s, Lacks N (Union r s) => Disjoint (N a r) s  
where
 >		type Union (N a r) s = Extend N a (Union r s)
 >		union (N x t) u = extend N x (union t u)


Finally, we must maintain the ordering :<:. It has to be a global  
linear order, so if we know the last field name was M, we can define:

 >	instance M :<: N
 >	instance m :<: M => m :<: N


Usage.
------

The record {N1 = x1, ... Nn = xn} should be constructed as (extend N1  
x1 $ ... $ extend Nn xn) Empty.
Extension, projection, deletion and union are all as polymorphic as  
you could want, and you can even use field names as ordinary values.  
For example, you can define update by:

 >	update :: Contains n r, Lacks n (Delete n r) => Constructor n -> a  
-> r -> Extend n a (Delete n r)
 >	update n y t = extend n y (delete n t)


Problems.
---------

1. Pattern matching.
    Pattern matching on complete records is possible (though you must  
get the fields in the right order!) but there is no equivalent to the  
trex .. notation for 'any other fields'.

2. Maintaining the :<: order.
    Within one module this could be generated with a template, but  
between modules this is more of a problem. Of course, we could use a  
built-in class based on alphabetical order, but this would need  
compiler support.

3. Controlling the use of constructors.
    If you use N to construct records and types directly, instead of  
Extend N and extend N, you get fields in the wrong order and  
everything breaks. The type system can't stop you doing this.

4. Testing.
    I don't actually know if this works, because phrac won't compile  
it. We really need the second-order polymorphism of field  
constructors, and phrac doesn't have forall. (There is also the  
trivial point of the order of arguments to Extend).


Barney.




More information about the Haskell mailing list