[Haskell-cafe] Abstraction in data types

Thu Mar 18 09:58:04 EDT 2010

Darrin Chandler wrote:
> data Point	= Cartesian (Cartesian_coord, Cartesian_coord)
> 		| Spherical (Latitude, Longitude)
> 
> type Center = Point
> type Radius = Float
> 
> data Shape	= Circle Center Radius
> 		| Polygon [Point]
> 
> This obviously stinks since a Polygon could contain mixed Cartesian and
> Spherical points. Polygon needs to be one or the other, but not mixed.

The problem is that you cannot distinguish in the type system whether a 
Point was created by the Cartesian or the Spherical constructor. These 
constructors have the following types:

   Cartesian :: (Cartesian_coord, Cartesion_coor) -> Point
   Spherical :: (Latitude, Latitude) -> Point

The types of both constructors end with the same type, the type Point. 
If you manage to change your code so that the constructors end in 
different types, then the difference between them will be visible to the 
type system, and you will have the static knowledge you want.

I know of two strategies to make the constructors end in different 
types. They could belong to different datatypes, or they could belong to 
different instances of a family of datatypes.

You could split the type Point into two types as follows.

   data Cartesian
     = Cartesian Cartesian_coord Cartesian_coord

   data Spherical
     = Spherical Latitude Longitude

Now the constructors have the following types:

   Cartesian :: Cartesian_coord -> Cartesian_coord -> Cartesian
   Spherical :: Latitude -> Longitude -> Spherical

Since the types of the constructors end in different types, we can 
distinguish values created with Cartesian from values created with 
Spherical in the type system.

(Note that I dropped the use of tuples from your types to make the 
example more idiomatic Haskell).

However, how are we going to write the Shape datatype now that we have 
two different types of points? We write a family of Shape datatypes 
parameterized in the type of points we want to use.

   data Shape point
     = Circle point Radius
     | Polygon [point]

Note that point is a type variable, which can be instantiated with 
whatever type we like, including Cartesian and Spherical. For example, 
Shape Cartesian is the type of shapes described with cartesian 
coordinates, and Shape Spherical is the type of shapes described with 
spherical coordinates. But we could also have a type like Shape Integer, 
which is the type of shapes described with Integer points, whatever that 
means.

This generality of the Shape family of types has advantages and 
disadvantages. An advantage is that we can make Shape an instance of the 
type class Functor as follows.

   instance Functor Shape where
     fmap f (Circle p r) = Circle (f p) r
     fmap f (Polygon ps) = Polygon (map f ps)

Note that fmap applies a function to every point in the shape, but does 
not change the overall shape structure. This could be used, for example, 
to convert between a spherical and a cartesian shape, given some 
function to convert between spherical and cartesian points.

   convertPoint :: Spherical -> Cartesian
   convertPoint = ...

   convertShape :: Shape Spherical -> Shape Cartesian
   convertShape = fmap convertPoint

Another example could be a shape with all the cartesian points stored in 
same database. Lets assume that we have a function to lookup which point 
is stored in the database under some key.

   lookupPoint :: Database -> Key -> Cartesian
   lookupPoint = ...

We can use fmap to write the function which takes a Shape with keys as 
points, and looks up all the keys in the database, returning a Shape of 
cartesian coordinates.

   lookupShape :: Database -> Shape Key -> Shape Cartesian
   lookupShape db = fmap (lookupPoint db)

The Functor typeclass is further extended and specialized by typeclasses 
like Traversable, Foldable, Applicative, Alternative, Monad, etc. 
Probably some of them are applicable here.

So an advantage of this kind of open family Shape is that we can write 
instances for standard typeclasses, and another advantage is that we may 
find good uses for types such as (Shape Key).

However, a disadvantage may be that we cannot restrict a priori which 
types of points are useable in our program. That also means that we 
cannot use pattern matching to discover at runtime which type of point 
was used in some shape.

If we make cartesian and spherical points belong to the same family of 
types, we relate them somewhat, but keep their distinction visible for 
the type system. We want to introduce a family of types with two 
members, one for spherical, and one for cartesian coordinates. We 
introduce empty data types to index the family.

   data Cartesian
   data Spherical

And we express Point as a GADT using Spherical and Cartesian as follows.

   data Point i
     = Cartesian :: Cartesian_coord -> Cartesian_coord -> Point Cartesian
     | Spherical :: Latitude -> Longitude -> Point Spherical

Again, the constructors have different types. The types differ in the 
type argument to Point. Since the type argument i is never used in the 
definition of Point to describe values, it is called a phantom type.

Now, we can use pattern matching to figure out the type of points used 
in some statically unknown shape. Lets again assume that we know how to 
convert a spherical into an cartesian point.

   convertPoint :: Point Spherical -> Point Cartesian
   convertPoint (Spherical latitude longitude) = ...

Note that we do not have to write a clause for Cartesian points, because 
a (Point Spherical) value could never have been created by the Cartesian 
constructor.

We can now write a normalizePoint function which converts spherical 
points into cartesian points, but which does not change already 
cartesian points.

   normalize :: Point i -> Point Cartesian
   normalize p@(Spherical _ _) = convertPoint p
   normalize p@(Cartesian _ _) = p

Note that after matching p with Spherical, we are allowed to call 
convertPoint, and after matching p with Cartesian, we are allowed to 
return p as a Point Cartesian. This is allowed by the type checker, 
because the type checker tracks which statically unknown information we 
rediscover at runtime.

We can use the same phantom types Cartesian and Spherical to write the 
Shape datatype.

   data Shape i
     = Circle (Point i) Radius
     | Polygon [Point i]

But we cannot make Shape an instance of Functor.

So if you want the type system to dinstiguish between values created by 
different constructors of the same datatype, you have to make the 
difference between the constructors visible to the type system. You 
could either split the datatype into a bunch of unrelated types, or you 
could rewrite the datatype into a family of datatypes. The most 
important difference between these approaches is whether you want to 
have an open or a closed family of types.

   Tillmann