[Haskell-cafe] Abstraction in data types
Tillmann Rendel
rendel at informatik.uni-marburg.de
Thu Mar 18 09:58:04 EDT 2010
Darrin Chandler wrote:
> data Point = Cartesian (Cartesian_coord, Cartesian_coord)
> | Spherical (Latitude, Longitude)
>
> type Center = Point
> type Radius = Float
>
> data Shape = Circle Center Radius
> | Polygon [Point]
>
> This obviously stinks since a Polygon could contain mixed Cartesian and
> Spherical points. Polygon needs to be one or the other, but not mixed.
The problem is that you cannot distinguish in the type system whether a
Point was created by the Cartesian or the Spherical constructor. These
constructors have the following types:
Cartesian :: (Cartesian_coord, Cartesion_coor) -> Point
Spherical :: (Latitude, Latitude) -> Point
The types of both constructors end with the same type, the type Point.
If you manage to change your code so that the constructors end in
different types, then the difference between them will be visible to the
type system, and you will have the static knowledge you want.
I know of two strategies to make the constructors end in different
types. They could belong to different datatypes, or they could belong to
different instances of a family of datatypes.
You could split the type Point into two types as follows.
data Cartesian
= Cartesian Cartesian_coord Cartesian_coord
data Spherical
= Spherical Latitude Longitude
Now the constructors have the following types:
Cartesian :: Cartesian_coord -> Cartesian_coord -> Cartesian
Spherical :: Latitude -> Longitude -> Spherical
Since the types of the constructors end in different types, we can
distinguish values created with Cartesian from values created with
Spherical in the type system.
(Note that I dropped the use of tuples from your types to make the
example more idiomatic Haskell).
However, how are we going to write the Shape datatype now that we have
two different types of points? We write a family of Shape datatypes
parameterized in the type of points we want to use.
data Shape point
= Circle point Radius
| Polygon [point]
Note that point is a type variable, which can be instantiated with
whatever type we like, including Cartesian and Spherical. For example,
Shape Cartesian is the type of shapes described with cartesian
coordinates, and Shape Spherical is the type of shapes described with
spherical coordinates. But we could also have a type like Shape Integer,
which is the type of shapes described with Integer points, whatever that
means.
This generality of the Shape family of types has advantages and
disadvantages. An advantage is that we can make Shape an instance of the
type class Functor as follows.
instance Functor Shape where
fmap f (Circle p r) = Circle (f p) r
fmap f (Polygon ps) = Polygon (map f ps)
Note that fmap applies a function to every point in the shape, but does
not change the overall shape structure. This could be used, for example,
to convert between a spherical and a cartesian shape, given some
function to convert between spherical and cartesian points.
convertPoint :: Spherical -> Cartesian
convertPoint = ...
convertShape :: Shape Spherical -> Shape Cartesian
convertShape = fmap convertPoint
Another example could be a shape with all the cartesian points stored in
same database. Lets assume that we have a function to lookup which point
is stored in the database under some key.
lookupPoint :: Database -> Key -> Cartesian
lookupPoint = ...
We can use fmap to write the function which takes a Shape with keys as
points, and looks up all the keys in the database, returning a Shape of
cartesian coordinates.
lookupShape :: Database -> Shape Key -> Shape Cartesian
lookupShape db = fmap (lookupPoint db)
The Functor typeclass is further extended and specialized by typeclasses
like Traversable, Foldable, Applicative, Alternative, Monad, etc.
Probably some of them are applicable here.
So an advantage of this kind of open family Shape is that we can write
instances for standard typeclasses, and another advantage is that we may
find good uses for types such as (Shape Key).
However, a disadvantage may be that we cannot restrict a priori which
types of points are useable in our program. That also means that we
cannot use pattern matching to discover at runtime which type of point
was used in some shape.
If we make cartesian and spherical points belong to the same family of
types, we relate them somewhat, but keep their distinction visible for
the type system. We want to introduce a family of types with two
members, one for spherical, and one for cartesian coordinates. We
introduce empty data types to index the family.
data Cartesian
data Spherical
And we express Point as a GADT using Spherical and Cartesian as follows.
data Point i
= Cartesian :: Cartesian_coord -> Cartesian_coord -> Point Cartesian
| Spherical :: Latitude -> Longitude -> Point Spherical
Again, the constructors have different types. The types differ in the
type argument to Point. Since the type argument i is never used in the
definition of Point to describe values, it is called a phantom type.
Now, we can use pattern matching to figure out the type of points used
in some statically unknown shape. Lets again assume that we know how to
convert a spherical into an cartesian point.
convertPoint :: Point Spherical -> Point Cartesian
convertPoint (Spherical latitude longitude) = ...
Note that we do not have to write a clause for Cartesian points, because
a (Point Spherical) value could never have been created by the Cartesian
constructor.
We can now write a normalizePoint function which converts spherical
points into cartesian points, but which does not change already
cartesian points.
normalize :: Point i -> Point Cartesian
normalize p@(Spherical _ _) = convertPoint p
normalize p@(Cartesian _ _) = p
Note that after matching p with Spherical, we are allowed to call
convertPoint, and after matching p with Cartesian, we are allowed to
return p as a Point Cartesian. This is allowed by the type checker,
because the type checker tracks which statically unknown information we
rediscover at runtime.
We can use the same phantom types Cartesian and Spherical to write the
Shape datatype.
data Shape i
= Circle (Point i) Radius
| Polygon [Point i]
But we cannot make Shape an instance of Functor.
So if you want the type system to dinstiguish between values created by
different constructors of the same datatype, you have to make the
difference between the constructors visible to the type system. You
could either split the datatype into a bunch of unrelated types, or you
could rewrite the datatype into a family of datatypes. The most
important difference between these approaches is whether you want to
have an open or a closed family of types.
Tillmann
More information about the Haskell-Cafe
mailing list