[Haskell-cafe] Abstraction in data types

Thu Mar 18 10:10:04 EDT 2010

I was just reading through the discussion, and Tillmann, your reply is one
of the best written descriptions I've ever seen here. (or even in any other
mail list!)

Of course, I see many good replies here, but they almost always turn out to
be irrelevant to the original question. Yours on the other hand directly
answers the original question.

Just wanted to thank.

-- Ozgur

On 18 March 2010 13:58, Tillmann Rendel <rendel at informatik.uni-marburg.de>wrote:

> Darrin Chandler wrote:
>
>> data Point      = Cartesian (Cartesian_coord, Cartesian_coord)
>>                | Spherical (Latitude, Longitude)
>>
>> type Center = Point
>> type Radius = Float
>>
>> data Shape      = Circle Center Radius
>>                | Polygon [Point]
>>
>> This obviously stinks since a Polygon could contain mixed Cartesian and
>> Spherical points. Polygon needs to be one or the other, but not mixed.
>>
>
> The problem is that you cannot distinguish in the type system whether a
> Point was created by the Cartesian or the Spherical constructor. These
> constructors have the following types:
>
>  Cartesian :: (Cartesian_coord, Cartesion_coor) -> Point
>  Spherical :: (Latitude, Latitude) -> Point
>
> The types of both constructors end with the same type, the type Point. If
> you manage to change your code so that the constructors end in different
> types, then the difference between them will be visible to the type system,
> and you will have the static knowledge you want.
>
> I know of two strategies to make the constructors end in different types.
> They could belong to different datatypes, or they could belong to different
> instances of a family of datatypes.
>
> You could split the type Point into two types as follows.
>
>  data Cartesian
>
>    = Cartesian Cartesian_coord Cartesian_coord
>
>  data Spherical
>    = Spherical Latitude Longitude
>
> Now the constructors have the following types:
>
>  Cartesian :: Cartesian_coord -> Cartesian_coord -> Cartesian
>  Spherical :: Latitude -> Longitude -> Spherical
>
> Since the types of the constructors end in different types, we can
> distinguish values created with Cartesian from values created with Spherical
> in the type system.
>
> (Note that I dropped the use of tuples from your types to make the example
> more idiomatic Haskell).
>
> However, how are we going to write the Shape datatype now that we have two
> different types of points? We write a family of Shape datatypes
> parameterized in the type of points we want to use.
>
>  data Shape point
>    = Circle point Radius
>    | Polygon [point]
>
> Note that point is a type variable, which can be instantiated with whatever
> type we like, including Cartesian and Spherical. For example, Shape
> Cartesian is the type of shapes described with cartesian coordinates, and
> Shape Spherical is the type of shapes described with spherical coordinates.
> But we could also have a type like Shape Integer, which is the type of
> shapes described with Integer points, whatever that means.
>
> This generality of the Shape family of types has advantages and
> disadvantages. An advantage is that we can make Shape an instance of the
> type class Functor as follows.
>
>  instance Functor Shape where
>    fmap f (Circle p r) = Circle (f p) r
>    fmap f (Polygon ps) = Polygon (map f ps)
>
> Note that fmap applies a function to every point in the shape, but does not
> change the overall shape structure. This could be used, for example, to
> convert between a spherical and a cartesian shape, given some function to
> convert between spherical and cartesian points.
>
>  convertPoint :: Spherical -> Cartesian
>  convertPoint = ...
>
>  convertShape :: Shape Spherical -> Shape Cartesian
>  convertShape = fmap convertPoint
>
> Another example could be a shape with all the cartesian points stored in
> same database. Lets assume that we have a function to lookup which point is
> stored in the database under some key.
>
>  lookupPoint :: Database -> Key -> Cartesian
>  lookupPoint = ...
>
> We can use fmap to write the function which takes a Shape with keys as
> points, and looks up all the keys in the database, returning a Shape of
> cartesian coordinates.
>
>  lookupShape :: Database -> Shape Key -> Shape Cartesian
>  lookupShape db = fmap (lookupPoint db)
>
> The Functor typeclass is further extended and specialized by typeclasses
> like Traversable, Foldable, Applicative, Alternative, Monad, etc. Probably
> some of them are applicable here.
>
> So an advantage of this kind of open family Shape is that we can write
> instances for standard typeclasses, and another advantage is that we may
> find good uses for types such as (Shape Key).
>
> However, a disadvantage may be that we cannot restrict a priori which types
> of points are useable in our program. That also means that we cannot use
> pattern matching to discover at runtime which type of point was used in some
> shape.
>
> If we make cartesian and spherical points belong to the same family of
> types, we relate them somewhat, but keep their distinction visible for the
> type system. We want to introduce a family of types with two members, one
> for spherical, and one for cartesian coordinates. We introduce empty data
> types to index the family.
>
>  data Cartesian
>  data Spherical
>
> And we express Point as a GADT using Spherical and Cartesian as follows.
>
>  data Point i
>    = Cartesian :: Cartesian_coord -> Cartesian_coord -> Point Cartesian
>    | Spherical :: Latitude -> Longitude -> Point Spherical
>
> Again, the constructors have different types. The types differ in the type
> argument to Point. Since the type argument i is never used in the definition
> of Point to describe values, it is called a phantom type.
>
> Now, we can use pattern matching to figure out the type of points used in
> some statically unknown shape. Lets again assume that we know how to convert
> a spherical into an cartesian point.
>
>  convertPoint :: Point Spherical -> Point Cartesian
>  convertPoint (Spherical latitude longitude) = ...
>
> Note that we do not have to write a clause for Cartesian points, because a
> (Point Spherical) value could never have been created by the Cartesian
> constructor.
>
> We can now write a normalizePoint function which converts spherical points
> into cartesian points, but which does not change already cartesian points.
>
>  normalize :: Point i -> Point Cartesian
>  normalize p@(Spherical _ _) = convertPoint p
>  normalize p@(Cartesian _ _) = p
>
> Note that after matching p with Spherical, we are allowed to call
> convertPoint, and after matching p with Cartesian, we are allowed to return
> p as a Point Cartesian. This is allowed by the type checker, because the
> type checker tracks which statically unknown information we rediscover at
> runtime.
>
> We can use the same phantom types Cartesian and Spherical to write the
> Shape datatype.
>
>  data Shape i
>    = Circle (Point i) Radius
>    | Polygon [Point i]
>
> But we cannot make Shape an instance of Functor.
>
> So if you want the type system to dinstiguish between values created by
> different constructors of the same datatype, you have to make the difference
> between the constructors visible to the type system. You could either split
> the datatype into a bunch of unrelated types, or you could rewrite the
> datatype into a family of datatypes. The most important difference between
> these approaches is whether you want to have an open or a closed family of
> types.
>
>  Tillmann
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>

-- 
Ozgur Akgun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100318/b6837951/attachment.html