[Haskell-cafe] Abstraction in data types

Thu Mar 18 12:45:37 EDT 2010

Thanks to those who responded. Solutions that didn't work for my
specific case still taught me more about expressing things in the
Haskell type system.

And... this particular response is extremely well written and useful.
You've made the issues involved very clear and understandable. I really
appreciate it. Thanks!

On Thu, Mar 18, 2010 at 02:58:04PM +0100, Tillmann Rendel wrote:
> Darrin Chandler wrote:
> >data Point	= Cartesian (Cartesian_coord, Cartesian_coord)
> >		| Spherical (Latitude, Longitude)
> >
> >type Center = Point
> >type Radius = Float
> >
> >data Shape	= Circle Center Radius
> >		| Polygon [Point]
> >
> >This obviously stinks since a Polygon could contain mixed Cartesian and
> >Spherical points. Polygon needs to be one or the other, but not mixed.
> 
> The problem is that you cannot distinguish in the type system
> whether a Point was created by the Cartesian or the Spherical
> constructor. These constructors have the following types:
> 
>   Cartesian :: (Cartesian_coord, Cartesion_coor) -> Point
>   Spherical :: (Latitude, Latitude) -> Point
> 
> The types of both constructors end with the same type, the type
> Point. If you manage to change your code so that the constructors
> end in different types, then the difference between them will be
> visible to the type system, and you will have the static knowledge
> you want.
> 
> I know of two strategies to make the constructors end in different
> types. They could belong to different datatypes, or they could
> belong to different instances of a family of datatypes.
> 
> You could split the type Point into two types as follows.
> 
>   data Cartesian
>     = Cartesian Cartesian_coord Cartesian_coord
> 
>   data Spherical
>     = Spherical Latitude Longitude
> 
> Now the constructors have the following types:
> 
>   Cartesian :: Cartesian_coord -> Cartesian_coord -> Cartesian
>   Spherical :: Latitude -> Longitude -> Spherical
> 
> Since the types of the constructors end in different types, we can
> distinguish values created with Cartesian from values created with
> Spherical in the type system.
> 
> (Note that I dropped the use of tuples from your types to make the
> example more idiomatic Haskell).
> 
> However, how are we going to write the Shape datatype now that we
> have two different types of points? We write a family of Shape
> datatypes parameterized in the type of points we want to use.
> 
>   data Shape point
>     = Circle point Radius
>     | Polygon [point]
> 
> Note that point is a type variable, which can be instantiated with
> whatever type we like, including Cartesian and Spherical. For
> example, Shape Cartesian is the type of shapes described with
> cartesian coordinates, and Shape Spherical is the type of shapes
> described with spherical coordinates. But we could also have a type
> like Shape Integer, which is the type of shapes described with
> Integer points, whatever that means.
> 
> This generality of the Shape family of types has advantages and
> disadvantages. An advantage is that we can make Shape an instance of
> the type class Functor as follows.
> 
>   instance Functor Shape where
>     fmap f (Circle p r) = Circle (f p) r
>     fmap f (Polygon ps) = Polygon (map f ps)
> 
> Note that fmap applies a function to every point in the shape, but
> does not change the overall shape structure. This could be used, for
> example, to convert between a spherical and a cartesian shape, given
> some function to convert between spherical and cartesian points.
> 
>   convertPoint :: Spherical -> Cartesian
>   convertPoint = ...
> 
>   convertShape :: Shape Spherical -> Shape Cartesian
>   convertShape = fmap convertPoint
> 
> Another example could be a shape with all the cartesian points
> stored in same database. Lets assume that we have a function to
> lookup which point is stored in the database under some key.
> 
>   lookupPoint :: Database -> Key -> Cartesian
>   lookupPoint = ...
> 
> We can use fmap to write the function which takes a Shape with keys
> as points, and looks up all the keys in the database, returning a
> Shape of cartesian coordinates.
> 
>   lookupShape :: Database -> Shape Key -> Shape Cartesian
>   lookupShape db = fmap (lookupPoint db)
> 
> The Functor typeclass is further extended and specialized by
> typeclasses like Traversable, Foldable, Applicative, Alternative,
> Monad, etc. Probably some of them are applicable here.
> 
> So an advantage of this kind of open family Shape is that we can
> write instances for standard typeclasses, and another advantage is
> that we may find good uses for types such as (Shape Key).
> 
> However, a disadvantage may be that we cannot restrict a priori
> which types of points are useable in our program. That also means
> that we cannot use pattern matching to discover at runtime which
> type of point was used in some shape.
> 
> If we make cartesian and spherical points belong to the same family
> of types, we relate them somewhat, but keep their distinction
> visible for the type system. We want to introduce a family of types
> with two members, one for spherical, and one for cartesian
> coordinates. We introduce empty data types to index the family.
> 
>   data Cartesian
>   data Spherical
> 
> And we express Point as a GADT using Spherical and Cartesian as follows.
> 
>   data Point i
>     = Cartesian :: Cartesian_coord -> Cartesian_coord -> Point Cartesian
>     | Spherical :: Latitude -> Longitude -> Point Spherical
> 
> Again, the constructors have different types. The types differ in
> the type argument to Point. Since the type argument i is never used
> in the definition of Point to describe values, it is called a
> phantom type.
> 
> Now, we can use pattern matching to figure out the type of points
> used in some statically unknown shape. Lets again assume that we
> know how to convert a spherical into an cartesian point.
> 
>   convertPoint :: Point Spherical -> Point Cartesian
>   convertPoint (Spherical latitude longitude) = ...
> 
> Note that we do not have to write a clause for Cartesian points,
> because a (Point Spherical) value could never have been created by
> the Cartesian constructor.
> 
> We can now write a normalizePoint function which converts spherical
> points into cartesian points, but which does not change already
> cartesian points.
> 
>   normalize :: Point i -> Point Cartesian
>   normalize p@(Spherical _ _) = convertPoint p
>   normalize p@(Cartesian _ _) = p
> 
> Note that after matching p with Spherical, we are allowed to call
> convertPoint, and after matching p with Cartesian, we are allowed to
> return p as a Point Cartesian. This is allowed by the type checker,
> because the type checker tracks which statically unknown information
> we rediscover at runtime.
> 
> We can use the same phantom types Cartesian and Spherical to write
> the Shape datatype.
> 
>   data Shape i
>     = Circle (Point i) Radius
>     | Polygon [Point i]
> 
> But we cannot make Shape an instance of Functor.
> 
> So if you want the type system to dinstiguish between values created
> by different constructors of the same datatype, you have to make the
> difference between the constructors visible to the type system. You
> could either split the datatype into a bunch of unrelated types, or
> you could rewrite the datatype into a family of datatypes. The most
> important difference between these approaches is whether you want to
> have an open or a closed family of types.
> 
>   Tillmann
> 

-- 
Darrin Chandler            |  Phoenix BSD User Group  |  MetaBUG
dwchandler at stilyagin.com   |  http://phxbug.org/      |  http://metabug.org/
http://www.stilyagin.com/  |  Daemons in the Desert   |  Global BUG Federation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20100318/ae42c384/attachment.bin