safe coerce: four methods compared

Fri, 1 Aug 2003 22:27:27 -0700 (PDT)

This is a "Related Work" section of the previous message. We compare
three main methods of achieving safe casts. It seems that the method
proposed in the earlier message is quite different -- especially in
terms of extensibility. In this message, we compare the extensibility
of four techniques. Stephanie Weirich ICFP'00 paper points out another
solution, which relies on mutable IORefs. Since that technique can
only be used with IO monad, we do not consider it here.

Some of the methods below require type classes and algebraic datatype
declarations. Some require only an algebraic datatype, or only a
typeclass. In either case, we run into an extensibility problem: to
add support for a new datatype, we must either add an instance
declaration, or a new alternative to the datatype declaration. These
are non-trivial, non-modular extensions. For example, when we add a
new alternative to a datatype declaration, we must physically update
the corresponding file. We must then re-compile all dependent modules.

Surprisingly, the solution in the earlier message is free from these
drawbacks. We can extend the type heap in a modular fashion. We do not
need to alter type or data declarations. It seems our type heaps are
sort of reified lists of instances. There appear to be a duality
between typeclasses and our type heaps. Only our type heaps are
first-class.

The idea behind all type-safe casts is simple: to cast a value of a type
'a' into a target type 'b', we inject the value into some universe and
then project it to the target type 'b'.

To illustrate the differences in implementations of that idea, we will
be using James Cheney and Ralf Hinze's example: a generic comparison
function. To avoid unnecessary complications, we limit ourselves to
built-in and scalar types. Extensions to products, exponential,
recursive and polymorphic types are possible, but too messy. We also
will not consider existential types, so we avoid introducing type
classes if they are not needed in the static case.

This whole message is self-contained, and can be loaded as it is in
GHCi, given the flags
  -fglasgow-exts -fallow-undecidable-instances -fallow-overlapping-instances

Approach 1: Tcl approach

The universal type is a string. The values to cast must belong to the
class Show and the class Read. The injection and projection functions
are trivial:

> sh_inj x = show x
> sh_prj x = read x

Generic equality and the cast functions are trivial as well:

> sh_gequal x y = sh_inj x == sh_inj y

> sh_cast x = sh_prj $ sh_inj x

Here's the test:

> sh_test1 = [sh_gequal 1 2, sh_gequal True True, sh_gequal 'a' 'b']

To add support for a new datatype, we have to place that datatype in
the class Show and the class Read. That is, we have to add the
corresponding instance declarations _and_ we have to implement methods
'show' and 'read'. Functions sh_gequal and sh_cast do not have to be
modified. The Tcl approach is also the most generous with respect to
type equivalence: for example, Int and Integer are considered
equivalent, and sh_cast may cast between them.

Incidentally, when GHC can derive Binary, this approach becomes far
more appealing.

Approach 2: The universe is the tagged union

> data TU = TChar Char | TBool Bool | TInt Int
>
> class TURepr t where
>   tu_inj:: t -> TU
>   tu_prj:: TU -> t
> 
> instance TURepr Char where
>   tu_inj = TChar
>   tu_prj (TChar x) = x
>  
> instance TURepr Bool where
>   tu_inj = TBool
>   tu_prj (TBool x) = x
> 
> instance TURepr Int where
>   tu_inj = TInt
>   tu_prj (TInt x) = x
> 
> tu_gequal x y = cmp (tu_inj x) (tu_inj y)
>   where
>     cmp (TChar x) (TChar y) = x == y       
>     cmp (TBool x) (TBool y) = x == y       
>     cmp (TInt x)  (TInt y)  = x == y       
>
> tu_cast x = tu_prj $ tu_inj x
>
> tu_test1 = [tu_gequal (1::Int) (2::Int), tu_gequal True True, 
>             tu_gequal 'a' 'b']

To add support for a new datatype, we have to add a new alternative to
the declaration of the datatype TU and we have to add a new instance
for the class TURepr (with the implementation of the tu_inj and
tu_proj methods). We also have to add another clause to the tu_gequal
function. Clearly this is the least extensible approach.

Approach 3: by Cheney and Ralf Hinze's
The universe is a set of inject/project pairs

> data IPP a =  IPPInt  (a->Int)  (Int->a) 
>             | IPPChar (a->Char) (Char->a)
> 	      | IPPBool (a->Bool) (Bool->a)
>	    
> ipp_gequal (IPPInt  prj inj) x y = prj x == prj y
> ipp_gequal (IPPChar prj inj) x y = prj x == prj y
> ipp_gequal (IPPBool prj inj) x y = prj x == prj y
>
> ipp_cast (IPPInt xprj xinj) x (IPPInt yprj yinj) = yinj $ xprj x
> -- more should follow...
>
> ipp_test1 = [ipp_gequal (IPPInt  id id) (1::Int) (2::Int),
>              ipp_gequal (IPPBool id id) True True,
> 	       ipp_gequal (IPPChar id id) 'a' 'b']

To add a new primitive datatype, we should modify the declaration of
the datatype IPP and add a new alternative. We also need to add
clauses to ipp_gequal and ipp_cast.

Incidentally, (IPPInt  id id) specifies that the type Int is
equivalent only to Int (that is, for type Int equality is the same as
identity). However, we can be more generous: for example, we can cast
between any enumerable type and Int:

*> *Main> ipp_cast (IPPInt fromEnum toEnum) () (IPPInt id id)
*> 0
*> *Main> ipp_cast (IPPInt fromEnum toEnum) True (IPPInt id id)
*> 1

Approach 4: the approach of an earlier message.
We can have as many universes as we wish. For historical reasons,
injection is called 'alter' and projection 'fetch'.

> data Nil t r  = Nil
> data Cons t r = Cons t r
>
> class PList ntype vtype cdrtype where
>     cdr::   ntype vtype cdrtype -> cdrtype
>     empty:: ntype vtype cdrtype -> Bool
>     value:: ntype vtype cdrtype -> vtype
>   
> instance PList Nil vtype cdrtype where
>     empty = const True
>
> instance (PList n v r) => PList Cons v' (n v r) where
>      empty = const False
>      value (Cons v r) = v
>      cdr   (Cons v r) = r
>
> class TypeSeq t s where
>     type_index:: t -> s -> Int
>     fetch:: t -> s -> t
>     alter:: t -> s -> s
>   
> instance (PList Cons t r) => TypeSeq t (Cons t r) where
>     type_index _ _ = 0
>     fetch _ (Cons v _) = v
>     alter newv (Cons v r)  = Cons newv r
>   
> instance (PList Cons t' r', TypeSeq t r') => TypeSeq t (Cons t' r') where
>     type_index v s = 1 + (type_index v $ cdr s)
>     fetch v s = fetch v $ cdr s
>     alter newv (Cons v' r') = Cons v' $ alter newv r'

The initial type heap (the initial universe).

> th_init = Cons 'a' $ Cons True $ Cons (1::Int) $ Nil
>
> th_gequal tenv x y | type_index (undefined::Char) tenv == type_index x tenv
>        = let t1 = alter x tenv
>              t2 = alter y tenv
> 	 in fetch (undefined::Char) t1 == fetch (undefined::Char) t2
> th_gequal tenv x y | type_index (undefined::Int) tenv == type_index x tenv
>        = let t1 = alter x tenv
>              t2 = alter y tenv
> 	 in fetch (undefined::Int) t1 == fetch (undefined::Int) t2
> th_gequal tenv x y | type_index (undefined::Bool) tenv == type_index x tenv
>        = let t1 = alter x tenv
>              t2 = alter y tenv
> 	 in fetch (undefined::Bool) t1 == fetch (undefined::Bool) t2
>  
> th_cast tenv x :: y = fetch (undefined::y) $ alter x tenv
>
> th_test1 = [th_gequal th_init (1::Int) (2::Int),
>             th_gequal th_init True True,
> 	      th_gequal th_init 'a' 'b']

Let us see what is involved in adding a new datatype. Let us add Float:

First, we introduce an extended universe:

> th_heap2 = Cons (1.0::Float) $ th_init

Then we extend the function th_gequal. We do _not_ need to modify the
code of the latter:

> th_gequal2 tenv x y | type_index (undefined::Float) tenv == type_index x tenv
>        = let t1 = alter x tenv
>              t2 = alter y tenv
> 	 in fetch (undefined::Float) t1 == fetch (undefined::Float) t2
> -- delegate the rest to the old th_gequal
> th_gequal2 tenv x y = th_gequal tenv x y
>
> -- th_cast doesn't need to change
>
> th_test2 = [th_gequal2 th_heap2 (1.0::Float) (2.0::Float),
>             th_gequal2 th_heap2 (3.14::Float) (3.14::Float),
> 	      th_gequal2 th_heap2 (1::Int) (2::Int),
> 	      th_gequal2 th_heap2 True True,
> 	      th_gequal  th_heap2 'a' 'b'] -- old th_gequal used here!

*> *Main> th_test2
*> [False,True,False,True,False]

We should emphasize the modularity of the latter approach. No
_declarations_ need to be changed. In the present, static case, there
are no type class instances to add. Furthermore, no code needs to be
altered either. The function th_gequal can still be used with the
extended heap.

*safe* coerce: four methods compared

safe coerce: four methods compared