*safe* coerce, for regular and existential types

oleg@pobox.com oleg@pobox.com
Thu, 31 Jul 2003 19:15:32 -0700 (PDT)


This message describes functions safeCast and sAFECoerce implemented
in Haskell98 with common, pure extensions. The functions can be used
to 'escape' from or to existential quantification and to make
existentially-quantified datatypes far easier to deal with. Unlike
Dynamic, the present approach is pure, avoids unsafeCoerce and
unsafePerformIO, and permits arbitrary multiple user-defined typeheaps
(finite maps from types to integers and values).

An earlier message [1] introduced finite type maps for
purely-functional conversion of monomorphic types to unique
integers. The solution specifically did not rely on Dynamic and
therefore is free from unsafePerformIO. This message shows that the
type maps can be used for a safe cast, in particular, for laundering
existential types. The code in this message does NOT use
unsafePerformIO or unsafeCoerce. To implement safe casts, we define a
function sAFECoerce -- which works just like its impure
counterpart. However the former is pure and safe. sAFECoerce is a
library function expressed in Haskell with common extension. The
safety of sAFECoerce is guaranteed by the typechecker itself.

This whole message is self-contained, and can be loaded as it is in
GHCi, given the flags
  -fglasgow-exts -fallow-undecidable-instances -fallow-overlapping-instances

This message was inspired by Amr Sabry's problem on existentials.  In
fact, it answers an unstated question in Amr Sabry's original message.


It has been observed on this list that existentially-quantified
datatypes are not easy to deal with [2]. For example, suppose we have
a value of a type

> data EV = forall a. (TypeRep a TI)=> EV a

(please disregard the second argument of TypeRep for a moment).

The constructor EV wraps a value. Suppose we can guess that the
wrapped value is actually a boolean. Even if our guess is correct, we
*cannot* pass that value to any function of booleans:

*> *Main> case (EV False) of (EV x) -> not x
*>
*> <interactive>:1:
*>     Inferred type is less polymorphic than expected
*>         Quantified type variable `a' is unified with `Bool'
*>     When checking an existential match that binds
*>         x :: a
*>     and whose type is EV -> Bool
*>     In a case alternative: (EV x) -> not x

A quantified type variable cannot be unified with any regular type --
or with another quantified type variable. Values of existentially
quantified types cannot be passed to monomorphic functions, or to
constrained polymorphic functions (unless all their constrains have
been mentioned in the declaration of the existential). That limitation
guarantees safety -- on the other hand, it significantly limits the
convenience of existential datatypes [2].

To overcome the limitation, it _seemed_ that we had to sacrifice
purity. If we are positive that a particular existentially quantified
value has a specific type (e.g., Bool), we can use unsafeCoerce to
cast the value into the type Bool [3]. This approach is one of the
foundations of the Dynamic library. The other foundation is an ability
to represent a type as a unique run-time value, provided by the
methods of the class like TypeRep. Given an existentially quantified
value and a value of the desired type, Dynamic compares type
representations of the two values. If they are the same, we can
confidently use unsafeCoerce to cast the former into the type of the
latter.

This works, yet leaves the feeling of dissatisfaction. For one thing,
we had to resort to an impure feature. More importantly, we placed our
trust in something like TypeRep and its members, that they give an
accurate and unique representation of types. But what if they lie to
us, due to a subtle bug in their implementation? What if they give the
same representation for two different types? unsafeCoerce will do its
dirty work nevertheless. Using the result would lead to grave
consequences, however.

This message describes sAFECoerce and the corresponding safe cast.
Both functions convert the values of one type into the target type.
One or both of these types may be existentially quantified.  When the
source and the target types are the same, both functions act as the
identity function. The safe cast checks that the type representations
of the source and the target types are the same. If they are, it
invokes sAFECoerce. Otherwise, we monadically fail. The function
sAFECoerce does the conversion without any type checking. It always
returns the value of the target type. If the source type was the same
as the target type, the return value has the same "bit pattern" as the
argument. If the types differ, the return value is just some default
value of the right type. The user can specify the default value as he
wishes.

Therefore, we can now write

*> *Main> case (EV False) of (EV x) -> not $ sAFECoerce x        
*> True

We can also try

*> *Main> case (EV 'a') of (EV x) -> not $ sAFECoerce x
*> *** Exception: Prelude.undefined

The default value was 'undefined'.

The function safeCast is actually trivial

> safeCast::(Monad m, TypeRep b TI, TypeRep a TI) => a -> m b
> safeCast a :: m b 
>  = if tr_index a (undefined::TI) == tr_index (undefined::b) (undefined::TI)
>    then return $ sAFECoerce a
>    else fail "miscast"

*> *Main> case (EV False) of (EV x) -> (safeCast x)::Maybe Bool 
*> Just False
*> *Main> case (EV False) of (EV x) -> (safeCast x)::Maybe Int 
*> Nothing

The above examples cast the value of an existentially quantified type
into a regular type. As we shall see at the end, we can just as well
cast _into_ an existentially quantified type.

Some of the code in this message is similar to the code posted here
earlier. However, there are many small and subtle changes. It seemed
more convenient to include all the code, to make this message
self-contained.


We start with the polymorphic list as a base data structure, as
before:

> data Nil t r  = Nil
> data Cons t r = Cons t r
>
> class PList ntype vtype cdrtype where
>     cdr::   ntype vtype cdrtype -> cdrtype
>     empty:: ntype vtype cdrtype -> Bool
>     value:: ntype vtype cdrtype -> vtype
>   
> instance PList Nil vtype cdrtype where
>     empty = const True
>
> instance (PList n v r) => PList Cons v' (n v r) where
>      empty = const False
>      value (Cons v r) = v
>      cdr   (Cons v r) = r


We use the polymorphic list to define a finite type map. The map in
this message has two additional operations: fetch and alter. 

> class TypeSeq t s where
>     type_index:: t -> s -> Int
>     fetch:: t -> s -> t
>     alter:: t -> s -> s
>    
> instance (PList Cons t r) => TypeSeq t (Cons t r) where
>     type_index _ _ = 0
>     fetch _ (Cons v _) = v
>     alter newv (Cons v r)  = Cons newv r
>    
> instance (PList Cons t' r', TypeSeq t r') => TypeSeq t (Cons t' r') where
>     type_index v s = 1 + (type_index v $ cdr s)
>     fetch v s = fetch v $ cdr s
>     alter newv (Cons v' r') = Cons v' $ alter newv r'

TypeSeq is an associative map between types, values, and integers.
The operation type_index takes a value and a typemap and returns the
index of the type of the value in the map. The operation alter takes a
value and a typemap and returns a new typemap which stores the given
value. The value can be retrieved either by its index -- or by its
type. Here we need only the latter: the operation fetch.

We can build the initial typemap

> init_typeseq = Cons (undefined::Char) $
>                Cons (undefined::Int) $
> 	         Cons (undefined::Bool) $
> 	         Cons (undefined::String) $
> 	         Cons (undefined::Maybe Char) $
> 	         Nil

We also need the type of init_typeseq, to be used in a type-expression
context. Alas, there does not seem to be an easy way to take a type of
a variable in a that context. In the expression context, we can write
init_typeseq::u and then use the type variable 'u' as needed. In the
type-expression context (in the context of data and instance
declarations), this trick does not work. I couldn't find a better
solution than loading the above definition into GHCi, entering ":t
init_typeseq" and cutting and pasting GHCi's answer back into the
code:

> type TI 
>   = Cons
>     Char
>     (Cons Int (Cons Bool (Cons String (Cons (Maybe Char) (Nil Bool Bool)))))


We can test the typemap as follows

*> *Main> type_index True  init_typeseq
*> 2
*> *Main> fetch (undefined::Bool) $ alter True init_typeseq
*> True

> testmap =
>   let tp1 = alter True init_typeseq 
>       tp2 = alter 'a' tp1
>       tp3 = alter False tp2  -- updating the previously-stored value
>   in (fetch (undefined::Bool) tp3, fetch 'x' tp3)

*> *Main> testmap
*> (False,'a')

As before, to deal with existentials, we need to extend the above
compile-time type-mapping to run-time

> class (TypeSeq t u) => TypeRep t u where
>     tr_index:: t -> u -> Int
>     tr_index = type_index
>     inj::  t -> u -> u
>     inj = alter
>     prj::  t -> u -> t
>     prj = fetch
>    
> instance TypeRep Bool TI
> instance TypeRep Char TI
> instance TypeRep String TI
> instance TypeRep Int TI


We can use any TypeSeq -- init_typeseq or any other TypeSeq. We can
define a number of suitable TypeSeq values in various modules and
import the most suitable one.

The function sAFECoerce is trivial:

> sAFECoerce a = sAFECoerce' a (init_typeseq::TI)
>
> sAFECoerce' (a::a) tenv ::b
>   = prj (undefined::b) $ inj a tenv

It stores its argument into the type environment, and then retrieves it
given the target type as the index. If the source and the target types
are the same, the retrieved value is identical to the argument of
sAFECoerce'. Otherwise, sAFECoerce' returns whatever value has been
stored in the typenv under the target type (the default value).


We have remarked earlier that this message was intended to solve an
unstated question in Amr Sabry's original message. The unstated
question is the need to cast into an existentially quantified
datatype.  Here's the description of the problem and the stated
question:

> data F a b = 
>               forall c. (TypeRep c TI) => PushF (a -> c) (F c b) 
>             | Bottom (a -> b)
>
> f1 :: Char -> Bool
> f1 'a' = True
> f1 _ = False
>
> f2 :: Bool -> String
> f2 True = "true"
> f2 False = "false"
>
> f3 :: String -> Int
> f3 = length
>
> fs = PushF f1 (PushF f2 (PushF f3 (Bottom id)))

] Is it possible to write a function 
]   f :: F a b -> T c -> F c b
] where (T c) is some type for values of type 'c' or values representing
] the type 'c' or whatever is appropriate. Thus if given the
] representation of Bool, the function should return:
]  PushF f2 (PushF f3 (Bottom id))
] and if given the representation of String the function should return
]  PushF f3 (Bottom id)
] and so on. 

The solution given earlier is:

> data HF = forall a b. (TypeRep a TI,TypeRep b TI) => HF (F a b)
> show_fn_type:: (TypeRep a TI, TypeRep b TI) => (a->b) -> String
> show_fn_type (g::a->b)
> 	= "(" ++ (show (tr_index (undefined::a) (undefined::TI) )) ++ 
> 	   "->"++(show (tr_index (undefined::b) (undefined::TI))) ++ ")"
>
> instance (TypeRep a TI, TypeRep b TI) => Show (F a b) where
>     show = show . hsf_to_lst . HF
>        where
>          hsf_to_lst (HF (Bottom g)) = [show_fn_type g]
> 	   hsf_to_lst (HF (PushF g next)) = (show_fn_type g):(hsf_to_lst$HF next)

>
> f':: (TypeRep c TI) => (F a b) -> c -> F a b
> f' here@(PushF g next::(F a b)) v = 
>     if tr_index v (undefined::TI) == tr_index (g undefined) (undefined::TI)
>     then here
>     else case next of
>           PushF g1 next' -> f' (PushF (g1.g) next') v
>           Bottom g1      -> f' (Bottom (g1.g)) v
>
> f fs v = f' (PushF id fs) v
>
> flatten:: (TypeRep a TI, TypeRep b TI) => F a b -> (a -> b)
> flatten fs :: (a->b)
>     = case f fs (undefined::b) of
>         PushF g (Bottom g1) -> g1.g

In short, the function f' takes a data structure F a b and a value of
type c and returns another data structure F a b, but of a different
structure
        PushF g next
here 'next' has the type F c b -- which is the answer to Amr Sabry's
question. The function g::a->c is a composition of all previously
occurring functions. This is a neat "side-effect" of the function f':
we partially compose the given F a b structure up to the given type.

Technically, the function f' answers Amr Sabry's question. Alas, the
answer, until now, was useless. The answer, the second field in the
structure returned by f', is existentially quantified. We can do
preciously little with it -- until now.

Now we can write

> t1 v = case f fs v of 
>         PushF _ (next::F c b) -> flatten next $ sAFECoerce v

Here we cast into an existential type.

*> *Main> t1 'a'
*> 4
*> *Main> t1 False
*> 5
*> *Main> t1 "xyz"
*> 3



[1] Previous message
Pure functional TypeRep [Was: Existentials...]
http://www.haskell.org/pipermail/haskell/2003-July/012330.html

[2] escape from existential quantification
http://www.haskell.org/pipermail/haskell/2003-February/011288.html

[3] Re: escape from existential quantification
http://www.haskell.org/pipermail/haskell/2003-February/011293.html