*safe* coerce: four methods compared

oleg@pobox.com
Mon, 4 Aug 2003 21:07:30 -0700 (PDT)


This message illustrates how safe casting with multiple universes can
be extended to new user-defined, polymorphic datatypes. We show a
_portable_ mapping of polymorphic types to integers. Different
instances of a polymorphic type map to different integers. Phantom
types can be either disregarded or accounted for -- as the user
wishes. Furthermore, if two different applications running on two
different machines agree on the same type heap, then they agree on the
type encoding. An application can use multiple typeheaps at the same
time. It is easy therefore to dedicate one particular typeheap for
portable encoding of types across several computers.

Incidentally, our encoding of types may take into account _values_ of
some of the components of a polymorphic type. For example, given a
value of a type 'Foo Int a', the encoding may use the _value_ of the
first component and the _type_ of the second component. When we do the
cast, we can check not only for the desired type but also for the
desired values. We can thus approach dependent types.

This message hopefully replies to Ralf Laemmel's comment:
] You should make very clear that there is
] not just a single safeCoerce, but the TI/init_typeseq argument has to
] be constructed and supplied by the programmer in a way that (s)he
] decides what array of types can be handled.  So if you wanted to use
] your approach to scrap boilerplate [1], say deal with many datatypes,
] this becomes quite a burden.  Think of actually building initial type
] sequences. Think of how combinators need to be parameterised to take
] type sequences.

with which I agree. Below I try to make it very explicit what depends
on TI/init_typeseq and what doesn't, and how much work is involved in
adding new datatypes and extending type heaps. I don't know if the
proposed approach is better than the others in many
circumstances. It's certainly different.

Ralf Laemmel also wrote:
] Software-engineering-wise your approach suffers from an important
] weakness: a closed world assumption.  The programmer has to maintain
] your "TI" and pass it on in all kinds of contexts for the array of
] types to be handled.

I thought it was a feature: a programmer can import some type
heap, partially apply the needed function to it, and re-export the
result. The example below demonstrates what is involved when a new
datatype is added. It does not seem to be that much.
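
For illustration, a hypothetical wrapper (using the ti, ut_decon and
typeseq1 defined later in this message; the name encode_ut is made up)
could be exported as

> encode_ut v = ti ut_decon v

and client code would then encode UT values without ever mentioning
typeseq1 or TI.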


] I didn't need undecidable not even overlapping instances.

I don't actually need overlapping-instances extensions. An earlier
message on the subject of MRefs used similar type heaps without any
need for -fallow-overlapping-instances. However, I had to use numeral
types such as Succ (Succ ... Zero) rather than Int for type
indices. The current approach looks more elegant. Besides, you seem to
be in favor of giving overlapping instances more acceptance, support
and legitimacy.

] Is it obvious to see that fetching stuff from the type sequences would
] be indeed efficient for long PLists?

The sequence of 'cdr' operations needed for fetching stuff is known
statically. A compiler might therefore do something intelligent.


This whole message is self-contained, and can be loaded as it is in
GHCi, given the flags
  -fglasgow-exts -fallow-undecidable-instances -fallow-overlapping-instances

We start with the boilerplate, which has changed a little (for
example, the class PList now has a member function pllen).

> data Nil t r  = Nil
> data Cons t r = Cons t r
>
> class PList ntype vtype cdrtype where
>     cdr::   ntype vtype cdrtype -> cdrtype
>     empty:: ntype vtype cdrtype -> Bool
>     value:: ntype vtype cdrtype -> vtype
>     pllen:: ntype vtype cdrtype -> Int
>   
> instance PList Nil vtype cdrtype where
>     empty = const True
>     pllen = const 0
>   
> instance (PList n v r) => PList Cons v' (n v r) where
>      empty = const False
>      value (Cons v r) = v
>      cdr   (Cons v r) = r
>      pllen (Cons v r) = 1 + pllen r
>     
> class TypeSeq t s where
>     type_index:: t -> s -> Int
>     fetch:: t -> s -> t
>     alter:: t -> s -> s
>   
> instance (PList Cons t r) => TypeSeq t (Cons t r) where
>     type_index _ _ = 0
>     fetch _ (Cons v _) = v
>     alter newv (Cons v r)  = Cons newv r
>    
> instance (PList Cons t' r', TypeSeq t r') => TypeSeq t (Cons t' r') where
>     type_index v s = 1 + (type_index v $ cdr s)
>     fetch v s = fetch v $ cdr s
>     alter newv (Cons v' r') = Cons v' $ alter newv r'

The initial typesequence:

> init_typeseq = Cons (undefined::Char) $
>                Cons (undefined::Int) $
>                Cons (undefined::Bool) $
>                Cons (undefined::String) $
>                (Nil::Nil () ())

and its type. See the previous message for more discussion of the
latter.

> type TI 
>   = (Cons
>     Char
>     (Cons Int (Cons Bool (Cons String (Nil () ())))))
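
For orientation, here is what we would expect from an illustrative
GHCi session (the indices follow directly from the instance
declarations above):

*> *Main> type_index (undefined::Char) init_typeseq
*> 0
*> *Main> type_index (undefined::Bool) init_typeseq
*> 2
*> *Main> pllen init_typeseq
*> 4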

Because we will be dealing with existential types, we need to extend
the compile-time indexing into run-time:

> class (TypeSeq t u) => TypeRep t u where
>     tr_index:: t -> u -> Int
>     tr_index = type_index
>     inj::  t -> u -> u
>     inj = alter
>     prj::  t -> u -> t
>     prj = fetch
>    
> instance TypeRep Bool TI
> instance TypeRep Char TI
> instance TypeRep String TI
> instance TypeRep Int TI
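
As an illustrative sanity check, injecting a value into the typeheap
with inj and projecting it back at the same type with prj should
recover the value:

*> *Main> prj undefined (inj (5::Int) init_typeseq) :: Int
*> 5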

The following declarations of the datatype DTR and functions ti and
tinj do _not_ depend on a particular type environment. These functions
are fully generic with respect to TypeSeq and do not depend on
init_typeseq. The purpose of these declarations should become clear
in a moment.

> data DTR s = forall t. (TypeRep t s) => DTR t s
>
> ti decon v = foldl1 (\acc x -> base*acc + x) $ 
>              map (\(DTR v s) -> 1+ type_index v s) ta
>     where
>       base = case head ta of (DTR _ tenv) -> pllen tenv
>       ta = decon v
>
> tinj decon v = map (\(DTR v s) -> alter v s) $ decon v
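
To see how little a deconstructor has to supply, here is a
hypothetical deconstructor for ordinary pairs whose components already
occur in init_typeseq:

> pair_decon (x,y) = [DTR x init_typeseq, DTR y init_typeseq]

With it, ti pair_decon (1::Int, 'a') should evaluate to 2*4 + 1 = 9:
the base is 4, the length of init_typeseq, and the digits are the
indices of Int and Char incremented by one.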


We come now to the extension part. Suppose the user declares the
following polymorphic datatype (along the lines suggested by Wang
Meng):

> data UT a b = C | D a | E a b deriving Show

If we wish to index it, we need to extend our typeheap with a
_monomorphic_ instance of the datatype:

> typeseq1 = Cons (undefined::UT () ()) $ init_typeseq
>
> type IT2 = Cons (UT () ()) TI

We chose "UT () ()" as a representative monomorphic instance.

We also need to extend TypeRep. It is actually quite easy. First we
assert that everything indexable against init_typeseq can be indexed
against typeseq1:

> instance (TypeRep a TI,TypeSeq a IT2) => TypeRep a IT2

The compiler will check in due course if this assertion holds. We can
now index (UT () ()):

> instance TypeRep (UT () ()) IT2

We should note that these instance declarations are trivial: they
rely on the default implementation of the relevant methods.
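
Both the old types and the new one are now indexable against
typeseq1. An illustrative check of the run-time indices:

*> *Main> tr_index (undefined::UT () ()) typeseq1
*> 0
*> *Main> tr_index (undefined::Bool) typeseq1
*> 3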

Now, the user has to provide two functions: ut_decon and ut_con. The
former is to deconstruct a value of UT x y into an array of DTRs --
an array of polymorphic values that constitute the given UT value. We
chose the following deconstructor:

> ut_decon C = [DTR (undefined::UT () ()) typeseq1, DTR (1::Int) typeseq1]
> ut_decon (D x) = [DTR (undefined::UT () ()) typeseq1,
>                   DTR (2::Int) typeseq1,
>                   DTR x typeseq1]
> ut_decon (E x y) = [DTR (undefined::UT () ()) typeseq1,
>                     DTR (3::Int) typeseq1,
>                     DTR x typeseq1,
>                     DTR y typeseq1]

Note that the value "C" has the type "UT x y", where x and y can be
any types whatsoever. The above encoding disregards the phantom
x and y when encoding C. If we wished, we could have taken the phantom
types into account.
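
For instance, a hypothetical variant deconstructor for C (relying on
pattern type signatures, which -fglasgow-exts enables) could record
the phantoms as well:

> ut_decon_ph (C :: UT x y) = [DTR (undefined::UT () ()) typeseq1,
>                              DTR (1::Int) typeseq1,
>                              DTR (undefined::x) typeseq1,
>                              DTR (undefined::y) typeseq1]

With such a deconstructor, (C::UT Bool Bool) and (C::UT Bool Char)
would no longer map to the same integer.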

To encode a type UT x y, we need to pass a value of that type to "ti
ut_decon". We get an integer, which is, in base B, a_0 a_1 a_2 ...
with each "digit" encoding the type of one component of UT. The base B
is the size of the type environment.
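
To spell out the arithmetic of one of the examples below: with
typeseq1 the base B is 5, and the digits are the slot indices plus
one, i.e. 1 for UT () (), 2 for Char, 3 for Int, 4 for Bool and 5 for
String. The value D 'a' thus deconstructs into the digits 1 (the UT
tag), 3 (the constructor flag, an Int) and 2 (the Char 'a'), which
yields (1*5 + 3)*5 + 2 = 42.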

Examples:

*> *Main> ti ut_decon (C::UT Bool Bool)
*> 8
*> *Main> ti ut_decon (C::UT Bool Char)
*> 8  -- as we saw, phantom types are disregarded.
*> *Main> ti ut_decon (D 'a' ::UT Char Bool)
*> 42
*> *Main> ti ut_decon (D 1 ::UT Int Bool)
*> 43
*> *Main> ti ut_decon (E 'a' 'b')
*> 212
*> *Main> ti ut_decon (E 'a' 'c')
*> 212
*> *Main> ti ut_decon (E 'a' True)
*> 214
*> *Main> ti ut_decon (E True 'a')
*> 222

The other function is the constructor (aka the projector):

> -- thd, the heap carrying the UT marker, is not examined here; the
> -- result type (some UT x y) is fixed by the annotation at the use site
> ut_con (thd:tflag:targs)
>   = case fetch (undefined::Int) tflag of
>       1 -> C
>       2 -> D $ fetch undefined (head targs)
>       3 -> E (fetch undefined t1) (fetch undefined t2)
>            where [t1,t2] = targs

The cast is the composition of ut_con with (tinj ut_decon): the
projector and the injector together give us the safe cast:

*> *Main> (ut_con $ tinj ut_decon (C::UT Bool Bool))::(UT Bool Bool) 
*> C
*> *Main> (ut_con $ tinj ut_decon (C::UT Bool Bool))::(UT Int Int)
*> C -- C can be cast into (UT Bool Bool) and (UT Int Int)
*> *Main> (ut_con $ tinj ut_decon (E True 'a'))::(UT Bool Char)
*> E True 'a'
*> *Main> (ut_con $ tinj ut_decon (E True 'a'))::(UT Bool Bool)
*> E True *** Exception: Prelude.undefined -- cast error!
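
If desired, the two halves can be packaged into a single casting
function; a hypothetical convenience wrapper:

> ut_cast v = ut_con $ tinj ut_decon v

so that, for example, (ut_cast (E True 'a')) :: UT Bool Char should
again yield E True 'a', while asking for UT Bool Bool would fail with
Prelude.undefined as above.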