[Haskell-cafe] Manual Type-Checking to provide Read instances for GADTs. (was Re: [Haskell-cafe] Read instance for GATD)

Sat Jun 26 06:31:05 EDT 2010

That's really nice. Very interesting. Thank you.

On Fri, Jun 25, 2010 at 9:55 PM, Edward Kmett <ekmett at gmail.com> wrote:

> I've obtained permission to repost Gershom's slides on how to deserialize
> GADTs by typechecking them yourself, which are actually a literate haskell
> source file, that he was rendering to slides. Consequently, I've just pasted
> the content below as a literate email.
>
> -Edward Kmett
>
> -
> Deserializing strongly typed values
>
> (four easy pieces about typechecking)
>
> Gershom Bazerman
>
> -
> prior art:
> Oleg (of course)
> http://okmij.org/ftp/tagless-final/course/course.html
>
> ...but also
> Stephanie Weirich
> http://www.comlab.ox.ac.uk/projects/gip/school/tc.hs
>
> =
> Ahem...
> \begin{code}
> {-# LANGUAGE DeriveDataTypeable,
>              ExistentialQuantification,
>              FlexibleContexts,
>              FlexibleInstances,
>              FunctionalDependencies,
>              GADTs,
>              RankNTypes,
>              ScopedTypeVariables
>  #-}
> \end{code}
> =
> ahem.
> \begin{code}
> import Data.Typeable
> import Data.Maybe
> import Control.Monad.Error
> import Control.Applicative
> import qualified Data.Map as M
> import Unsafe.Coerce
> \end{code}
> =
> A simple ADT.
>
> \begin{code}
> data SimpleExpr = SOpBi String SimpleExpr SimpleExpr
>                 | SOpUn String SimpleExpr
>                 | SDbl Double
>                 | SBool Bool deriving (Read, Show, Typeable)
>
> \end{code}
> Yawn.
> =
> An awesome GADT!
>
> \begin{code}
> data Expr a where
>     EDbl  :: Double -> Expr Double
>     EBool :: Bool -> Expr Bool
>     EBoolOpBi :: BoolOpBi -> Expr Bool -> Expr Bool -> Expr Bool
>     ENumOpBi  :: NumOpBi -> Expr Double -> Expr Double -> Expr Double
>     ENumOpUn  :: NumOpUn -> Expr Double -> Expr Double
>                  deriving Typeable
>
> data NumOpBi = Add | Sub deriving (Eq, Show)
> data NumOpUn = Log | Exp deriving (Eq, Show)
> data BoolOpBi = And | Or deriving (Eq, Show)
> \end{code}
>
> The GADT is well typed. It cannot go wrong.
> -
> It also cannot derive show.
> =
> But we can write show...
>
> \begin{code}
> showIt :: Expr a -> String
> showIt (EDbl d) = show d
> showIt (EBool b) = show b
> showIt (EBoolOpBi op x y) = "(" ++ show op
>                             ++ " " ++ showIt x
>                             ++ " " ++ showIt y ++ ")"
> showIt (ENumOpBi op x y)  = "(" ++ show op
>                             ++ " " ++ showIt x
>                             ++ " " ++ showIt y ++ ")"
> showIt (ENumOpUn op x) = show op ++ "(" ++ showIt x ++ ")"
> \end{code}
> =
> And eval is *much nicer*.
> It cannot go wrong --> no runtime typechecks.
>
> \begin{code}
> evalIt :: Expr a -> a
> evalIt (EDbl x) = x
> evalIt (EBool x) = x
> evalIt (EBoolOpBi op expr1 expr2)
>        | op == And = evalIt expr1 && evalIt expr2
>        | op == Or  = evalIt expr2 || evalIt expr2
>
> evalIt (ENumOpBi op expr1 expr2)
>        | op == Add = evalIt expr1 + evalIt expr2
>        | op == Sub = evalIt expr1 - evalIt expr2
> \end{code}
> =
> But how do we write read!?
>
> read "EBool False" = Expr Bool
> read "EDbl 12" = Expr Double
>
> The type being read depends on the content of the string.
>
> Even worse, we want to read not from a string that looks obvious
> to Haskell (i.e. a standard showlike instance) but from
> something that looks pretty to the user -- we want to *parse*.
>
> So we parse into our simple ADT.
>
> Then we turn our simple ADT into our nice GADT.
> -
> But how?
>
> How do we go from untyped... to typed?
>
> [And in general -- not just into an arbitrary GADT,
> but an arbitrary inhabitant of a typeclass.]
>
> [i.e. tagless final, etc]
>
> =
> Take 1:
> Even if we do not know what type we are creating,
> we eventually will do something with it.
>
> So we paramaterize our typechecking function over
> an arbitrary continuation.
>
> \begin{code}
> mkExpr :: (forall a. (Show a, Typeable a) => Expr a -> r) -> SimpleExpr ->
> r
> mkExpr k expr = case expr of
>                    SDbl d  -> k $ EDbl d
>                    SBool b -> k $ EBool b
>                    SOpUn op expr1 -> case op of
>                       "log" -> k $ mkExpr' (ENumOpUn Log) expr1
>                       "exp" -> k $ mkExpr' (ENumOpUn Exp) expr1
>                       _ -> error "bad unary op"
>                    SOpBi op expr1 expr2 -> case op of
>                       "add" -> k $ mkExprBi (ENumOpBi Add) expr1 expr2
>                       "sub" -> k $ mkExprBi (ENumOpBi Sub) expr1 expr2
> \end{code}
> =
> Where's the typechecking?
>
> \begin{code}
> mkExpr' k expr = mkExpr (appCast $ k) expr
>
>
> mkExprBi k expr1 expr2 = mkExpr' (mkExpr' k expr1) expr2
>
>
> appCast :: forall a b c r. (Typeable a, Typeable b) => (c a -> r) -> c b ->
> r
> appCast f x = maybe err f $ gcast x
>     where err = error $ "Type error. Expected: " ++ show (typeOf
> (undefined::a))
>                 ++ ", Inferred: " ++ show (typeOf (undefined::b))
> \end{code}
>
> ... We let Haskell do all the work!
> =
> Hmmm... the continuation can be anything.
> So let's just make it an existential constructor.
>
> \begin{code}
> data ExprBox = forall a. Typeable a => ExprBox (Expr a)
>
> appExprBox :: (forall a. Expr a -> res) -> ExprBox -> res
> appExprBox f (ExprBox x) = f x
>
> tcCast :: forall a b c. (Typeable a, Typeable b) => Expr a -> Either String
> (Expr b)
> tcCast x = maybe err Right $ gcast x
>     where err = Left $ "Type error. Expected: " ++ show (typeOf
> (undefined::a))
>                 ++ ", Inferred: " ++ show (typeOf (undefined::b))
> \end{code}
>
> Now we can delay deciding what to apply until later.
> Typecheck once, execute repeatedly!
>
> =
> One more trick -- monadic notation lets us
> extend the context of unpacked existentials
> to the end of the do block
>
> \begin{code}
> retBox x = return (ExprBox $ x, typeOf x)
>
> typeCheck :: SimpleExpr -> Either String (ExprBox, TypeRep)
> typeCheck (SDbl d) = retBox (EDbl d)
> typeCheck (SBool b) = retBox (EBool b)
> typeCheck (SOpBi op s1 s2) = case op of
>            "add" -> tcBiOp (ENumOpBi Add)
>            "sub" -> tcBiOp (ENumOpBi Sub)
>            "and" -> tcBiOp (EBoolOpBi And)
>            "or"  -> tcBiOp (EBoolOpBi Or)
>     where
>       tcBiOp constr = do
>         (ExprBox e1, _) <- typeCheck s1
>         (ExprBox e2, _) <- typeCheck s2
>         e1' <- tcCast e1
>         e2' <- tcCast e2
>         retBox $ constr e1' e2'
> \end{code}
> =
> So that's fine for *very* simple expressions.
> How does it work for interesting GADTs?
> (like, for example, HOAS)?
>
> (The prior art doesn't demonstrate HOAS --
>  it uses DeBruijn.)
> =
> Our simple world
>
> \begin{code}
> type Ident = String
> type TypeStr = String
>
> data STerm = SNum Double
>              | SApp STerm STerm
>              | SVar Ident
>              | SLam Ident TypeRep STerm
> \end{code}
>
> Note.. terms are Church style -- each var introduced
> has a definite type.
>
> Determining this type is left as an exercise.
>
> =
> Over the rainbow in well-typed land.
>
> \begin{code}
> data Term a where
>     TNum :: Double -> Term Double
>     TApp :: Term (a -> b) -> Term a -> Term b
>     TLam :: Typeable a => (Term a -> Term b) -> Term (a -> b)
>     TVar :: Typeable a => Int -> Term a
>             deriving Typeable
> \end{code}
>
> Wait! DeBrujin (TVar) *and* HOAS (TLam)! The worst of both worlds.
>
> Don't worry. In the final product all TVars are eliminated by construction.
>
> Exercise to audience: rewrite the code so that TVar can be eliminated
> from the Term type.
> =
> Show and eval...
>
> \begin{code}
> showT :: Int -> Term a -> String
> showT c (TNum d) = show d
> showT c (TApp f x) = "App (" ++ showT c f
>                      ++ ") (" ++ showT c x ++ ")"
> showT c (TLam f)   = "Lam " ++ ("a"++show c)
>                      ++  " -> " ++ (showT (succ c) $ f (TVar c))
> showT c (TVar i)   = "a"++show i
>
> runT :: Term a -> Term a
> runT (TNum d) = (TNum d)
> runT (TLam f) = (TLam f)
> runT (TApp f x) = case runT f of TLam f' -> runT (f' x)
> runT (TVar i) = error (show i)
> \end{code}
> =
> The existential
>
> \begin{code}
> data TermBox = forall a. Typeable a => TermBox (Term a)
> appTermBox :: (forall a. Typeable a => Term a -> res) -> TermBox -> res
> appTermBox f (TermBox x) = f x
> \end{code}
> =
> The typechecker returns a box *and* a typeRep.
>
> Cast is the usual trick.
>
> \begin{code}
> retTBox :: Typeable a => Term a -> Either String (TermBox, TypeRep)
> retTBox x = return (TermBox $ x, typeOf x)
>
> type Env = M.Map Ident (TermBox, TypeRep)
>
> trmCast :: forall a b c. (Typeable a, Typeable b) =>
>            Term a -> Either String (Term b)
> trmCast x = maybe err Right $ gcast x
>     where err = Left $ "Type error. Expected: " ++
>                 show (typeOf (undefined::a))
>                 ++ ", Inferred: " ++
>                 show (typeOf (undefined::b))
> \end{code}
>
> =
> \begin{code}
> typeCheck' :: STerm -> Env -> Either String (TermBox, TypeRep)
> typeCheck' t env = go t env 0
>     where
>       go (SNum d) _ idx = retTBox (TNum d)
>       go (SVar i) env idx = do
>         (TermBox t, _) <- maybe (fail $ "not in scope: " ++ i)
>                           return $ M.lookup i env
>         retTBox $ t
> \end{code}
>
> Nums and vars are easy.
> =
> App and Lam... less so.
>
> \begin{code}
>       go (SApp s1 s2) env idx = do
>         (TermBox e1, tr1) <- go s1 env idx
>         (TermBox e2, _) <- go s2 env idx
>         TermBox rt <- return $ mkTerm $ head $ tail $
>                       typeRepArgs $ head $ typeRepArgs $ tr1
>                       -- TypeReps have their... drawbacks.
>         e1' <- trmCast e1
>         retTBox $ TApp e1' e2 `asTypeOf` rt
>       go (SLam i tr s) env idx = do
>         TermBox et <- return $ mkTerm tr
>         (TermBox e, _) <- go s
>                            (M.insert i
>                                 (TermBox (TVar idx `asTypeOf` et),tr)
>                                 env
>                            )
>                            (idx + 1)
>         retTBox $ TLam (\x -> subst (x `asTypeOf` et) idx e)
> \end{code}
> =
> How does mkTerm work?
>
> \begin{code}
> mkTerm :: TypeRep -> TermBox
> mkTerm tr = go tr TermBox
>     where
>       go :: TypeRep -> (forall a. (Typeable a) => Term a -> res) -> res
>       go tr k
>           | tr == typeOf (0::Double) = k (TNum 0)
>           | typeRepTyCon tr == arrCon =
>                  go (head $ typeRepArgs tr)
>                  $ \xt -> go (head $ tail $ typeRepArgs tr)
>                  $ \y -> k (TLam $ \x -> const y (x `asTypeOf` xt))
>
>       arrCon = typeRepTyCon $ typeOf (undefined::Int -> String)
> \end{code}
>
> Same principle -- but can build arrows directly.
>
> Doing so requires staying cps... I think.
> =
> And this is how we get rid of the DeBruijn terms.
>
> \begin{code}
> subst :: (Typeable a) => Term a -> Int -> Term b -> Term b
> subst t i trm = go trm
>     where
>       go :: Term c -> Term c
>       go (TNum d) = (TNum d)
>       go (TApp f x) = TApp (go f) (go x)
>       go (TLam f) = TLam (\a -> go (f a))
>       go (TVar i')
>          | i == i' = either error id $ trmCast t
>          | otherwise = (TVar i')
> \end{code}
>
> Q: Now you see why DeBruijn is handy -- substitution
> is otherwise a pain.
>
> =
> But note -- all functions must be monotyped.
> This is the simply typed lambda calculus.
>
> How do we represent TLam (\a -> a)?
>
> The masses demand HM polymorphism.
> -
>
> Take 4:
>
> A damn dirty hack.
>
> =
> All hacks begin with Nats.
>
> \begin{code}
> data Z = Z deriving (Show, Typeable)
> data S a = S a deriving (Show, Typeable)
> \end{code}
> =
> typeCheck is almost the same.
>
> \begin{code}
> typeCheck'' :: STerm -> Env -> Either String (TermBox, TypeRep)
> typeCheck'' t env = go t env 0
>     where
>       go :: STerm -> Env -> Int -> Either String (TermBox, TypeRep)
>       go (SNum d) _ idx = retTBox (TNum d)
>       go (SVar i) env idx = do
>         (TermBox t, _) <- maybe (fail $ "not in scope: " ++ i)
>                           return $ M.lookup i env
>         retTBox $ t
> \end{code}
> =
> \begin{code}
>       go (SApp s1 s2) env idx = do
>         (TermBox e1, tr1) <- go s1 env idx
>         (TermBox e2, _) <- go s2 env idx
>         TermBox rt  <- unifyAppRet e1 e2
>         e1' <- unifyAppFun e1 e2
>         retTBox $ TApp e1' e2 `asTypeOf` rt
>       go (SLam i tr s) env idx = do
>         TermBox et <- return $ mkTerm' $ tr
>         (TermBox e, _) <- go s (M.insert i
>                                 (TermBox (TVar idx `asTypeOf` et),tr)
>                                 env)
>                                (idx + 1)
>         retTBox $ TLam (\x -> subst (x `asTypeOf` et) idx e)
> \end{code}
>
> It looks like we just factored on the nasty arrow code.
> =
> mkTerm is almost the same... we just extended it to deal with Nats.
> \begin{code}
> mkTerm' :: TypeRep -> TermBox
> mkTerm' tr = go tr TermBox
>     where
>       go :: TypeRep -> (forall a. (Typeable a) => Term a -> res) -> res
>       go tr k
>           | tr == typeOf (0::Double) = k (TNum 0)
>           | tr == typeOf Z = k (TVar 0 :: Term Z)
>           | typeRepTyCon tr == succCon = go (head $ typeRepArgs tr)
>                                $ \t -> k $ succTerm t
>           | isArr tr =
>                  go (head $ typeRepArgs tr)
>                  $ \xt -> go (head $ tail $ typeRepArgs tr)
>                  $ \y -> k (TLam $ \x -> const y (x `asTypeOf` xt))
>           | otherwise = error $ show tr
>
>       succCon = typeRepTyCon $ typeOf (S Z)
>       succTerm :: Typeable b => Term b -> Term (S b)
>       succTerm _ = TVar 0
> \end{code}
> =
> Some utilities
> \begin{code}
> isArr :: TypeRep -> Bool
> isArr x = typeRepTyCon x == (typeRepTyCon $ typeOf (undefined::Int ->
> String))
>
> splitArrCon :: TypeRep -> Either String (TypeRep, TypeRep)
> splitArrCon x
>     | isArr x = case typeRepArgs x of
>                   [a,b] -> Right (a,b)
>                   _ -> Left $ "Expected function, inferred: " ++ show x
>     | otherwise = Left $ "Expected function, inferred: " ++ show x
>
> \end{code}
> =
> Give an arrow term we unify it with its argument...
> and return a witness.
> \begin{code}
> unifyAppRet :: forall a b. (Typeable a, Typeable b) =>
>                Term a -> Term b -> Either String TermBox
> unifyAppRet x y = do
>   tr <- unifyAppTyps
>         (head $ typeRepArgs $ typeOf x)
>         (head $ typeRepArgs $ typeOf y)
>   return $ mkTerm' tr
>
> \end{code}
> =
> Yes. Actual unification.
> (although this is unification of a type template,
>  so at least it is local)
> \begin{code}
> unifyAppTyps :: TypeRep -> TypeRep -> Either String TypeRep
> unifyAppTyps trf trx = do
>   (fl,fr) <- splitArrCon trf
>   env <- go M.empty fl trx
>   subIt env fr
>     where
>       -- go yields a substitution environment.
>       go :: M.Map String TypeRep -> TypeRep ->
>             TypeRep -> Either String (M.Map String TypeRep)
>       go env x y
>          | isFree x = case M.lookup (show x) env of
>                         Just x' -> if x' == y then return env else Left
> (error "a")
>                         Nothing -> return $ M.insert (show x) y env
>          | isArr x = do
>                  (lh,rh) <- splitArrCon x
>                  (lh',rh') <- splitArrCon y
>                  env' <- go env lh lh'
>                  go env' rh rh'
>          | otherwise = if x == y then return env else Left (error "b")
> \end{code}
> =
> \begin{code}
>       -- subIt applies it
>       subIt :: M.Map String TypeRep -> TypeRep -> Either String TypeRep
>       subIt env x
>             | isFree x = case M.lookup (show x) env of
>                            Just x' -> return x'
>                            Nothing -> Left (error "c")
>             | isArr x = do
>                  (lh,rh) <- splitArrCon x
>                  lh' <- subIt env lh
>                  rh' <- subIt env rh
>                  return $ mkTyConApp arrCon [lh',rh']
>             | otherwise = return x
>       succCon = typeRepTyCon $ typeOf (S Z)
>       zCon = typeRepTyCon $ typeOf Z
>       isFree x = typeRepTyCon x `elem` [zCon, succCon]
>       arrCon = (typeRepTyCon $ typeOf (undefined::Int -> String))
> \end{code}
>
> =
> And now, we just have to convince GHC that they unify!
>
> \begin{code}
> unifyAppFun :: forall a b c. (Typeable a, Typeable b, Typeable c)
>                => Term a -> Term b -> Either String (Term c)
> unifyAppFun x y = do
>   unifyAppTyps (head $ typeRepArgs $ typeOf x) (head $ typeRepArgs $ typeOf
> y)
>   return $ unsafeCoerce x
> \end{code}
> =
> Problem solved.
>
>
> On Fri, Jun 25, 2010 at 2:03 PM, Edward Kmett <ekmett at gmail.com> wrote:
>
>> It turns out that defining Read is somewhat tricky to do for a GADT.
>>
>> Gershom Bazerman has some very nice slides on how to survive the process
>> by manual typechecking (much in the spirit of Oleg's meta-typechecking code
>> referenced by Stephen's follow up below)
>>
>> He presented them at hac-phi this time around.
>>
>> I will check with him to see if I can get permission to host them
>> somewhere and post a link to them here.
>>
>> -Edward Kmett
>>
>>
>> On Fri, Jun 25, 2010 at 5:04 AM, <corentin.dupont at ext.mpsa.com> wrote:
>>
>>>
>>> Hello Haskellers,
>>>
>>> I'm having trouble writing a Read Instance for my GATD.
>>> Arg this GATD!! It causes me more problems that it solves ;)
>>> Especially with no automatic deriving, it adds a lot of burden to my
>>> code.
>>>
>>> >data Obs a where
>>> >    ProposedBy :: Obs Int       -- The player that proposed the tested
>>> rule
>>> >    Turn       :: Obs Turn      -- The current turn
>>> >    Official   :: Obs Bool      -- whereas the tested rule is official
>>> >    Equ        :: (Eq a, Show a, Typeable a) => Obs a -> Obs a -> Obs
>>> Bool
>>> >    Plus       :: (Num a) => Obs a -> Obs a -> Obs a
>>> >    Time       :: (Num a) => Obs a -> Obs a -> Obs a
>>> >    Minus      :: (Num a) => Obs a -> Obs a -> Obs a
>>> >    And        :: Obs Bool -> Obs Bool -> Obs Bool
>>> >    Or         :: Obs Bool -> Obs Bool -> Obs Bool
>>> >    Not        :: Obs Bool -> Obs Bool
>>> >    Konst      :: a -> Obs a
>>>
>>>
>>> > instance Read a => Read (Obs a) where
>>> > readPrec = (prec 10 $ do
>>> >        Ident "ProposedBy" <- lexP
>>> >        return (ProposedBy))
>>> >     +++
>>> >      (prec 10 $ do
>>> >        Ident "Official" <- lexP
>>> >        return (Official))
>>> >  (etc...)
>>>
>>> Observable.lhs:120:8:
>>>    Couldn't match expected type `Int' against inferred type `Bool'
>>>      Expected type: ReadPrec (Obs Int)
>>>      Inferred type: ReadPrec (Obs Bool)
>>>
>>>
>>> Indeed "ProposedBy" does not have the same type that "Official".
>>> Mmh how to make it all gently mix altogether?
>>>
>>>
>>> Best,
>>> Corentin
>>>
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>
>>
>>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100626/5b88db50/attachment-0001.html