[Haskell-cafe] Manual Type-Checking to provide Read instances for
GADTs. (was Re: [Haskell-cafe] Read instance for GATD)
Dupont Corentin
corentin.dupont at gmail.com
Sat Jun 26 06:31:05 EDT 2010
That's really nice. Very interesting. Thank you.
On Fri, Jun 25, 2010 at 9:55 PM, Edward Kmett <ekmett at gmail.com> wrote:
> I've obtained permission to repost Gershom's slides on how to deserialize
> GADTs by typechecking them yourself, which are actually a literate haskell
> source file, that he was rendering to slides. Consequently, I've just pasted
> the content below as a literate email.
>
> -Edward Kmett
>
> -
> Deserializing strongly typed values
>
> (four easy pieces about typechecking)
>
> Gershom Bazerman
>
> -
> prior art:
> Oleg (of course)
> http://okmij.org/ftp/tagless-final/course/course.html
>
> ...but also
> Stephanie Weirich
> http://www.comlab.ox.ac.uk/projects/gip/school/tc.hs
>
> =
> Ahem...
> \begin{code}
> {-# LANGUAGE DeriveDataTypeable,
> ExistentialQuantification,
> FlexibleContexts,
> FlexibleInstances,
> FunctionalDependencies,
> GADTs,
> RankNTypes,
> ScopedTypeVariables
> #-}
> \end{code}
> =
> ahem.
> \begin{code}
> import Data.Typeable
> import Data.Maybe
> import Control.Monad.Error
> import Control.Applicative
> import qualified Data.Map as M
> import Unsafe.Coerce
> \end{code}
> =
> A simple ADT.
>
> \begin{code}
> data SimpleExpr = SOpBi String SimpleExpr SimpleExpr
> | SOpUn String SimpleExpr
> | SDbl Double
> | SBool Bool deriving (Read, Show, Typeable)
>
> \end{code}
> Yawn.
> =
> An awesome GADT!
>
> \begin{code}
> data Expr a where
> EDbl :: Double -> Expr Double
> EBool :: Bool -> Expr Bool
> EBoolOpBi :: BoolOpBi -> Expr Bool -> Expr Bool -> Expr Bool
> ENumOpBi :: NumOpBi -> Expr Double -> Expr Double -> Expr Double
> ENumOpUn :: NumOpUn -> Expr Double -> Expr Double
> deriving Typeable
>
> data NumOpBi = Add | Sub deriving (Eq, Show)
> data NumOpUn = Log | Exp deriving (Eq, Show)
> data BoolOpBi = And | Or deriving (Eq, Show)
> \end{code}
>
> The GADT is well typed. It cannot go wrong.
> -
> It also cannot derive show.
> =
> But we can write show...
>
> \begin{code}
> showIt :: Expr a -> String
> showIt (EDbl d) = show d
> showIt (EBool b) = show b
> showIt (EBoolOpBi op x y) = "(" ++ show op
> ++ " " ++ showIt x
> ++ " " ++ showIt y ++ ")"
> showIt (ENumOpBi op x y) = "(" ++ show op
> ++ " " ++ showIt x
> ++ " " ++ showIt y ++ ")"
> showIt (ENumOpUn op x) = show op ++ "(" ++ showIt x ++ ")"
> \end{code}
> =
> And eval is *much nicer*.
> It cannot go wrong --> no runtime typechecks.
>
> \begin{code}
> evalIt :: Expr a -> a
> evalIt (EDbl x) = x
> evalIt (EBool x) = x
> evalIt (EBoolOpBi op expr1 expr2)
> | op == And = evalIt expr1 && evalIt expr2
> | op == Or = evalIt expr2 || evalIt expr2
>
> evalIt (ENumOpBi op expr1 expr2)
> | op == Add = evalIt expr1 + evalIt expr2
> | op == Sub = evalIt expr1 - evalIt expr2
> \end{code}
> =
> But how do we write read!?
>
> read "EBool False" = Expr Bool
> read "EDbl 12" = Expr Double
>
> The type being read depends on the content of the string.
>
> Even worse, we want to read not from a string that looks obvious
> to Haskell (i.e. a standard showlike instance) but from
> something that looks pretty to the user -- we want to *parse*.
>
> So we parse into our simple ADT.
>
> Then we turn our simple ADT into our nice GADT.
> -
> But how?
>
> How do we go from untyped... to typed?
>
> [And in general -- not just into an arbitrary GADT,
> but an arbitrary inhabitant of a typeclass.]
>
> [i.e. tagless final, etc]
>
> =
> Take 1:
> Even if we do not know what type we are creating,
> we eventually will do something with it.
>
> So we paramaterize our typechecking function over
> an arbitrary continuation.
>
> \begin{code}
> mkExpr :: (forall a. (Show a, Typeable a) => Expr a -> r) -> SimpleExpr ->
> r
> mkExpr k expr = case expr of
> SDbl d -> k $ EDbl d
> SBool b -> k $ EBool b
> SOpUn op expr1 -> case op of
> "log" -> k $ mkExpr' (ENumOpUn Log) expr1
> "exp" -> k $ mkExpr' (ENumOpUn Exp) expr1
> _ -> error "bad unary op"
> SOpBi op expr1 expr2 -> case op of
> "add" -> k $ mkExprBi (ENumOpBi Add) expr1 expr2
> "sub" -> k $ mkExprBi (ENumOpBi Sub) expr1 expr2
> \end{code}
> =
> Where's the typechecking?
>
> \begin{code}
> mkExpr' k expr = mkExpr (appCast $ k) expr
>
>
> mkExprBi k expr1 expr2 = mkExpr' (mkExpr' k expr1) expr2
>
>
> appCast :: forall a b c r. (Typeable a, Typeable b) => (c a -> r) -> c b ->
> r
> appCast f x = maybe err f $ gcast x
> where err = error $ "Type error. Expected: " ++ show (typeOf
> (undefined::a))
> ++ ", Inferred: " ++ show (typeOf (undefined::b))
> \end{code}
>
> ... We let Haskell do all the work!
> =
> Hmmm... the continuation can be anything.
> So let's just make it an existential constructor.
>
> \begin{code}
> data ExprBox = forall a. Typeable a => ExprBox (Expr a)
>
> appExprBox :: (forall a. Expr a -> res) -> ExprBox -> res
> appExprBox f (ExprBox x) = f x
>
> tcCast :: forall a b c. (Typeable a, Typeable b) => Expr a -> Either String
> (Expr b)
> tcCast x = maybe err Right $ gcast x
> where err = Left $ "Type error. Expected: " ++ show (typeOf
> (undefined::a))
> ++ ", Inferred: " ++ show (typeOf (undefined::b))
> \end{code}
>
> Now we can delay deciding what to apply until later.
> Typecheck once, execute repeatedly!
>
> =
> One more trick -- monadic notation lets us
> extend the context of unpacked existentials
> to the end of the do block
>
> \begin{code}
> retBox x = return (ExprBox $ x, typeOf x)
>
> typeCheck :: SimpleExpr -> Either String (ExprBox, TypeRep)
> typeCheck (SDbl d) = retBox (EDbl d)
> typeCheck (SBool b) = retBox (EBool b)
> typeCheck (SOpBi op s1 s2) = case op of
> "add" -> tcBiOp (ENumOpBi Add)
> "sub" -> tcBiOp (ENumOpBi Sub)
> "and" -> tcBiOp (EBoolOpBi And)
> "or" -> tcBiOp (EBoolOpBi Or)
> where
> tcBiOp constr = do
> (ExprBox e1, _) <- typeCheck s1
> (ExprBox e2, _) <- typeCheck s2
> e1' <- tcCast e1
> e2' <- tcCast e2
> retBox $ constr e1' e2'
> \end{code}
> =
> So that's fine for *very* simple expressions.
> How does it work for interesting GADTs?
> (like, for example, HOAS)?
>
> (The prior art doesn't demonstrate HOAS --
> it uses DeBruijn.)
> =
> Our simple world
>
> \begin{code}
> type Ident = String
> type TypeStr = String
>
> data STerm = SNum Double
> | SApp STerm STerm
> | SVar Ident
> | SLam Ident TypeRep STerm
> \end{code}
>
> Note.. terms are Church style -- each var introduced
> has a definite type.
>
> Determining this type is left as an exercise.
>
> =
> Over the rainbow in well-typed land.
>
> \begin{code}
> data Term a where
> TNum :: Double -> Term Double
> TApp :: Term (a -> b) -> Term a -> Term b
> TLam :: Typeable a => (Term a -> Term b) -> Term (a -> b)
> TVar :: Typeable a => Int -> Term a
> deriving Typeable
> \end{code}
>
> Wait! DeBrujin (TVar) *and* HOAS (TLam)! The worst of both worlds.
>
> Don't worry. In the final product all TVars are eliminated by construction.
>
> Exercise to audience: rewrite the code so that TVar can be eliminated
> from the Term type.
> =
> Show and eval...
>
> \begin{code}
> showT :: Int -> Term a -> String
> showT c (TNum d) = show d
> showT c (TApp f x) = "App (" ++ showT c f
> ++ ") (" ++ showT c x ++ ")"
> showT c (TLam f) = "Lam " ++ ("a"++show c)
> ++ " -> " ++ (showT (succ c) $ f (TVar c))
> showT c (TVar i) = "a"++show i
>
> runT :: Term a -> Term a
> runT (TNum d) = (TNum d)
> runT (TLam f) = (TLam f)
> runT (TApp f x) = case runT f of TLam f' -> runT (f' x)
> runT (TVar i) = error (show i)
> \end{code}
> =
> The existential
>
> \begin{code}
> data TermBox = forall a. Typeable a => TermBox (Term a)
> appTermBox :: (forall a. Typeable a => Term a -> res) -> TermBox -> res
> appTermBox f (TermBox x) = f x
> \end{code}
> =
> The typechecker returns a box *and* a typeRep.
>
> Cast is the usual trick.
>
> \begin{code}
> retTBox :: Typeable a => Term a -> Either String (TermBox, TypeRep)
> retTBox x = return (TermBox $ x, typeOf x)
>
> type Env = M.Map Ident (TermBox, TypeRep)
>
> trmCast :: forall a b c. (Typeable a, Typeable b) =>
> Term a -> Either String (Term b)
> trmCast x = maybe err Right $ gcast x
> where err = Left $ "Type error. Expected: " ++
> show (typeOf (undefined::a))
> ++ ", Inferred: " ++
> show (typeOf (undefined::b))
> \end{code}
>
> =
> \begin{code}
> typeCheck' :: STerm -> Env -> Either String (TermBox, TypeRep)
> typeCheck' t env = go t env 0
> where
> go (SNum d) _ idx = retTBox (TNum d)
> go (SVar i) env idx = do
> (TermBox t, _) <- maybe (fail $ "not in scope: " ++ i)
> return $ M.lookup i env
> retTBox $ t
> \end{code}
>
> Nums and vars are easy.
> =
> App and Lam... less so.
>
> \begin{code}
> go (SApp s1 s2) env idx = do
> (TermBox e1, tr1) <- go s1 env idx
> (TermBox e2, _) <- go s2 env idx
> TermBox rt <- return $ mkTerm $ head $ tail $
> typeRepArgs $ head $ typeRepArgs $ tr1
> -- TypeReps have their... drawbacks.
> e1' <- trmCast e1
> retTBox $ TApp e1' e2 `asTypeOf` rt
> go (SLam i tr s) env idx = do
> TermBox et <- return $ mkTerm tr
> (TermBox e, _) <- go s
> (M.insert i
> (TermBox (TVar idx `asTypeOf` et),tr)
> env
> )
> (idx + 1)
> retTBox $ TLam (\x -> subst (x `asTypeOf` et) idx e)
> \end{code}
> =
> How does mkTerm work?
>
> \begin{code}
> mkTerm :: TypeRep -> TermBox
> mkTerm tr = go tr TermBox
> where
> go :: TypeRep -> (forall a. (Typeable a) => Term a -> res) -> res
> go tr k
> | tr == typeOf (0::Double) = k (TNum 0)
> | typeRepTyCon tr == arrCon =
> go (head $ typeRepArgs tr)
> $ \xt -> go (head $ tail $ typeRepArgs tr)
> $ \y -> k (TLam $ \x -> const y (x `asTypeOf` xt))
>
> arrCon = typeRepTyCon $ typeOf (undefined::Int -> String)
> \end{code}
>
> Same principle -- but can build arrows directly.
>
> Doing so requires staying cps... I think.
> =
> And this is how we get rid of the DeBruijn terms.
>
> \begin{code}
> subst :: (Typeable a) => Term a -> Int -> Term b -> Term b
> subst t i trm = go trm
> where
> go :: Term c -> Term c
> go (TNum d) = (TNum d)
> go (TApp f x) = TApp (go f) (go x)
> go (TLam f) = TLam (\a -> go (f a))
> go (TVar i')
> | i == i' = either error id $ trmCast t
> | otherwise = (TVar i')
> \end{code}
>
> Q: Now you see why DeBruijn is handy -- substitution
> is otherwise a pain.
>
> =
> But note -- all functions must be monotyped.
> This is the simply typed lambda calculus.
>
> How do we represent TLam (\a -> a)?
>
> The masses demand HM polymorphism.
> -
>
> Take 4:
>
> A damn dirty hack.
>
> =
> All hacks begin with Nats.
>
> \begin{code}
> data Z = Z deriving (Show, Typeable)
> data S a = S a deriving (Show, Typeable)
> \end{code}
> =
> typeCheck is almost the same.
>
> \begin{code}
> typeCheck'' :: STerm -> Env -> Either String (TermBox, TypeRep)
> typeCheck'' t env = go t env 0
> where
> go :: STerm -> Env -> Int -> Either String (TermBox, TypeRep)
> go (SNum d) _ idx = retTBox (TNum d)
> go (SVar i) env idx = do
> (TermBox t, _) <- maybe (fail $ "not in scope: " ++ i)
> return $ M.lookup i env
> retTBox $ t
> \end{code}
> =
> \begin{code}
> go (SApp s1 s2) env idx = do
> (TermBox e1, tr1) <- go s1 env idx
> (TermBox e2, _) <- go s2 env idx
> TermBox rt <- unifyAppRet e1 e2
> e1' <- unifyAppFun e1 e2
> retTBox $ TApp e1' e2 `asTypeOf` rt
> go (SLam i tr s) env idx = do
> TermBox et <- return $ mkTerm' $ tr
> (TermBox e, _) <- go s (M.insert i
> (TermBox (TVar idx `asTypeOf` et),tr)
> env)
> (idx + 1)
> retTBox $ TLam (\x -> subst (x `asTypeOf` et) idx e)
> \end{code}
>
> It looks like we just factored on the nasty arrow code.
> =
> mkTerm is almost the same... we just extended it to deal with Nats.
> \begin{code}
> mkTerm' :: TypeRep -> TermBox
> mkTerm' tr = go tr TermBox
> where
> go :: TypeRep -> (forall a. (Typeable a) => Term a -> res) -> res
> go tr k
> | tr == typeOf (0::Double) = k (TNum 0)
> | tr == typeOf Z = k (TVar 0 :: Term Z)
> | typeRepTyCon tr == succCon = go (head $ typeRepArgs tr)
> $ \t -> k $ succTerm t
> | isArr tr =
> go (head $ typeRepArgs tr)
> $ \xt -> go (head $ tail $ typeRepArgs tr)
> $ \y -> k (TLam $ \x -> const y (x `asTypeOf` xt))
> | otherwise = error $ show tr
>
> succCon = typeRepTyCon $ typeOf (S Z)
> succTerm :: Typeable b => Term b -> Term (S b)
> succTerm _ = TVar 0
> \end{code}
> =
> Some utilities
> \begin{code}
> isArr :: TypeRep -> Bool
> isArr x = typeRepTyCon x == (typeRepTyCon $ typeOf (undefined::Int ->
> String))
>
> splitArrCon :: TypeRep -> Either String (TypeRep, TypeRep)
> splitArrCon x
> | isArr x = case typeRepArgs x of
> [a,b] -> Right (a,b)
> _ -> Left $ "Expected function, inferred: " ++ show x
> | otherwise = Left $ "Expected function, inferred: " ++ show x
>
> \end{code}
> =
> Give an arrow term we unify it with its argument...
> and return a witness.
> \begin{code}
> unifyAppRet :: forall a b. (Typeable a, Typeable b) =>
> Term a -> Term b -> Either String TermBox
> unifyAppRet x y = do
> tr <- unifyAppTyps
> (head $ typeRepArgs $ typeOf x)
> (head $ typeRepArgs $ typeOf y)
> return $ mkTerm' tr
>
> \end{code}
> =
> Yes. Actual unification.
> (although this is unification of a type template,
> so at least it is local)
> \begin{code}
> unifyAppTyps :: TypeRep -> TypeRep -> Either String TypeRep
> unifyAppTyps trf trx = do
> (fl,fr) <- splitArrCon trf
> env <- go M.empty fl trx
> subIt env fr
> where
> -- go yields a substitution environment.
> go :: M.Map String TypeRep -> TypeRep ->
> TypeRep -> Either String (M.Map String TypeRep)
> go env x y
> | isFree x = case M.lookup (show x) env of
> Just x' -> if x' == y then return env else Left
> (error "a")
> Nothing -> return $ M.insert (show x) y env
> | isArr x = do
> (lh,rh) <- splitArrCon x
> (lh',rh') <- splitArrCon y
> env' <- go env lh lh'
> go env' rh rh'
> | otherwise = if x == y then return env else Left (error "b")
> \end{code}
> =
> \begin{code}
> -- subIt applies it
> subIt :: M.Map String TypeRep -> TypeRep -> Either String TypeRep
> subIt env x
> | isFree x = case M.lookup (show x) env of
> Just x' -> return x'
> Nothing -> Left (error "c")
> | isArr x = do
> (lh,rh) <- splitArrCon x
> lh' <- subIt env lh
> rh' <- subIt env rh
> return $ mkTyConApp arrCon [lh',rh']
> | otherwise = return x
> succCon = typeRepTyCon $ typeOf (S Z)
> zCon = typeRepTyCon $ typeOf Z
> isFree x = typeRepTyCon x `elem` [zCon, succCon]
> arrCon = (typeRepTyCon $ typeOf (undefined::Int -> String))
> \end{code}
>
> =
> And now, we just have to convince GHC that they unify!
>
> \begin{code}
> unifyAppFun :: forall a b c. (Typeable a, Typeable b, Typeable c)
> => Term a -> Term b -> Either String (Term c)
> unifyAppFun x y = do
> unifyAppTyps (head $ typeRepArgs $ typeOf x) (head $ typeRepArgs $ typeOf
> y)
> return $ unsafeCoerce x
> \end{code}
> =
> Problem solved.
>
>
> On Fri, Jun 25, 2010 at 2:03 PM, Edward Kmett <ekmett at gmail.com> wrote:
>
>> It turns out that defining Read is somewhat tricky to do for a GADT.
>>
>> Gershom Bazerman has some very nice slides on how to survive the process
>> by manual typechecking (much in the spirit of Oleg's meta-typechecking code
>> referenced by Stephen's follow up below)
>>
>> He presented them at hac-phi this time around.
>>
>> I will check with him to see if I can get permission to host them
>> somewhere and post a link to them here.
>>
>> -Edward Kmett
>>
>>
>> On Fri, Jun 25, 2010 at 5:04 AM, <corentin.dupont at ext.mpsa.com> wrote:
>>
>>>
>>> Hello Haskellers,
>>>
>>> I'm having trouble writing a Read Instance for my GATD.
>>> Arg this GATD!! It causes me more problems that it solves ;)
>>> Especially with no automatic deriving, it adds a lot of burden to my
>>> code.
>>>
>>> >data Obs a where
>>> > ProposedBy :: Obs Int -- The player that proposed the tested
>>> rule
>>> > Turn :: Obs Turn -- The current turn
>>> > Official :: Obs Bool -- whereas the tested rule is official
>>> > Equ :: (Eq a, Show a, Typeable a) => Obs a -> Obs a -> Obs
>>> Bool
>>> > Plus :: (Num a) => Obs a -> Obs a -> Obs a
>>> > Time :: (Num a) => Obs a -> Obs a -> Obs a
>>> > Minus :: (Num a) => Obs a -> Obs a -> Obs a
>>> > And :: Obs Bool -> Obs Bool -> Obs Bool
>>> > Or :: Obs Bool -> Obs Bool -> Obs Bool
>>> > Not :: Obs Bool -> Obs Bool
>>> > Konst :: a -> Obs a
>>>
>>>
>>> > instance Read a => Read (Obs a) where
>>> > readPrec = (prec 10 $ do
>>> > Ident "ProposedBy" <- lexP
>>> > return (ProposedBy))
>>> > +++
>>> > (prec 10 $ do
>>> > Ident "Official" <- lexP
>>> > return (Official))
>>> > (etc...)
>>>
>>> Observable.lhs:120:8:
>>> Couldn't match expected type `Int' against inferred type `Bool'
>>> Expected type: ReadPrec (Obs Int)
>>> Inferred type: ReadPrec (Obs Bool)
>>>
>>>
>>> Indeed "ProposedBy" does not have the same type that "Official".
>>> Mmh how to make it all gently mix altogether?
>>>
>>>
>>> Best,
>>> Corentin
>>>
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>
>>
>>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100626/5b88db50/attachment-0001.html
More information about the Haskell-Cafe
mailing list