[Haskell-cafe] Regular Expressions without GADTs
Bruno Oliveira
Bruno.Oliveira at comlab.ox.ac.uk
Tue Oct 18 08:04:39 EDT 2005
Hello,
You can also write this code in Haskell 98. Jeremy Gibbons and I have
recently written a paper entitled "TypeCase: A design pattern for
type-indexed functions" where we explore the use of some techniques from
lightweight approaches to generic programming (Ralf Hinze and James
Cheney's work) for other purposes.
If you are interested you can take a look at:
http://web.comlab.ox.ac.uk/oucl/work/bruno.oliveira/typecase.pdf
Basically, we can use type classes and GADTs as two alternatives approaches
to implement type-based case analysis. A well-known example of this is
printf. With printf we want to construct a format string that determines the
type of printf. Something similar happens here, we want to build a parser based
on the type of the regular expression.
There is nothing ingenious with this solution --- I basically translated
Oleg's and Conor's code. In the paper we discuss in details different
trade-offs between different implementations (e.g. GADT vs type classes).
For instance, here is something you cannot do (easily) with the GADT
approach:
> myParser :: MonadPlus m => Parse m tok Int
> myParser = Parse (\_ -> return (1::Int))
> test = asList (parse myParser "ole")
Basically, with the code that follows you can define your own customized
Regular expressions that you can use with your parser. (Sorry for the
stupid example, but just to give you the idea).
Ultimately you can reason on your programs (of this kind!) using GADTs or
multiple-parameter
type classes with functional dependencies and them translate them into
Haskell 98.
Here is the code:
> module RegExps where
> import Monad
> newtype Zero = Zero Zero -- Zero type in Haskell 98
> class RegExp g where
> zero :: g tok Zero
> one :: g tok ()
> check :: (tok -> Bool) -> g tok tok
> plus :: g tok a -> g tok b -> g tok (Either a b)
> mult :: g tok a -> g tok b -> g tok (a,b)
> push :: tok -> g tok r -> g tok r
> star :: g tok a -> g tok [a]
> newtype Parse m tok t = Parse {parse :: [tok] -> m t}
> instance MonadPlus m => RegExp (Parse m) where
> zero = Parse (\_ -> mzero)
> one = Parse (\l ->
> case l of
> [] -> return ()
> _ -> mzero)
> check p = Parse (\l ->
> case l of
> [t] -> if (p t) then return t else mzero
> _ -> mzero)
> plus r1 r2 = Parse (\l -> (liftM Left $ parse r1 l) `mplus`
> (liftM Right $ parse r2 l))
> push tok r = Parse (\l -> parse r (tok:l))
> mult r1 r2 = Parse (\l ->
> case l of
> [] -> liftM2 (,) (parse r1 l) (parse r2 l)
> (t:ts) -> parse (mult (push t r1) r2) ts `mplus`
> liftM2 (,) (parse r1 ([] `asTypeOf` ts)) (parse r2
(t:ts)))
> star r = Parse (\l ->
> case l of
> [] -> return []
> ts -> parse (mult r (star r)) ts >>= (\(x,xs) -> return
$ x : xs))
Problem with the monomorphism restriction
> p1 :: RegExp g => g Char ([Char], [Char])
> p1 = mult (star (check (== 'a'))) (star (check (== 'b')))
p1 = (Mult (Star (Check (== 'a'))) (Star (Check (== 'b'))))
> asMayBe :: Maybe a -> Maybe a
> asMayBe = id
> asList :: [a] -> [a]
> asList = id
> testp = asMayBe $
> parse (star (mult (star (check (== 'a'))) (star (check (==
'b')))))
> "abaabaaabbbb"
*RX> testp
Just [("a","b"),("aa","b"),("aaa","bbbb")]
-- see alternative parses
> testp1 = take 3 $ asList $
> parse (star (mult (star (check (== 'a'))) (star (check (==
'b')))))
> "abaabaaabbbb"
-- Parsing the list of integers
> ieven = even :: Int->Bool
> iodd = odd :: Int->Bool
> p2 :: RegExp g => g Int (Either (Int, (Int, [Int])) (Int, [Int]))
> p2 = plus (mult (check iodd) (mult (check iodd) (star (check ieven))))
> (mult (check ieven) (star (check iodd)))
-- the parser is ambiguous. We can see the alternatives
> test2 = take 3 $ asList $ parse (star p2) [1::Int,1,2,3,3,4]
Connor's code for empty.
> {-
> newtype Empty tok t = Empty {empty :: Maybe t}
> instance RegExp Empty where
> zero = Empty mzero
> one = Empty (return ())
> check _ = Empty mzero
> plus r1 r2 = Empty ((return Left `ap` empty r1) `mplus`
> (return Right `ap` empty r2))
> mult r1 r2 = Empty (return (,) `ap` empty r1 `ap` empty r2)
> star _ = Empty (return [])
> -}
More information about the Haskell-Cafe
mailing list