[Haskell-cafe] Regular Expressions, was Re: Interest in helping
w/ Haskell standard
Martin Sulzmann
sulzmann at comp.nus.edu.sg
Tue Oct 18 02:44:35 EDT 2005
Semantic subtyping issue:
Assume we have a function f of type f :: Reg (r*) -> ...
to which we pass a value x of type Reg (r,r*).
We have that (r,r*) is a semantic subtype of r*,
hence, the code f x is accepted in languages such as XDuce/CDuce.
I'm just saying that the fact regexp can be represented via GADTs
doesn't imply that we get the same expressive power of
languages such as XDuce/CDuce.
Martin
Hongwei Xi writes:
> Hi Martin:
>
> Thanks for the message.
>
> No, I am not on the list.
>
> Back then, we did something similar but for a different
> purpose. The idea is like this:
>
> We wanted to represent an XML document as a pair:
>
> XML (r) * Reg (r)
>
> where r stands for some regexp and Reg(r) is a
> singleton type. When performing search over XML(r),
> we can use Reg(r) as some kind of pilot value
> (which is a lot smaller than XML(r)) to guide the
> search. For instance, it is often easy to tell from
> Reg(r) that an item to be found is not in XML(r) and
> thus we can skip searching XML(r) entirely.
> This is sort of like indexing scheme in database.
>
> BTW, I am not sure what kind of 'semantic typing'
> you have in mind. An example?
>
> --Hongwei
>
> Computer Science Department
> Boston University
> 111 Cummington Street
> Boston, MA 02215
>
> Email: hwxi at cs.bu.edu
> Url: http://www.cs.bu.edu/~hwxi
> Tel: +1 617 358 2511 (office)
> Fax: +1 617 353 6457 (department)
>
>
>
> On Mon, 17 Oct 2005, Martin Sulzmann wrote:
>
> >>
> >>Very interesting Conor. Do you know Xi et al's APLAS'03 paper?
> >>(Hongwei, I'm not sure whether you're on this list).
> >>Xi et al. use GRDTs (aka GADTs aka first-class phantom types)
> >>to represent XML documents. There're may be some connections
> >>between what you're doing and Xi et al's work.
> >>
> >>I believe that there's an awful lot you can do with GADTs
> >>(in the context of regular expressions). But there're two things
> >>you can't accomplish:
> >>semantic subtyping and regular expression pattern matching
> >>a la XDuce/CDuce. Unless somebody out there proves me wrong.
> >>
> >>Martin
> >>
> >>
> >>
> >>Hi folks,
> >>
> >>Inspired by Ralf's post, I thought I'd just GADTize a dependently typed=20
> >>program I wrote in 2001.
> >>
> >>Wolfgang Jeltsch wrote:
> >>
> >>> Now lets consider using an algebraic datatype for regexps:
> >>>
> >>> data RegExp
> >>> =3D Empty | Single Char | RegExp :+: RegExp | RegExp :|: RegExpt | Ite=
> >>r RegExp
> >>>
> >>>Manipulating regular expressions now becomes easy and safe =E2=80=93 you=
> >> are just not=20
> >>>able to create "syntactically incorrect regular expressions" since durin=
> >>g=20
> >>>runtime you don't deal with syntax at all.
> >>> =20
> >>>
> >>
> >>A fancier variation on the same theme...
> >>
> >> > data RegExp :: * -> * -> * where
> >> > Zero :: RegExp tok Empty
> >> > One :: RegExp tok ()
> >> > Check :: (tok -> Bool) -> RegExp tok tok
> >> > Plus :: RegExp tok a -> RegExp tok b -> RegExp tok (Either a b)
> >> > Mult :: RegExp tok a -> RegExp tok b -> RegExp tok (a, b)
> >> > Star :: RegExp tok a -> RegExp tok [a]
> >>
> >> > data Empty
> >>
> >>The intuition is that a RegExp tok output is a regular expression=20
> >>explaining how to parse a list of tok as an output. Here, Zero is the=20
> >>regexp which does not accept anything, One accepts just the empty=20
> >>string, Plus is choice and Mult is sequential composition; Check lets=20
> >>you decide whether you like a single token.
> >>
> >>Regular expressions may be seen as an extended language of polynomials=20
> >>with tokens for variables; this parser works by repeated application of=20
> >>the remainder theorem.
> >>
> >> > parse :: RegExp tok x -> [tok] -> Maybe x
> >> > parse r [] =3D empty r
> >> > parse r (t : ts) =3D case divide t r of
> >> > Div q f -> return f `ap` parse q ts
> >>
> >>Example
> >>
> >>*RegExp> parse (Star (Mult (Star (Check (=3D=3D 'a'))) (Star (Check (=3D=3D=
> >>=20
> >>'b'))))) "abaabaaabbbb"
> >>Just [("a","b"),("aa","b"),("aaa","bbbb")]
> >>
> >>The 'remainder' explains if a regular expression accepts the empty=20
> >>string, and if so, how. The Star case is a convenient=20
> >>underapproximation, ruling out repeated empty values.
> >>=20
> >> > empty :: RegExp tok a -> Maybe a
> >> > empty Zero =3D mzero
> >> > empty One =3D return ()
> >> > empty (Check _) =3D mzero
> >> > empty (Plus r1 r2) =3D (return Left `ap` empty r1) `mplus`
> >> > (return Right `ap` empty r2)
> >> > empty (Mult r1 r2) =3D return (,) `ap` empty r1 `ap` empty r2
> >> > empty (Star _) =3D return []
> >>
> >>The 'quotient' explains how to parse the tail of the list, and how to=20
> >>recover the meaning of the whole list from the meaning of the tail.
> >>
> >> > data Division tok x =3D forall y. Div (RegExp tok y) (y -> x)
> >>
> >>Here's how it's done. I didn't expect to need scoped type variables, but=20
> >>I did...
> >>
> >> > divide :: tok -> RegExp tok x -> Division tok x
> >> > divide t Zero =3D Div Zero naughtE
> >> > divide t One =3D Div Zero naughtE
> >> > divide t (Check p) | p t =3D Div One (const t)
> >> > | otherwise =3D Div Zero naughtE
> >> > divide t (Plus (r1 :: RegExp tok a) (r2 :: RegExp tok b)) =3D
> >> > case (divide t r1, divide t r2) of
> >> > (Div (q1 :: RegExp tok a') (f1 :: a' -> a),
> >> > Div (q2 :: RegExp tok b') (f2 :: b' -> b)) ->
> >> > Div (Plus q1 q2) (f1 +++ f2)
> >> > divide t (Mult r1 r2) =3D case (empty r1, divide t r1, divide t r2) of
> >> > (Nothing, Div q1 f1, _) -> Div (Mult q1 r2) (f1 *** id)
> >> > (Just x1, Div q1 f1, Div q2 f2) ->
> >> > Div (Plus (Mult q1 r2) q2) (either (f1 *** id) (((,) x1) . f2))
> >> > divide t (Star r) =3D case (divide t r) of
> >> > Div q f -> Div (Mult q (Star r)) (\ (y, xs) -> (f y : xs))
> >>
> >>Bureaucracy.
> >>
> >> > (***) :: (a -> b) -> (c -> d) -> (a, c) -> (b, d)
> >> > (f *** g) (a, c) =3D (f a, g c)
> >>
> >> > (+++) :: (a -> b) -> (c -> d) -> Either a c -> Either b d
> >> > (f +++ g) (Left a) =3D Left (f a)
> >> > (f +++ g) (Right c) =3D Right (g c)
> >>
> >> > naughtE :: Empty -> x
> >> > naughtE =3D undefined
> >>
> >>It's not the most efficient parser in the world (doing some algebraic=20
> >>simplification on the fly wouldn't hurt), but it shows the sort of stuff=20
> >>you can do.
> >>
> >>Have fun
> >>
> >>Conor
> >>_______________________________________________
> >>Haskell-Cafe mailing list
> >>Haskell-Cafe at haskell.org
> >>http://www.haskell.org/mailman/listinfo/haskell-cafe
> >>
More information about the Haskell-Cafe
mailing list