[Haskell-cafe] Regular Expressions, was Re: Interest in helping w/ Haskell standard

Martin Sulzmann sulzmann at comp.nus.edu.sg
Tue Oct 18 02:44:35 EDT 2005


Semantic subtyping issue:

Assume we have a function f of type f :: Reg (r*) -> ...
to which we pass a value x of type Reg (r,r*).
We have that (r,r*) is a semantic subtype of r*,
hence, the code f x is accepted in languages such as XDuce/CDuce.

I'm just saying that the fact regexp can be represented via GADTs
doesn't imply that we get the same expressive power of
languages such as XDuce/CDuce.

Martin


Hongwei Xi writes:
 > Hi Martin:
 > 
 > Thanks for the message.
 > 
 > No, I am not on the list.
 > 
 > Back then, we did something similar but for a different
 > purpose. The idea is like this:
 > 
 > We wanted to represent an XML document as a pair:
 > 
 > XML (r) * Reg (r)
 > 
 > where r stands for some regexp and Reg(r) is a
 > singleton type. When performing search over XML(r),
 > we can use Reg(r) as some kind of pilot value
 > (which is a lot smaller than XML(r)) to guide the
 > search. For instance, it is often easy to tell from
 > Reg(r) that an item to be found is not in XML(r) and
 > thus we can skip searching XML(r) entirely.
 > This is sort of like indexing scheme in database.
 > 
 > BTW, I am not sure what kind of 'semantic typing'
 > you have in mind. An example?
 > 
 > --Hongwei
 > 
 > Computer Science Department
 > Boston University
 > 111 Cummington Street
 > Boston, MA 02215
 > 
 > Email: hwxi at cs.bu.edu
 > Url: http://www.cs.bu.edu/~hwxi
 > Tel: +1 617 358 2511 (office)
 > Fax: +1 617 353 6457 (department)
 > 
 > 
 > 
 > On Mon, 17 Oct 2005, Martin Sulzmann wrote:
 > 
 > >>
 > >>Very interesting Conor. Do you know Xi et al's APLAS'03 paper?
 > >>(Hongwei, I'm not sure whether you're on this list).
 > >>Xi et al. use GRDTs (aka GADTs aka first-class phantom types)
 > >>to represent XML documents. There're may be some connections
 > >>between what you're doing and Xi et al's work.
 > >>
 > >>I believe that there's an awful lot you can  do with GADTs 
 > >>(in the context of regular expressions). But there're two things 
 > >>you can't accomplish:
 > >>semantic subtyping and regular expression pattern matching
 > >>a la XDuce/CDuce. Unless somebody out there proves me wrong.
 > >>
 > >>Martin
 > >>
 > >>
 > >>
 > >>Hi folks,
 > >>
 > >>Inspired by Ralf's post, I thought I'd just GADTize a dependently typed=20
 > >>program I wrote in 2001.
 > >>
 > >>Wolfgang Jeltsch wrote:
 > >>
 > >>> Now lets consider using an algebraic datatype for regexps:
 > >>>
 > >>>	data RegExp
 > >>>		=3D Empty | Single Char | RegExp :+: RegExp | RegExp :|: RegExpt | Ite=
 > >>r RegExp
 > >>>
 > >>>Manipulating regular expressions now becomes easy and safe =E2=80=93 you=
 > >> are just not=20
 > >>>able to create "syntactically incorrect regular expressions" since durin=
 > >>g=20
 > >>>runtime you don't deal with syntax at all.
 > >>> =20
 > >>>
 > >>
 > >>A fancier variation on the same theme...
 > >>
 > >> > data RegExp :: * -> * -> * where
 > >> >   Zero   :: RegExp tok Empty
 > >> >   One    :: RegExp tok ()
 > >> >   Check  :: (tok -> Bool) -> RegExp tok tok
 > >> >   Plus   :: RegExp tok a -> RegExp tok b -> RegExp tok (Either a b)
 > >> >   Mult   :: RegExp tok a -> RegExp tok b -> RegExp tok (a, b)
 > >> >   Star   :: RegExp tok a -> RegExp tok [a]
 > >>
 > >> > data Empty
 > >>
 > >>The intuition is that a RegExp tok output is a regular expression=20
 > >>explaining how to parse a list of tok as an output. Here, Zero is the=20
 > >>regexp which does not accept anything, One accepts just the empty=20
 > >>string, Plus is choice and Mult is sequential composition; Check lets=20
 > >>you decide whether you like a single token.
 > >>
 > >>Regular expressions may be seen as an extended language of polynomials=20
 > >>with tokens for variables; this parser works by repeated application of=20
 > >>the remainder theorem.
 > >>
 > >> > parse :: RegExp tok x -> [tok] -> Maybe x
 > >> > parse r []       =3D empty r
 > >> > parse r (t : ts) =3D case divide t r of
 > >> >   Div q f -> return f `ap` parse q ts
 > >>
 > >>Example
 > >>
 > >>*RegExp> parse (Star (Mult (Star (Check (=3D=3D 'a'))) (Star (Check (=3D=3D=
 > >>=20
 > >>'b'))))) "abaabaaabbbb"
 > >>Just [("a","b"),("aa","b"),("aaa","bbbb")]
 > >>
 > >>The 'remainder' explains if a regular expression accepts the empty=20
 > >>string, and if so, how. The Star case is a convenient=20
 > >>underapproximation, ruling out repeated empty values.
 > >>=20
 > >> > empty :: RegExp tok a -> Maybe a
 > >> > empty Zero         =3D mzero
 > >> > empty One          =3D return ()
 > >> > empty (Check _)    =3D mzero
 > >> > empty (Plus r1 r2) =3D (return Left  `ap` empty r1) `mplus`
 > >> >                      (return Right `ap` empty r2)
 > >> > empty (Mult r1 r2) =3D return (,) `ap` empty r1 `ap` empty r2
 > >> > empty (Star _)     =3D return []
 > >>
 > >>The 'quotient' explains how to parse the tail of the list, and how to=20
 > >>recover the meaning of the whole list from the meaning of the tail.
 > >>
 > >> > data Division tok x =3D forall y. Div (RegExp tok y) (y -> x)
 > >>
 > >>Here's how it's done. I didn't expect to need scoped type variables, but=20
 > >>I did...
 > >>
 > >> > divide :: tok -> RegExp tok x -> Division tok x
 > >> > divide t Zero                  =3D Div Zero naughtE
 > >> > divide t One                   =3D Div Zero naughtE
 > >> > divide t (Check p) | p t       =3D Div One (const t)
 > >> >                    | otherwise =3D Div Zero naughtE
 > >> > divide t (Plus (r1 :: RegExp tok a) (r2 :: RegExp tok b)) =3D
 > >> >   case (divide t r1, divide t r2) of
 > >> >     (Div (q1 :: RegExp tok a') (f1 :: a' -> a),
 > >> >       Div (q2 :: RegExp tok b') (f2 :: b' -> b)) ->
 > >> >       Div (Plus q1 q2) (f1 +++ f2)
 > >> > divide t (Mult r1 r2) =3D case (empty r1, divide t r1, divide t r2) of
 > >> >   (Nothing, Div q1 f1, _) -> Div (Mult q1 r2) (f1 *** id)
 > >> >   (Just x1, Div q1 f1, Div q2 f2) ->
 > >> >     Div (Plus (Mult q1 r2) q2) (either (f1 *** id) (((,) x1) . f2))
 > >> > divide t (Star r) =3D case (divide t r) of
 > >> >   Div q f -> Div (Mult q (Star r)) (\ (y, xs) -> (f y : xs))
 > >>
 > >>Bureaucracy.
 > >>
 > >> > (***) :: (a -> b) -> (c -> d) -> (a, c) -> (b, d)
 > >> > (f *** g) (a, c) =3D (f a, g c)
 > >>
 > >> > (+++) :: (a -> b) -> (c -> d) -> Either a c -> Either b d
 > >> > (f +++ g) (Left a)  =3D Left  (f a)
 > >> > (f +++ g) (Right c) =3D Right (g c)
 > >>
 > >> > naughtE :: Empty -> x
 > >> > naughtE =3D undefined
 > >>
 > >>It's not the most efficient parser in the world (doing some algebraic=20
 > >>simplification on the fly wouldn't hurt), but it shows the sort of stuff=20
 > >>you can do.
 > >>
 > >>Have fun
 > >>
 > >>Conor
 > >>_______________________________________________
 > >>Haskell-Cafe mailing list
 > >>Haskell-Cafe at haskell.org
 > >>http://www.haskell.org/mailman/listinfo/haskell-cafe
 > >>


More information about the Haskell-Cafe mailing list