[Haskell-cafe] Regular Expressions without GADTs

Wed Oct 19 00:39:21 EDT 2005

Conor McBride wrote:
> Neither Oleg nor Bruno translated my code; they threw away my
> structurally recursive on-the-fly automaton and wrote combinator parsers
> instead. That's why there's no existential, etc. The suggestion that
> removing the GADT simplifies the code would be better substantiated if
> like was compared with like.
...

> I'm sure the program I actually wrote can be expressed with the type
> class trick, just by cutting up my functions and pasting the bits into
> individual instances; the use of the existential is still available. I
> don't immediately see how to code this up in Bruno's style, but that
> doesn't mean it can't be done. Still, it might be worth comparing like
> with like.

Please see the enclosed code. It is still in Haskell98 -- and works in
Hugs. 

> I suspect that once you start producing values with the
> relevant properties (as I do) rather than just consuming them (as Oleg
> and Bruno do), the GADT method might work out a little neater.

Actually, the code is pretty much your original code, with downcased
identifiers. It faithfully implements that parser division
approach. Still, there are no existentials. I wouldn't say that GADT
code is so much different. Perhaps the code below is a bit neater due
to the absence of existentials, `case' statements, and local type
annotations.

{-- Haskell98! --}

module RegExps where

import Monad

newtype Zero = Zero Zero -- Zero type in Haskell 98

-- Bruno.Oliveira's type class
class RegExp g where
    zero   :: g tok Zero
    one    :: g tok ()
    check  :: (tok -> Bool) -> g tok tok
    plus   :: g tok a -> g tok b -> g tok (Either a b)
    mult   :: g tok a -> g tok b -> g tok (a,b)
    star   :: g tok a -> g tok [a]

data Parse tok t = Parse { empty :: Maybe t
			 , divide :: tok -> Parse tok t}

parse :: Parse tok a -> [tok] -> Maybe a
parse p [] = empty p
parse p (t:ts) = parse (divide p t) ts

liftP f p = Parse{empty  = liftM f (empty p),
		  divide = \tok -> liftP f (divide p tok)}

liftP2 mf p1 p2 = Parse{empty = mf (empty p1) (empty p2),
			divide = \tok -> liftP2 mf (divide p1 tok) 
		                                   (divide p2 tok)}

lsum x y  = (liftM Left x) `mplus` (liftM Right y)
lprod x y = liftM2 (,) x y

-- Conor McBride's parser division algorithm

instance RegExp Parse  where
    zero        = Parse mzero (\_ -> zero)
    one         = Parse (return ()) (\_ -> liftP (const ()) zero)
    check p     = Parse mzero (\t -> if p t 
			                then liftP (const t) one
			                else liftP (const t) zero)

    plus r1 r2  = Parse (lsum (empty r1) (empty r2))
		        (\t -> plus (divide r1 t) (divide r2 t))

    mult r1 r2  = Parse (lprod (empty r1) (empty r2))
                        (\t -> let (q1,q2) = (divide r1 t, divide r2 t)
			           lp x1 = liftP (either id ((,) x1))
			       in maybe 
			           (mult q1 r2)
			           (\x1 -> lp x1 (plus (mult q1 r2) q2))
				   (empty r1))

    star r      = Parse (return [])
		        (\t-> liftP (uncurry (:)) (mult (divide r t) (star r)))

p1 :: RegExp g => g Char ([Char], [Char])
p1 = mult (star (check (== 'a'))) (star (check (== 'b')))

testp = parse (star (mult (star (check (== 'a'))) (star (check (== 'b')))))
        "abaabaaabbbb"
{-
*RX> testp
Just [("a","b"),("aa","b"),("aaa","bbbb")]

-}

testc = parse (star one) "abracadabra"

-- Parsing the list of integers

ieven = even :: Int->Bool
iodd  = odd  :: Int->Bool

p2 :: RegExp g => g Int (Either (Int, (Int, [Int])) (Int, [Int]))
p2 = plus (mult (check iodd) (mult (check iodd) (star (check ieven))))
	    (mult (check ieven) (star (check iodd)))

test2 = parse (star p2) [1::Int,1,2,3,3,4]