[Haskell-cafe] Irrefutable pattern love and new regex engine.
Derek Elkins
derek.a.elkins at gmail.com
Tue Jan 22 11:59:48 EST 2008
On Tue, 2008-01-22 at 11:55 -0500, Michael Speer wrote:
> I've been using the creation of a regular expression engine as an
> ongoing project to learn Haskell. Last night I created the newest
> iteration.
>
> My love for the irrefutable pattern can be found in the definition of
> the rexn ( apply repetition to extracted nodes ) function below.
> /snip
> rexn ns pps = let ( ~( xs , rps ) ,
> ~( ~( nxs ) ,
> ~( rxs , rrps ) ) ) = ( exn nxs pps ,
~nxs is redundant.
> Beautiful, beautiful lazy execution and irrefutable pattern matching.
> The data flow is able to easily twist in and out of functions, even as
> an argument to the function call that creates it.
>
> Is this kind of usage normal?
No. Irrefutable patterns are usually the right thing to use in circular
programs, but usually you only need a few well-placed irrefutable
patterns. I suspect most (all?) of yours are unnecessary. The use of
tuples also feels a bit excessive as an impression. I'd have to really
look to decide, but it gives me the feeling that a data structure should
be being used somewhere. Your many unnecessary parentheses bothers me,
but I tend to dislike any unnecessary ones except where there might
reasonably be some ambiguity while reading. I also recommend trying a
different way of laying out your code as most of it is wrapped below and
most of -that- is due to it being excessively indented. Perhaps pull
some of the lambdas into functions.
>
> Any and all comments, suggestions, refutations, and criticisms are welcome.
>
> -- regular expression engine -- (c) 2008 michael speer
>
> import Char ( isSpace )
> -- import Debug.Trace ( trace )
>
> xor :: Bool -> Bool -> Bool
> xor True a = not a
> xor False a = a
>
> data RxToken = RxStart -- matchable start of target string
> | RxChar Char -- a literal character to match
> | RxBound -- inserted wherever alphanums touch whitespace
> | RxEnd -- matchable end of target string
> | RxEOF -- an additional token to push through to
> catch anything trying for RxEnd
> -- RxEOF is never matched.
> deriving ( Show )
>
> rxTokenize tts = RxStart : case tts of
> [] -> RxEnd : RxEOF : []
> tts@(t:_) -> case not $ isSpace t of
> True -> RxBound : rxt tts
> False -> rxt tts
> where
> rxt (t:[]) | not $ isSpace t = RxChar t : RxBound : RxEnd : RxEOF : []
> | otherwise = RxChar t : RxEnd : RxEOF : []
> rxt (t:ts@(t':_)) | isSpace t `xor` isSpace t' = RxChar t :
> RxBound : rxt ts
> | otherwise = RxChar t : rxt ts
>
> data RxTransform = RxTransform ( RxNode -> RxToken -> [ RxNode ] )
> | RxNullTransform
>
> data RxNode = RxActive { rxTransforms :: [RxTransform] ,
> rxMatched :: String ,
> rxNumSubs :: Integer ,
> rxSubExprs :: [ String ] }
> | RxComplete { rxMatched :: String ,
> rxNumSubs :: Integer ,
> rxSubExprs :: [ String ] }
>
> instance Show RxNode where
> show (RxComplete matched _ _) = "<rx|matched:[" ++ matched ++ "]>"
>
> data RxDepth = RxTop | RxSub deriving ( Show )
>
> rxCompile pps = let ( xs , rps ) = oexn [success] RxTop pps
> in case length rps of
> 0 -> RxActive { rxTransforms = xs ,
> rxMatched = [] ,
> rxNumSubs = 0 ,
> rxSubExprs = [] }
> _ -> error $ "Not all of pattern consumed :
> remains : " ++ rps
> where
> -- or together different expression sections -- (a|b|c)
> oexn ns RxTop [] = ( ns , [] )
> oexn _ RxSub [] = error "Pattern ended while still in sub expression"
> oexn ns d pps = let ( ~( xs , rps ) ,
> ~( nxs , nrps ) ) = ( aexn ns pps ,
> case rps of
> ('|':rr) ->
> let ( inxs , irps ) = oexn ns d rr
> in
> ( xs ++ inxs , irps )
> (')':rr) -> case d of
>
> RxTop -> error "Erroneous close parenthesis in pattern "
>
> RxSub -> ( xs , rr )
> [] -> case d of
>
> RxTop -> ( xs , [] )
>
> RxSub -> error "End of pattern while still in sub expression" )
> in ( nxs , nrps )
> -- and together extracted nodes in a given expression segment -- abd?dfs
> aexn ns pps = let ( ~( xs , rps ) ,
> ~( nxs , nrps ) ) = ( rexn nxs pps ,
> case rps of
> ('|':_) -> ( ns , rps )
> (')':_) -> ( ns , rps )
> [] -> ( ns , rps )
> _ -> aexn ns rps )
> in ( xs , nrps )
> -- replication application - weee!
> rexn ns pps = let ( ~( xs , rps ) ,
> ~( ~( nxs ) ,
> ~( rxs , rrps ) ) ) = ( exn nxs pps ,
> case rps of
> ('?':'?':rr)
> -> ( ( ns ) ,
>
> ( ns ++ xs , rr ) )
> ('?':rr)
> -> ( ( ns ) ,
>
> ( xs ++ ns , rr ) )
> ('*':'?':rr)
> -> ( ( ns ++ xs ) ,
>
> ( ns ++ xs , rr ) )
> ('*':rr)
> -> ( ( xs ++ ns ) ,
>
> ( xs ++ ns , rr ) )
> ('+':'?':rr)
> -> ( ( ns ++ xs ) ,
>
> ( xs , rr ) )
> ('+':rr)
> -> ( ( xs ++ ns ) ,
>
> ( xs , rr ) )
> _
> -> ( ( ns ) ,
>
> ( xs , rps ) )
> )
> in
> ( rxs , rrps )
> -- extract node ( including an entire subexpression as a single node )
> exn _ ('?':_) = error "Bad question mark operator"
> exn _ ('*':_) = error "Bad splat operator"
> exn _ ('+':_) = error "Bad plus sign operator"
> exn ns ('(':ps) = oexn ns RxSub ps
> exn ns (p:ps) = ( [ RxTransform ( \rxn k -> case k of
> (RxChar c) -> if c == p
> then
>
> [ RxActive { rxTransforms = ns ,
>
> rxMatched = c : rxMatched rxn ,
>
> rxNumSubs = rxNumSubs rxn ,
>
> rxSubExprs = rxSubExprs rxn } ]
> else
> []
> (RxStart) -> [rxn]
> (RxBound) -> [rxn]
> _ -> []
> ) ] ,
> ps )
>
> exn ns [] = error "can this be reached?"
>
> success = RxTransform ( \ rxn k -> [ RxComplete { rxMatched =
> reverse $ rxMatched rxn ,
> rxNumSubs =
> rxNumSubs rxn ,
> rxSubExprs =
> map reverse $ rxSubExprs rxn } ] )
>
>
> rxExec n tts = iexec [n] (rxTokenize tts)
> where
> iexec (win@(RxComplete _ _ _ ):_) _ = Just win
> iexec [] _ = Nothing
> iexec nns (k:ks) = iexec ( concatMap (
> \n -> case n of
>
> (RxComplete _ _ _) -> [n]
>
> a@(RxActive _ _ _ _) -> concatMap (\xf -> case xf of
>
> (RxTransform f) ->
> f a k
>
> (RxNullTransform) ->
> []
>
> ) (rxTransforms a) ) nns ) ks
>
> main = do
> print $ rxTokenize "this is a test"
> print $ rxExec (rxCompile "hello|world") "hello"
> print $ rxExec (rxCompile "hello|world") "world"
> print $ rxExec (rxCompile "abcde|ab") "abcd"
> print $ rxExec (rxCompile "ab?c") "ac"
> print $ rxExec (rxCompile "ab?c") "abc"
> print $ rxExec (rxCompile "ab*c") "ac"
> print $ rxExec (rxCompile "ab*c") "abc"
> print $ rxExec (rxCompile "ab*c") "abbbc"
> print $ rxExec (rxCompile "ab+c") "ac"
> print $ rxExec (rxCompile "ab+c") "abc"
> print $ rxExec (rxCompile "ab+c") "abbbbbc"
> print $ rxExec (rxCompile "(a|b)+") "aaabbbabaaababaaabbbabbaba"
> print $ rxExec (rxCompile "abc|") "zyx"
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list