[Haskell-cafe] Efficient parallel regular expressions

roger peppe rogpeppe at gmail.com
Wed Nov 5 09:02:05 EST 2008


On Tue, Nov 4, 2008 at 6:44 PM, i wrote:
> i'm sorry if this is obviously wrong (i haven't used Text.Regex), but
> can't you do this with submatches?

rights or wrongs of regexps aside, i just checked that the above
approach *is* feasible with Text.Regex

here's some code:

>module Multimatch(multimatch) where
>       import Text.Regex
>       import qualified Data.List as DL
>       import qualified Data.Maybe as DM
>
>       brcount :: String -> Int
>       brcount ('\\' : _ : s) =
>               brcount s
>       brcount ('(' : s) =
>               1 + brcount s
>       brcount (_ : s) =
>               brcount s
>       brcount [] =
>               0
>
>       -- given a list of strings representing regular expressions,
>       -- each associated with a tag, match against a string
>       -- and return the match, along with the associated tag,
>       -- or Nothing if there's no match.
>       multimatch :: [(tag, String)] -> (String -> Maybe (tag, String))
>       multimatch rs =
>               let re = mkRegex $ DL.intercalate "|" $ map ((\s -> "(x(" ++ s ++ "))") . snd) rs in
>               let tags = submatches rs in
>               (\ s ->
>                       do
>                               ms <- matchRegex re ("x" ++ s)
>                               (tag, m) <- DL.find (\(_, m) -> not (null m)) (zip tags ms)
>                               return (DM.fromJust tag, tail m))
>       submatches [] =
>               []
>       submatches ((tag, r) : rs) =
>               (Just tag : take (brcount r + 1) (repeat Nothing)) ++ submatches rs

i'm sure there's a more compact implementation in there somewhere:
i'm just a haskell newbie.

 cheers,
   rog.


More information about the Haskell-Cafe mailing list