[Haskell-cafe] Re: Efficient parallel regular expressions
ChrisK
haskell at list.mightyreason.com
Tue Nov 4 15:10:48 EST 2008
The regex-tdfa package (and regex-posix) implement subexpressions capture.
So if you want to match alpha beta and gamma in parallel you could write
"(alpha)|(beta)|(gamma)" and check which subexpression has the non-empty match.
This becomes slightly complicated if there are parenthesis and captures inside
alpha beta or gamma. Then you need to compute the indices that are the top
level captures.
In particular, the regex-tdfa package (get the latest from
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-tdfa ) will
create a DFA and run through the input once without backtracking. It will find
the leftmost-longest match, so the order of the branches only matters if there
is a tie in length.
If you need to be left-biased then you need a perl-style engine, and you can use
the regex-pcre or pcre-light haskell package and the PCRE library. These are
obtainable from Hackage. I doubt PCRE uses a simple DFA...
Cheers,
Chris
Martijn van Steenbergen wrote:
> Hello all,
>
> For my mud client Yogurt (see hackage) I'm currently working on
> improving the efficiency of the hooks. Right now several hooks, each
> consisting of a regex and an action can be active at the same time.
> Every time a line of input is available (usually several times a second)
> I run the line through all the available regexes and execute the first
> matching action.
>
> I figured this is not the cleverest approach and it'd be better if I
> |'ed all regexes into one big DFA. However, how do I then find out which
> of the original hooks matched and so which action to execute?
>
> As far as I know there's no way to do that with Text.Regex. Alex looks
> promising but is really only an executable and doesn't offer an API.
> I've also found mr. Jo�o Saraiva's HaLex but I don't know if that was
> meant to be used seriously.
>
> Does anyone have any experience with this? What's the best way to
> achieve this?
>
> Thanks much,
>
> Martijn.
More information about the Haskell-Cafe
mailing list