[Haskell-cafe] Re: Efficient parallel regular expressions

Tue Nov 4 15:10:48 EST 2008

The regex-tdfa package (like regex-posix) implements subexpression capture.

So if you want to match alpha, beta, and gamma in parallel, you could write

"(alpha)|(beta)|(gamma)" and check which subexpression has a non-empty match.

This becomes slightly complicated if there are parentheses and captures inside 
alpha, beta, or gamma.  Then you need to compute which capture indices belong to 
the top-level groups.
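
The index bookkeeping can be sketched as below.  This is only a minimal 
sketch using base, not code from regex-tdfa: the names countGroups, 
skipBracket, combine and matchedBranch are made up for illustration, and it 
assumes POSIX extended syntax, where every unescaped "(" outside a bracket 
expression opens a capture group.

```haskell
import Data.List (intercalate)
import Data.Maybe (listToMaybe)

-- | Count capture groups in a POSIX extended regex (assumption: every
-- unescaped '(' outside a bracket expression opens exactly one group).
countGroups :: String -> Int
countGroups = go
  where
    go []             = 0
    go ('\\' : _ : r) = go r               -- escaped character, e.g. "\("
    go ('[' : r)      = go (skipBracket r) -- "[(]" holds a literal paren
    go ('(' : r)      = 1 + go r
    go (_ : r)        = go r

-- | Drop the rest of a bracket expression, given the text after its '['.
skipBracket :: String -> String
skipBracket s = body (leading s)
  where
    leading ('^' : ']' : r) = r            -- "[^]...": this ']' is literal
    leading ('^' : r)       = r
    leading (']' : r)       = r            -- "[]...": this ']' is literal
    leading r               = r
    body []                 = []
    body (']' : r)          = r            -- end of the bracket expression
    body ('[' : c : r)
      | c `elem` ":.="      = body (skipClass c r)  -- "[:alpha:]" and friends
    body (_ : r)            = body r
    skipClass c (x : ']' : r) | x == c = r
    skipClass c (_ : r)     = skipClass c r
    skipClass _ []          = []

-- | Wrap each pattern in its own group, join the branches with '|', and
-- return the 1-based index of each wrapper (top-level) group.
combine :: [String] -> (String, [Int])
combine pats = (intercalate "|" ["(" ++ p ++ ")" | p <- pats], indices)
  where
    -- each branch uses one wrapper group plus its own inner groups
    sizes   = map ((+ 1) . countGroups) pats
    indices = init (scanl (+) 1 sizes)

-- | Given the submatch texts for groups 1, 2, ... return the position
-- (0-based) of the first branch whose wrapper group is non-empty.
matchedBranch :: [Int] -> [String] -> Maybe Int
matchedBranch indices groups =
  listToMaybe [ n | (n, i) <- zip [0 ..] indices
                  , (g : _) <- [drop (i - 1) groups]
                  , not (null g) ]
```

For example, combine ["a(b)c", "d", "(e)(f)"] yields the pattern 
"(a(b)c)|(d)|((e)(f))" with top-level indices [1,3,4].  The list passed to 
matchedBranch is assumed to be the submatch list a regex library returns, such 
as the fourth component of a (String, String, String, [String]) result from 
=~.  A branch that matches the empty string cannot be told apart from one that 
did not take part, so this only works for patterns that consume some input.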

In particular, the regex-tdfa package (get the latest from 
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-tdfa ) will 
create a DFA and run through the input once without backtracking.  It will find 
the leftmost-longest match, so the order of the branches only matters if there 
is a tie in length.

If you need to be left-biased then you need a Perl-style engine, and you can use 
the regex-pcre or pcre-light Haskell package with the PCRE library.  These are 
obtainable from Hackage.  I doubt PCRE uses a simple DFA...
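
To make the difference between the two policies concrete, here is a toy model 
restricted to literal alternatives.  It is not how either engine works 
internally, the function names are made up, and resolving a length tie in 
favour of the earlier branch is an assumption of this sketch.

```haskell
import Data.List (isPrefixOf, tails)
import Data.Maybe (listToMaybe)

-- | Literal alternatives (paired with their position in the list) that
-- match at the leftmost input position where anything matches at all.
candidates :: [String] -> String -> [(Int, String)]
candidates alts input =
  concat . take 1 $
    [ hits
    | suffix <- tails input
    , let hits = [ (n, a) | (n, a) <- zip [0 ..] alts, a `isPrefixOf` suffix ]
    , not (null hits)
    ]

-- | Leftmost-longest (POSIX-style): the longest candidate wins; on a tie in
-- length the earlier branch is kept (assumption of this toy model).
posixChoice :: [String] -> String -> Maybe (Int, String)
posixChoice alts input = foldl longer Nothing (candidates alts input)
  where
    longer Nothing c = Just c
    longer (Just b) c
      | length (snd c) > length (snd b) = Just c
      | otherwise                       = Just b

-- | Left-biased (Perl-style): the first listed candidate wins.
perlChoice :: [String] -> String -> Maybe (Int, String)
perlChoice alts input = listToMaybe (candidates alts input)
```

With the alternatives ["ab", "abc"] and the input "xabcd", posixChoice picks 
branch 1 ("abc") while perlChoice picks branch 0 ("ab").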

Cheers,
   Chris

Martijn van Steenbergen wrote:
> Hello all,
> 
> For my mud client Yogurt (see hackage) I'm currently working on
> improving the efficiency of the hooks. Right now several hooks, each
> consisting of a regex and an action can be active at the same time.
> Every time a line of input is available (usually several times a second)
> I run the line through all the available regexes and execute the first
> matching action.
> 
> I figured this is not the cleverest approach and it'd be better if I
> |'ed all regexes into one big DFA. However, how do I then find out which
> of the original hooks matched and so which action to execute?
> 
> As far as I know there's no way to do that with Text.Regex. Alex looks
> promising but is really only an executable and doesn't offer an API.
> I've also found mr. João Saraiva's HaLex but I don't know if that was
> meant to be used seriously.
> 
> Does anyone have any experience with this? What's the best way to
> achieve this?
> 
> Thanks much,
> 
> Martijn.