Announcing Text.Regex.Lazy (0.33)
haskell at list.mightyreason.com
Tue Mar 21 15:48:04 EST 2006
Announcing : Text.Regex.Lazy (0.33)
Where : http://sourceforge.net/projects/lazy-regex
Who : Chris Kuklewicz <haskell at list.mightyreason.com>
License : BSD, except for DFAEngine.hs which is LGPL (derived from CTK light)
What: This is an alternative to Text.Regex with some enhancements. GHC's
Text.Regex marshals the data back and forth to C arrays in order to call libc,
which is far too slow (and strict). This module parses regular expression
Strings with a Parsec parser into an internal data structure
(Text.Regex.Lazy.Pattern). That structure is then transformed either into a
Parsec parser that processes the input String, or into a DFA table for matching
against an input String or FastPackedString. The input string is consumed
lazily, so it may be an arbitrarily long or infinite source.
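To make the pipeline above concrete, here is a minimal sketch of the idea of
matching a pattern data structure lazily against the input. It is not the
library's code: the Pattern type, its constructors, and the functions
matchPrefix and firstMatch are hypothetical names invented for illustration,
the real Text.Regex.Lazy.Pattern is richer, and the step that parses a regex
String into a Pattern is omitted. It uses only base.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical miniature pattern type (an assumption for this sketch;
-- it is NOT the real Text.Regex.Lazy.Pattern).
data Pattern
  = PEmpty                -- matches the empty string
  | PChar Char            -- a literal character
  | PAny                  -- '.', any single character
  | PSeq Pattern Pattern  -- one pattern followed by another
  | PAlt Pattern Pattern  -- either alternative, left one tried first
  | PStar Pattern         -- zero or more repetitions, longest first

-- Every way the pattern can match a prefix of the input, paired with the
-- unconsumed rest. The result list is produced lazily, so only as much of
-- the input is inspected as is needed to produce the results demanded.
matchPrefix :: Pattern -> String -> [(String, String)]
matchPrefix PEmpty s = [("", s)]
matchPrefix (PChar c) (x:xs) | c == x = [([x], xs)]
matchPrefix (PChar _) _ = []
matchPrefix PAny (x:xs) = [([x], xs)]
matchPrefix PAny [] = []
matchPrefix (PSeq p q) s =
  [ (a ++ b, rest')
  | (a, rest)  <- matchPrefix p s
  , (b, rest') <- matchPrefix q rest ]
matchPrefix (PAlt p q) s = matchPrefix p s ++ matchPrefix q s
matchPrefix (PStar p) s =
  [ (a ++ b, rest')
  | (a, rest)  <- matchPrefix p s
  , not (null a)                      -- skip empty iterations to avoid looping
  , (b, rest') <- matchPrefix (PStar p) rest ]
  ++ [("", s)]                        -- zero repetitions as the last resort

-- The first match anchored at the start of the input, if any.
firstMatch :: Pattern -> String -> Maybe String
firstMatch p = fmap fst . listToMaybe . matchPrefix p

-- The pattern a*b, built by hand.
aStarB :: Pattern
aStarB = PSeq (PStar (PChar 'a')) (PChar 'b')

-- Matching against an infinite input terminates because only the first
-- four characters are ever forced.
demo :: Maybe String
demo = firstMatch aStarB ("aaab" ++ cycle "xyz")   -- Just "aaab"
```

Loading this in GHCi and evaluating demo returns a result even though the
input never ends. A pattern such as PStar PAny would not terminate on an
infinite input in this sketch, since it asks for the longest match first.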
The main modules of interest are:
(*) Text.Regex.Lazy.Compat is supposed to be a drop-in replacement for
Text.Regex; it uses Parsec and lazy matching.
(*) Text.Regex.Lazy.Full allows for different strategies and for expanded
syntax. This uses Parsec and a choice of lazy or strict matching.
(*) Text.Regex.Lazy.CompatDFA uses a fast lazy DFAEngine for regex matching.
(*) And an early version of Text.Regex.Lazy.DFAEngineFPS applies the DFAEngine
to a Data.FastPackedString (untested).
Why might you use this?
(+) You would rather not translate a regular expression into
a hard coded predicate for matching or filtering a string.
(+) You can parse something via parenthesized subgroup capture of a regular
expression and not have to write all the Parsec code manually.
(+) You need to filter a large input (from stdin, for example) with a regex and
want to do it lazily instead of all at once.
(+) You want to build your own extensions to regex syntax for your project and
would rather not have to rewrite one of the C libraries to do it.
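The lazy filtering use case above can be sketched as follows. This is only an
illustration using base: filterLines and containsError are hypothetical names,
and the isInfixOf test is a stand-in for a matcher that would in practice come
from a compiled regular expression.

```haskell
import Data.List (isInfixOf)

-- Keep only the lines accepted by the predicate. Because lines, filter and
-- unlines are all lazy, output is produced as input arrives and the whole
-- input is never held in memory at once.
filterLines :: (String -> Bool) -> String -> String
filterLines accepts = unlines . filter accepts . lines

-- Stand-in matcher (an assumption for this sketch); a regex-based predicate
-- of type String -> Bool would be passed to filterLines the same way.
containsError :: String -> Bool
containsError = isInfixOf "error"

-- An endless input is fine as long as only a finite part of the output
-- is demanded.
firstTwoHits :: [String]
firstTwoHits = take 2 (lines (filterLines containsError (cycle "ok\nerror 1\n")))
```

As a stdin filter this would be used as main = interact (filterLines
containsError).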
What might you contribute?
(.) Use it and report rough edges, incompatibilities, and bugs.
(.) You can think of clever analysis and optimizations (e.g. taking the Pattern
data as input).
(.) Your favorite extended syntax (e.g. giving meaning to various backslashed
(.) You have a sinister regular expression & input to contribute as an evil test