ANN: bug fix for regex-tdfa, version 0.97.4 (and "regex-ast")
ChrisK
haskell at list.mightyreason.com
Tue Feb 24 06:57:34 EST 2009
Hello,
The regex-tdfa package has had a series of bug-fix releases (0.97.1, 0.97.2,
0.97.3, and now 0.97.4). This 0.97.4 release finishes fixing the bug that was
only mostly fixed in the 0.97.1 release.
An example of the fixed bug: Apply the regex pattern (BB(B?))+(B?) to the
text BBBB. The "BB" in the pattern should be used twice and both "B?" should
match nothing. My code grouped the "+" wrong and matched the "BB" once and then
both the "B?" matched a "B".
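
The case above can be checked with a small sketch like the following. It assumes the regex-tdfa package is installed and uses its Text.Regex.TDFA module; the helper names `groups` and `example` are made up for illustration and are not part of the library.

```haskell
import Text.Regex.TDFA ((=~))

-- Return the captured groups for the first match of a POSIX extended
-- regex, or an empty list when there is no match.
-- (Hypothetical helper, not part of regex-tdfa.)
groups :: String -> String -> [String]
groups pat txt =
  let (_, _, _, subs) = txt =~ pat :: (String, String, String, [String])
  in subs

-- The case from this announcement. Per the description above, the
-- second and third groups (both "B?") should capture the empty string,
-- and the first group should hold the last "BB" iteration.
example :: [String]
example = groups "(BB(B?))+(B?)" "BBBB"
```

Evaluating `example` in GHCi with 0.97.4 or later should show empty strings for the two "B?" groups; the buggy grouping would instead report a "B" in each.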
The case fixed here was not initially caught because of how I search for
unknown bugs. I use "Arbitrary" from QuickCheck to generate random patterns and
strings to search, and compare regex-tdfa to another POSIX engine.
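
A rough sketch of this kind of randomized comparison is shown below. It is not the generator used for regex-tdfa; the `Pat` type, `render`, `genText`, `Matcher`, and `prop_agree` are all hypothetical names, and the two engines are passed in as plain functions so that any pair of POSIX back-ends can be plugged in.

```haskell
import Test.QuickCheck

-- Hypothetical miniature pattern AST over the alphabet {A, B}.
data Pat
  = Lit Char
  | Cat Pat Pat
  | Alt Pat Pat
  | Star Pat
  | Opt Pat
  | Group Pat
  deriving Show

-- Render the AST as POSIX extended regex syntax.
render :: Pat -> String
render (Lit c)   = [c]
render (Cat a b) = render a ++ render b
render (Alt a b) = "(" ++ render a ++ "|" ++ render b ++ ")"
render (Star p)  = "(" ++ render p ++ ")*"
render (Opt p)   = "(" ++ render p ++ ")?"
render (Group p) = "(" ++ render p ++ ")"

-- Size-bounded generator so that random patterns stay small.
instance Arbitrary Pat where
  arbitrary = sized gen
    where
      lit = Lit <$> elements "AB"
      gen n
        | n <= 0    = lit
        | otherwise = oneof
            [ lit
            , Cat <$> gen (n `div` 2) <*> gen (n `div` 2)
            , Alt <$> gen (n `div` 2) <*> gen (n `div` 2)
            , Star <$> gen (n - 1)
            , Opt <$> gen (n - 1)
            , Group <$> gen (n - 1)
            ]

-- Random text to search, over the same alphabet as the patterns.
genText :: Gen String
genText = listOf (elements "AB")

-- An engine takes a pattern and a text and returns the whole match
-- followed by its submatches, or Nothing when there is no match.
type Matcher = String -> String -> Maybe [String]

-- Differential property: both engines must report the same result.
prop_agree :: Matcher -> Matcher -> Pat -> Property
prop_agree engineA engineB pat =
  forAll genText $ \txt ->
    let re = render pat
    in counterexample ("pattern: " ++ re)
         (engineA re txt === engineB re txt)
```

Given two wrappers of type `Matcher` (for example, one around Text.Regex.TDFA and one around another POSIX binding), `quickCheck (prop_agree engineA engineB)` would report the first pattern and text on which they disagree. A disagreement only shows that one of the two engines is wrong, which is why the quality of the reference engine matters.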
Because I am on OS X, I am limited by the bugs in the native POSIX library:
this bug in regex-tdfa was triggered only when the native POSIX was also buggy.
But the source of most of my unit tests is AT&T research [1], and they have a
"libast" with a POSIX implementation. I have adapted my regex-* wrapper
packages to make a "regex-ast" Haskell interface, but the difficulties with the
AT&T headers prevent me from releasing this on Hackage. This "regex-ast" has
given me access to a less buggy POSIX back-end, and randomized testing has led
to catching the bug fixed here (as well as a few bug reports back to AT&T).
So while regex-tdfa will not win many speed contests, it is the only POSIX
regular expression library I have running that passes all the unit tests.
[1] http://www.research.att.com/sw/download/
http://www.research.att.com/~gsf/testregex/
http://www.research.att.com/~gsf/testregex/re-interpretation.html