ANN: bug fix for regex-tdfa, version 0.97.4 (and "regex-ast")

ChrisK haskell at list.mightyreason.com
Tue Feb 24 06:57:34 EST 2009


Hello,

   The regex-tdfa package has had a series of bug fix releases (0.97.1, 0.97.2, 
0.97.3 and now 0.97.4).  This 0.97.4 release finishes fixing the bug that was 
only mostly fixed in the 0.97.1 release.

   An example of the fixed bug: apply the regex pattern (BB(B?))+(B?) to the 
text BBBB.  The "BB" in the pattern should be used twice and both "B?" should 
match nothing.  My code grouped the "+" incorrectly: it matched the "BB" only 
once, and then each "B?" matched a "B".
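
   For anyone who wants to check this against their installed version, here is 
a minimal sketch using the (=~) operator from Text.Regex.TDFA.  The names 
Result and bugExample are only for illustration.  The four-tuple is (text 
before the match, the whole match, text after the match, captured groups).

```haskell
import Text.Regex.TDFA ((=~))

-- (text before match, whole match, text after match, captured groups)
type Result = (String, String, String, [String])

-- The pattern and text from the example above.
bugExample :: Result
bugExample = "BBBB" =~ "(BB(B?))+(B?)"

main :: IO ()
main = print bugExample
  -- Going by the description above, a fixed release should report the
  -- captures ["BB", "", ""]: the last "BB" iteration and two empty "B?".
  -- The buggy grouping would instead capture a "B" for each "B?".
```

   The first capture is the last iteration of the "+" group, which is why it 
reads "BB" rather than "BBBB".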

   The case fixed here was not initially caught because of how I search for 
unknown bugs.  I use "Arbitrary" from QuickCheck to generate random patterns and 
strings to search, and compare regex-tdfa to another POSIX engine.

   Because I am on OS X, I am limited by the native POSIX library's bugs: 
this bug in regex-tdfa was triggered only on inputs where the native POSIX 
library was also buggy, so the comparison did not expose it.
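
   The comparison described above can be sketched roughly as follows.  This is 
not my actual test code, only an illustration under some assumptions: the 
regex-posix package (Text.Regex.Posix) stands in for the native POSIX engine, 
the alphabet is limited to "A" and "B", and the generator genPattern and the 
names Pattern, Subject, runBoth and prop_enginesAgree are made up for this 
example.

```haskell
import Test.QuickCheck
import qualified Text.Regex.Posix as Posix
import qualified Text.Regex.TDFA as TDFA

-- A randomly generated POSIX extended regular expression over {A, B}.
newtype Pattern = Pattern String deriving Show

-- A random string to search, over the same alphabet.
newtype Subject = Subject String deriving Show

instance Arbitrary Pattern where
  arbitrary = Pattern <$> sized genPattern

instance Arbitrary Subject where
  arbitrary = Subject <$> listOf (elements "AB")

-- Build a syntactically valid pattern; the size parameter bounds nesting.
-- Alternations and quantified sub-patterns are always parenthesized.
genPattern :: Int -> Gen String
genPattern n
  | n <= 0    = elements ["A", "B"]
  | otherwise = oneof
      [ elements ["A", "B"]
      , (++) <$> sub <*> sub                                   -- concatenation
      , (\l r -> "(" ++ l ++ "|" ++ r ++ ")") <$> sub <*> sub  -- alternation
      , (\p q -> "(" ++ p ++ ")" ++ q)
          <$> sub <*> elements ["", "?", "*", "+"]             -- group, quantifier
      ]
  where
    sub = genPattern (n `div` 2)

-- (text before match, whole match, text after match, captured groups)
type Result = (String, String, String, [String])

-- Run the same pattern and text through both engines.
runBoth :: String -> String -> (Result, Result)
runBoth pat text = (text TDFA.=~ pat, text Posix.=~ pat)

-- The two engines should agree on the match and on every capture.
prop_enginesAgree :: Pattern -> Subject -> Property
prop_enginesAgree (Pattern pat) (Subject text) =
  let (viaTdfa, viaPosix) = runBoth pat text
  in  viaTdfa === viaPosix

main :: IO ()
main = quickCheck prop_enginesAgree
```

   A reported disagreement only says that the two engines differ; which of 
them is wrong still has to be worked out by hand, and an input on which both 
are wrong in the same way passes unnoticed.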

   But the source of most of my unit tests is AT&T Research [1], and they have a 
"libast" with a POSIX implementation.  I have adapted my regex-* wrapper 
packages to make a "regex-ast" Haskell interface, but the difficulties with the 
AT&T headers prevent me from releasing this on Hackage.  This "regex-ast" has 
given me access to a less buggy POSIX back-end, and randomized testing has led 
to catching the bug fixed here (as well as a few bug reports back to AT&T).

   So while regex-tdfa will not win many speed contests, it is the only POSIX 
regular expression library I have running that passes all the unit tests.

[1] http://www.research.att.com/sw/download/
     http://www.research.att.com/~gsf/testregex/
     http://www.research.att.com/~gsf/testregex/re-interpretation.html

