ANN: New release of regex packages

Chris Kuklewicz haskell at list.mightyreason.com
Mon Mar 5 10:51:47 EST 2007


I would like to announce new versions of the regex-* packages.  This
announcement covers:

Version Package       Description
  0.83  regex-base    -- Type classes and generic instances
  0.90  regex-compat  -- Uses regex-posix to provide old API
  0.91  regex-dfa     -- backend, pure Haskell, no submatch capture
  0.90  regex-parsec  -- backend, pure Haskell
  0.91  regex-pcre    -- backend, links against libpcre
  0.91  regex-posix   -- backend, links against standard C library
  0.92  regex-tdfa    -- backend, pure Haskell (Posix semantics)
  0.91  regex-tre     -- backend, links against libtre (currently buggy)

These all compile, install, and run a few tests correctly.  Most
notably, I now consider regex-tdfa to be of useful quality.

Summary of changes and recommendations:
   * all packages:
   ** import Text.Regex.XXX exposes (getVersion_Text_Regex_XXX :: Data.Version.Version)
      which allows programs to access the current version number of the package
   ** LICENSE file provided (all are 3-clause BSD except regex-dfa, which is LGPL)
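
As a rough illustration of how a program might use the exported version
value, here is a minimal sketch built only on Data.Version from base. The
names installedVersion and atLeast are hypothetical, and the number is copied
from the table above rather than read from an installed package; in a real
program the value would come from a backend import such as
getVersion_Text_Regex_Posix.

```haskell
import Data.Version (Version, makeVersion, showVersion)

-- Stand-in for the value a backend exports. In a real program this would be
--   import Text.Regex.Posix (getVersion_Text_Regex_Posix)
-- The number below is taken from the release table, not read from a package.
installedVersion :: Version
installedVersion = makeVersion [0, 91]

-- Hypothetical helper: does the given version meet a minimum requirement?
atLeast :: [Int] -> Version -> Bool
atLeast minimumBranch v = v >= makeVersion minimumBranch

main :: IO ()
main = do
  putStrLn ("backend version: " ++ showVersion installedVersion)
  putStrLn ("new enough: " ++ show (atLeast [0, 91] installedVersion))
```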

   * regex-base:
   ** BUGFIX: one of the RegexContext instances used tail unsafely
   ** RegexMaker now has makeRegexM and makeRegexOptsM for better error handling
   ** Extract has new instances for (Seq Char) and (ByteString.Lazy),
        as well as the previous [Char] and ByteString instances

   * all backends:
   ** Now support [Char], (Seq Char), ByteString, and ByteString.Lazy
   ** CHANGE: The (=~~) monadic match operators now use makeRegexM and will
      call 'fail' when a regular expression cannot be parsed.
   ** CHANGE: (import Text.Regex.BACKEND) now re-exports (module Text.Regex.Base)
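
The effect of reporting parse errors through 'fail' is that the caller picks
the failure behaviour by picking the monad. The sketch below illustrates only
that pattern: compilePatternM is a toy stand-in that merely checks that
parentheses balance, not the regex-base makeRegexM, and it assumes a recent
GHC (8.8 or later) where MonadFail is exported from the Prelude.

```haskell
import Control.Exception (IOException, try)

-- Toy stand-in for makeRegexM (NOT the regex-base implementation): it only
-- checks that parentheses balance and reports problems through 'fail'.
compilePatternM :: MonadFail m => String -> m String
compilePatternM pat
  | balanced (0 :: Int) pat = return pat
  | otherwise               = fail ("cannot parse pattern: " ++ show pat)
  where
    balanced depth []         = depth == 0
    balanced depth ('(' : cs) = balanced (depth + 1) cs
    balanced depth (')' : cs) = depth > 0 && balanced (depth - 1) cs
    balanced depth (_ : cs)   = balanced depth cs

main :: IO ()
main = do
  -- In Maybe, 'fail' becomes Nothing, so a bad pattern is an ordinary value.
  print (compilePatternM "(a|b)*" :: Maybe String)
  print (compilePatternM "(a|b"   :: Maybe String)
  -- In IO, 'fail' throws an IOException that can be caught.
  result <- try (compilePatternM "(a|b") :: IO (Either IOException String)
  putStrLn (either (const "IO: pattern rejected") ("IO: compiled " ++) result)
```

With a real backend the same choice is made through the result type given to
(=~~) or makeRegexM.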

   * regex-dfa:
   ** BUGFIX: No longer hangs on repeated nullable subpatterns

   * regex-tdfa:
   ** New backend in pure Haskell that provides true Posix semantics
   ** Runs with excellent memory usage
   ** I recommend this backend for Posix extended regular expressions
      (leftmost longest).

   * regex-compat: No other changes, still uses regex-posix underneath,
                   not recommended
   * regex-parsec: No other changes, I recommend regex-tdfa or regex-pcre instead
   * regex-pcre:   No other changes, best provider of Perl's left-biased
                   regular expressions
   * regex-posix:  No other changes, very slow (on OS X the underlying
                   C library is buggy)
   * regex-tre:    No other changes, underlying libtre version 0.7.5 is still buggy
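
The difference between Posix "leftmost longest" matching and Perl's
left-biased matching shows up with alternations such as a|ab. The following
is a toy model only, assuming a "pattern" is just a list of literal
alternatives tried at the start of the input; the function names are
hypothetical and none of the backends is implemented this way.

```haskell
import Data.List (isPrefixOf)

-- Toy model: a "pattern" is a list of literal alternatives, tried at the
-- start of the input. Real backends handle full regular expressions.

-- Perl-style (left-biased): the first alternative that matches wins.
leftBiased :: [String] -> String -> Maybe String
leftBiased alts input =
  case filter (`isPrefixOf` input) alts of
    []      -> Nothing
    (m : _) -> Just m

-- Posix-style (leftmost longest): the longest matching alternative wins.
leftmostLongest :: [String] -> String -> Maybe String
leftmostLongest alts input =
  case filter (`isPrefixOf` input) alts of
    [] -> Nothing
    ms -> Just (foldr1 longer ms)
  where
    longer a b = if length a >= length b then a else b

main :: IO ()
main = do
  let alts = ["a", "ab"]              -- models the pattern a|ab
  print (leftBiased alts "abc")       -- Just "a"
  print (leftmostLongest alts "abc")  -- Just "ab"
```

Under this model the order of the alternatives changes the left-biased
result but not the leftmost-longest one.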

Dependencies:

All of the above packages have been updated to depend on
regex-base>=0.80.  I have only tested with GHC 6.6 on Mac OS X 10.4.8
(PPC, 32-bit).  Porting the backends to other Haskell compilers should
be possible, though they may not support the polymorphic type class
API that regex-base provides.  Porting to GHC 6.4 should work once the
support for (Seq Char) and ByteString[.Lazy] has been edited out or
obtained externally.  I think only regex-tdfa actually uses bang
patterns at the moment, and those could also be removed when porting.

Where to get more information and the packages themselves:

There is a slowly developing wiki page at
http://haskell.org/haskellwiki/Regular_expressions
for holding more documentation relating to these packages.

I have uploaded tar.gz sources for each of the packages to hackage:
http://hackage.haskell.org/packages/hackage.html
They are listed under the "Text" Category:
http://hackage.haskell.org/packages/archive/pkg-list.html#cat:Text

Development and bug fixes continue in the darcs repositories under
http://darcs.haskell.org/packages/regex-unstable/
To check out one of the above versions with darcs you can use commands like
darcs get --partial --tag=0.83 regex-base
where the --tag=0.83 may be omitted to get the latest unstable version.

To install the packages once you have the source:

For regex-pcre and regex-tre (and perhaps regex-posix) you might need
to edit the end of the cabal file to provide Include and Lib directories
for the corresponding C library.

# Compile Setup.hs for better startup speed
ghc --make Setup.hs -o setup

# I use my own path and "--user". I recommend doing this to avoid overwriting
# the global regex-* from GHC 6.6
./setup configure --enable-library-profiling --prefix=YOUR_PATH --user

./setup build

./setup install

Producing haddock documentation may not work, and the result may not be
up to date, with the important exception of regex-base.

Future Plans:
   * regex-base: add support for generalized indices instead of the current Int
   * regex-tdfa: Improve DFA algorithm and further limit memory allocation.
                 Try to improve performance of ByteString.Lazy matching.

Cheers,
   Chris Kuklewicz

