ANN: New release of regex packages
Chris Kuklewicz
haskell at list.mightyreason.com
Mon Mar 5 10:51:47 EST 2007
I would like to announce new versions of the regex-* packages. This
announcement covers:
Version  Package        Description
0.83     regex-base     Type classes and generic instances
0.90     regex-compat   Uses regex-posix to provide the old API
0.91     regex-dfa      Backend, pure Haskell, no submatch capture
0.90     regex-parsec   Backend, pure Haskell
0.91     regex-pcre     Backend, links against libpcre
0.91     regex-posix    Backend, links against the standard C library
0.92     regex-tdfa     Backend, pure Haskell (POSIX semantics)
0.91     regex-tre      Backend, links against libtre (currently buggy)
These all compile, install, and run a few tests correctly. Most
notably, I consider regex-tdfa to be of useful quality now.
Summary of changes and recommendations:
* all packages:
** import Text.Regex.XXX exposes (getVersion_Text_Regex_XXX :: Data.Version.Version),
which allows programs to access the current version number of the package
** LICENSE file provided (all are 3-clause BSD except regex-dfa, which is LGPL)
* regex-base:
** BUGFIX: one of the RegexContext instances used tail unsafely
** RegexMaker now has makeRegexM and makeRegexOptsM for better error handling
** Extract has new instances for (Seq Char) and (ByteString.Lazy),
as well as the previous [Char] and ByteString instances
* all backends:
** Now support [Char], (Seq Char), ByteString, and ByteString.Lazy
** CHANGE: The (=~~) monadic match operators now use makeRegexM and will call
'fail' when a regular expression cannot be parsed.
** CHANGE: (import Text.Regex.BACKEND) now re-exports (module Text.Regex.Base)
* regex-dfa:
** BUGFIX: No longer hangs on repeated nullable subpatterns
* regex-tdfa:
** New backend in pure Haskell that provides true POSIX semantics
** Runs with excellent memory usage
** I recommend this backend for POSIX extended regular expressions (leftmost-longest
matching).
* regex-compat: No other changes, still uses regex-posix underneath, not
recommended
* regex-parsec: No other changes, I recommend regex-tdfa or regex-pcre instead
* regex-pcre: No other changes, the best provider of Perl's left-biased regular
expressions
* regex-posix: No other changes, very slow (on OS X the underlying C library
is buggy)
* regex-tre: No other changes, underlying libtre version 0.7.5 is still buggy
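The matching semantics and error handling described above can be tried in a few lines of Haskell. This is a sketch that assumes a current regex-tdfa release from Hackage rather than 0.92 itself; later releases use MonadFail where this announcement speaks of Monad's 'fail', so details may differ from the 2007 API. It shows leftmost-longest matching on a String and on a strict ByteString, the (=~~) operator in the Maybe monad, and makeRegexM turning an unparseable pattern into Nothing. The names posixMatch, bsMatch, goodMatch, noMatch and compileMaybe are illustrative only.

```haskell
-- A sketch assuming a current regex-tdfa release (plus bytestring, which
-- ships with GHC). Recent releases use MonadFail where the 2007 API used
-- Monad's fail, so details may differ from regex-tdfa 0.92.
import Data.Maybe (isJust, isNothing)
import qualified Data.ByteString.Char8 as B
import Text.Regex.TDFA (Regex, makeRegexM, (=~), (=~~))

-- POSIX leftmost-longest: among matches starting at the leftmost position,
-- the longest wins, so this is "ab" rather than "a".
posixMatch :: String
posixMatch = "ab" =~ "a|ab"

-- The same pattern against a strict ByteString subject.
bsMatch :: B.ByteString
bsMatch = B.pack "ab" =~ "a|ab"

-- (=~~) in the Maybe monad: Just the matched text, or Nothing if no match.
goodMatch, noMatch :: Maybe String
goodMatch = "xabcx" =~~ "b+c"
noMatch   = "xyz"   =~~ "b+c"

-- makeRegexM reports a pattern that cannot be parsed through fail,
-- which is Nothing in Maybe, instead of raising an error.
compileMaybe :: String -> Maybe Regex
compileMaybe = makeRegexM

main :: IO ()
main = do
  putStrLn posixMatch                   -- ab
  B.putStrLn bsMatch                    -- ab
  print goodMatch                       -- Just "bc"
  print noMatch                         -- Nothing
  print (isJust (compileMaybe "b+c"))   -- True
  print (isNothing (compileMaybe "("))  -- True
```

Because a Nothing from (=~~) can mean either "no match" or, per the CHANGE noted above, a malformed pattern, compiling first with makeRegexM (or makeRegexOptsM when options are needed) keeps the two cases apart. Under regex-pcre, whose alternation is left-biased, "ab" =~ "a|ab" would be expected to give "a" instead.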
Dependencies:
All of the above packages have been updated to depend on
regex-base>=0.80. I have only tested with GHC 6.6 on Mac OS X 10.4.8
(PPC, 32-bit). Porting the backends to other Haskell compilers should
be possible, though they may not support the polymorphic type class
API that regex-base provides. Porting to GHC 6.4 should work once the
support for (Seq Char) and ByteString[.Lazy] has been edited out or
those libraries have been obtained separately. I think only regex-tdfa
actually uses bang patterns at the moment, and those could also be
removed when porting.
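Since each package now exposes its version as a Data.Version.Version, a program can also check at run time that it was linked against a recent enough regex-base, for example against the >=0.80 bound above. The sketch below uses only Data.Version from base. regexBaseVersion is a hypothetical stand-in for getVersion_Text_Regex_Base, hard-coded to 0.83 so the example runs without the regex packages installed, and atLeast is an illustrative helper, not part of regex-base.

```haskell
import Data.Version (Version, makeVersion, showVersion)

-- Hypothetical stand-in: a real program would use getVersion_Text_Regex_Base
-- from Text.Regex.Base instead of this hard-coded value.
regexBaseVersion :: Version
regexBaseVersion = makeVersion [0, 83]

-- True when the version is at least the given minimum. Data.Version's Ord
-- instance compares the numeric branches lexicographically.
atLeast :: Version -> [Int] -> Bool
atLeast v lowest = v >= makeVersion lowest

main :: IO ()
main = do
  putStrLn ("regex-base " ++ showVersion regexBaseVersion)  -- regex-base 0.83
  print (regexBaseVersion `atLeast` [0, 80])                -- True
```

The build-time bound in the .cabal file remains the main guard; a run-time check like this is mostly useful for reporting which version a program was built with.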
Where to get more information and the packages themselves:
There is a slowly developing wiki page at
http://haskell.org/haskellwiki/Regular_expressions
for holding more documentation relating to these packages.
I have uploaded tar.gz sources for each of the packages to Hackage:
http://hackage.haskell.org/packages/hackage.html
They are listed under the "Text" Category:
http://hackage.haskell.org/packages/archive/pkg-list.html#cat:Text
Development and bug fixes continue in the darcs repositories under
http://darcs.haskell.org/packages/regex-unstable/
To check out one of the above versions with darcs, you can use commands like
darcs get --partial --tag=0.83 regex-base
where --tag=0.83 may be omitted to get the latest unstable version.
To install the packages once you have the source:
For regex-pcre and regex-tre (and perhaps regex-posix) you might need
to edit the end of the .cabal file to provide the Include and Lib
directories for the corresponding C library.
# Compile Setup.hs for better startup speed
ghc --make Setup.hs -o setup
# I use my own path and "--user". I recommend doing this to avoid overwriting
# the global regex-* packages that ship with GHC 6.6
./setup configure --enable-library-profiling --prefix=YOUR_PATH --user
./setup build
./setup install
Generating Haddock documentation may not work, and the documentation
may not be up to date, with the important exception of regex-base.
Future Plans:
* regex-base: add support for generalized indices instead of the current Int
* regex-tdfa: Improve the DFA algorithm and further limit memory allocation.
Try to improve the performance of ByteString.Lazy matching.
Cheers,
Chris Kuklewicz
More information about the Libraries
mailing list