[Haskell-cafe] ANN: TextRegexLazy-0.56, (=~) and (=~~) are here
Chris Kuklewicz
haskell at list.mightyreason.com
Wed Aug 2 07:16:58 EDT 2006
Announcing: TextRegexLazy version 0.56
Where: Tarball from http://sourceforge.net/projects/lazy-regex
darcs get --partial [--tag=0.56] http://evenmere.org/~chrisk/trl/stable/
License : BSD, except for DFAEngine.hs which is LGPL (derived from CTK light)
Development/unstable version is at:
darcs get [--partial] http://evenmere.org/~chrisk/trl/devel/
This is the version that has eaten John Meacham's JRegex library and survived to
become strong. Thanks John!
It now compiles against the POSIX regex implementation provided by the C library
and against the PCRE library, in addition to the "full lazy" and "DFA" backends.
All 4 backends can accept regular expressions given as String and as ByteString.
All 4 backends can run regular expressions against String and ByteString.
In particular, the PosixRE and PCRE can run very efficiently against ByteString.
(For efficiency, though, the input for the PosixRE needs to end in a \NUL character.)
So there are 4*2*2 = 16 ways to provide input to this library. And the
RegexContext class has at least 11 instances that both (=~) and (=~~) can
target. So that is 4*2*2*11*2 = 352 things you can do with this library! Get
your copy today!
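The idea of one operator targeting many result types can be sketched in a few lines. This is only a toy illustration, not the library's code: the class name Context, the method find and the operator (=~?) are hypothetical, and it does plain substring search rather than regex matching. The point is that the type annotation on the result picks the instance, just as the annotations on b, c and d do in Example.hs.

```haskell
import Data.List (isInfixOf, isPrefixOf, tails)

-- Hypothetical stand-in for the library's RegexContext class: the instance
-- selected by the annotated result type decides what a "match" returns.
class Context target where
  find :: String -> String -> target   -- needle, then haystack

-- Bool: does the needle occur at all?
instance Context Bool where
  find needle hay = needle `isInfixOf` hay

-- Int: how many (possibly overlapping) occurrences are there?
instance Context Int where
  find needle hay = length (filter (needle `isPrefixOf`) (tails hay))

-- Toy operator, deliberately not named (=~) to avoid confusion with the
-- real one. It only does literal substring search.
(=~?) :: Context target => String -> String -> target
hay =~? needle = find needle hay

main :: IO ()
main = do
  print ("abaca" =~? "a" :: Bool)   -- True
  print ("abaca" =~? "a" :: Int)    -- 3
```

Each extra instance adds another result type without touching the operator, which is how the count of things you can do multiplies so quickly.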
To run with cabal before 1.1.4 you will need to comment out the
"Extra-Source-Files:" line in the TextRegexLazy.cabal file.
The Example.hs file:
> {-# OPTIONS_GHC -fglasgow-exts #-}
> import Text.Regex.Lazy
> import Text.Regex.Full((=~),(=~~)) -- or DFA or PCRE or PosixRE
>
> main = let b :: Bool
>            b = ("abaca" =~ "(.)a")
>            c :: [MatchArray]
>            c = ("abaca" =~ "(.)a")
>            d :: Maybe (String,String,String,[String])
>            d = ("abaca" =~~ "(.)a")
>        in do print b
>              print c
>              print d
This produces:
> True
> [array (0,1) [(0,(1,2)),(1,(1,1))],array (0,1) [(0,(3,2)),(1,(3,1))]]
> Just ("a","ba","ca",["b"])
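Judging from the output above, each MatchArray holds (offset, length) pairs: index 0 is the whole match and index 1 is the captured group. A minimal sketch of turning those pairs back into text follows. It uses only Data.Array and does not import the library; the names MatchSpans, slice and decode are made up for this illustration, and the arrays are copied by hand from the output shown.

```haskell
import Data.Array (Array, elems, listArray)

-- Assumed shape of one match, read off the printed output: index 0 is the
-- whole match, later indices are groups, each an (offset, length) pair.
type MatchSpans = Array Int (Int, Int)

-- Cut the text covered by one (offset, length) pair out of the input.
slice :: String -> (Int, Int) -> String
slice input (off, len) = take len (drop off input)

-- Decode every span of every match into the text it covers.
decode :: String -> [MatchSpans] -> [[String]]
decode input = map (map (slice input) . elems)

main :: IO ()
main = do
  -- The two arrays printed for c in Example.hs, rebuilt by hand.
  let spans = [ listArray (0,1) [(1,2),(1,1)]
              , listArray (0,1) [(3,2),(3,1)] ]
  print (decode "abaca" spans)   -- [["ba","b"],["ca","c"]]
```

The first decoded match agrees with the "ba" and ["b"] that (=~~) returned for d.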
You can also use makeRegex and makeRegexOpts to compile and save a regular
expression which will be used multiple times. Each of the 4 backends has a
separate "Regex" data type with its own option types.
For low-level access, the WrapPCRE and WrapPosix modules expose a typesafe layer
around the C libraries. You can query "getVersion :: Maybe String" to see
whether they have been compiled into the library.
It may be possible to use WrapPCRE and the UTF8 option flags to do Unicode regex
matching with PCRE. (The Full and DFA backends already use the Haskell Unicode
Char.)
Adding new types to String/ByteString is a matter of adding instances to the
existing classes.
Feedback and comments of any length are welcome.
--
Chris Kuklewicz