[Haskell-cafe] ANN: TextRegexLazy-0.56, (=~) and (=~~) are here

Chris Kuklewicz haskell at list.mightyreason.com
Wed Aug 2 07:16:58 EDT 2006


Announcing: TextRegexLazy version 0.56
Where: Tarball from http://sourceforge.net/projects/lazy-regex
        darcs get --partial [--tag=0.56] http://evenmere.org/~chrisk/trl/stable/
License : BSD, except for DFAEngine.hs which is LGPL (derived from CTK light)

Development/unstable version is at:
        darcs get [--partial] http://evenmere.org/~chrisk/trl/devel/

This is the version that has eaten John Meacham's JRegex library and survived to 
become strong.  Thanks John!

It now compiles against the posix regexp provided by the c library and the pcre 
library, in addition to the "full lazy" and the "DFA" backends.

All 4 backends can accept regular expressions given as String and as ByteString.

All 4 backends can run regular expressions against String and ByteString.

In particular, the PosixRE and PCRE can run very efficiently against ByteString. 
(Though the input for the PosixRE needs to end in a \NUL character for efficiency).

So there are 4*2*2 = 16 ways to use to provide input to this library.  And the 
RegexContext class has at least 11 instances that both (=~) and (=~~) can 
target.  So that is 4*2*2*11*2 = 352 things you can do with this library!  Get 
your copy today!

To run with cabal before 1.1.4 you will need to comment out the 
"Extra-Source-Files:" line in the TextRegexLazy.cabal file.

The Example.hs file:

> {-# OPTIONS_GHC -fglasgow-exts #-}
> import Text.Regex.Lazy
> import Text.Regex.Full((=~),(=~~)) -- or DFA or PCRE or PosixRE
> 
> main = let b :: Bool
>            b = ("abaca" =~ "(.)a")
>            c :: [MatchArray]
>            c = ("abaca" =~ "(.)a")
>            d :: Maybe (String,String,String,[String])
>            d = ("abaca" =~~ "(.)a")
>        in do print b
>              print c
>              print d

This produces:

> True
> [array (0,1) [(0,(1,2)),(1,(1,1))],array (0,1) [(0,(3,2)),(1,(3,1))]]
> Just ("a","ba","ca",["b"])

You can also use makeRegex and makeRegexOpts to compile and save a regular 
expression which will be used multiple times.  Each of the 4 backends has a 
separate "Regex" data type with its own option types.

For low level access, the WrapPCRE and WrapPosix modules expose a typesafe layer 
around the c libraries.  You can query the "getVersion :: Maybe String" to see 
if the have been compiled into the library.

It may be possible to use WrapPCRE and the UTF8 option flags to do unicode regex 
matching with PCRE. ( The Full and DFA backends use the Haskell unicode Char 
already ).

Adding new types to String/ByteString is a matter of adding instances to the 
existing classes.

Feedback and comments of any length is welcome.

-- 
Chris Kuklewicz


More information about the Haskell-Cafe mailing list