ANN: TextRegexLazy 0.44

Chris Kuklewicz haskell at list.mightyreason.com
Fri Jul 14 12:05:50 EDT 2006


Bulat Ziganshin wrote:
> Hello Chris,
> 
> Thursday, July 13, 2006, 12:17:30 PM, you wrote:
> 
>>>> Question 2: Is there interest in getting this into an official release of the
>>>> base libraries?  The Compat module could at least replace or sit alongside the
>>>> performance sink of the current Text.Regex code.
>>> i'm 120% want to see ByteString, regular expressions matching for
>>> String and ByteString, and JRegex (=~ operator implementation) to be
>>> included in GHC 6.6
> 
>> That typeclass interface is very handy, BUT it expects the thing being matched
>> against is a list of something.  This prevents making ByteString an instance of
>> RegexLike.
> 
>> The answer will be to alter the type class to not make such an assumption.
>> Luckily John Meacham put JRegex under the 3 clause BSD, so I will
>>    * Make a modified version of the type classes
>>    * Make Text.Regex.Lazy an instance of these type classes
>>    * Port JRegex to be instances of these type classes (links to PCRE!)
>> Then I or someone else can
>>    * Implement an efficient instance of Bytestring being handled by PCRE.
> 
> regexps support for ByteStrings already exists:
> 
> ========================================================================
>> btw, what will be really useful now, imho, is the interface to
>> Text.Regex. how about working on it as next stage?
> 
> This is already done actually, here:
>     http://www.cse.unsw.edu.au/~dons/code/lambdabot/Lib/Regex.hsc
>     http://www.cse.unsw.edu.au/~dons/code/hmp3/Regex.hsc
> ========================================================================

Thanks, I'll go take a look at that.  I now have pcre and JRegex installed, and
I have a remote darcs repository with my current version imported. (The URL will
follow once I am sure the repository won't get reorganized.)

> 
> well, i'm just dumb user telling what i want to see in GHC 6.6:
> 
> * regexp matching for Strings and ByteStrings
> * perl-like syntax for doing it
> * ability to select regexp engine for each matching operation and
> using of most efficient ones (Lazy for String, posix or pcre (?) for
> ByteString) by default
> 
> i also know that Simon Marlow want to see JRegex(-like) engine
> included in 6.6 (see http://hackage.haskell.org/trac/ghc/ticket/710 )
> 
> what you mentioned is just implementation details for me, the dumb user :)

From a user's point of view, the JRegex API can only support a single Regex type
and a single backend.  But it would be really handy to be able to use different
types of regular expressions.  Mainly, there are going to be different regex
syntax possibilities:

   * Old Text.Regex syntax, also emulated by Text.Regex.Lazy.Compat
   * The "Full" syntax of Text.Regex.Lazy (close to Extended regex)
   * regex.h syntax (perhaps Basic as well as Extended)
   * pcre.h syntax

All of these might conceivably come in [Word8] and [Char] sources.

The backend will vary too: at the very least we will want both a Lazy version
and a version that hands off to the pcre library (if installed) or to the regex
library (more likely to be installed).

And the plan is to generalize the target to be either [Char] or ByteString.

New Question: What do people think is the best way to use data/newtype/class to 
allow for
   1) Different regex syntax as different types
   2) Different target [Char] or ByteString
   3) Different engine in the back end.

My first thought is that the type of the regex encodes both which syntax is in 
use and which back-end will be used.  Something like

  "Hello" =~ (pcre "el+")

would use PCRE syntax and the pcre library backend against the [Char]. And

  (pack "Hello") =~ (compatRE "el+")

would use the old Text.Regex syntax and my lazy backend against the ByteString
produced by pack.
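
To make that first thought concrete, here is a rough sketch of one possible
shape, not the JRegex or Text.Regex.Lazy API. Everything in it is hypothetical:
the class name RegexMatch, the newtypes PcreRE and CompatRE, and the toyMatch
stand-in, which only understands literals, '.', and postfix '*' and '+' and
takes the place of the real pcre and lazy backends. The only point is that the
regex type picks the syntax and backend, while a second class parameter picks
the target.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

import qualified Data.ByteString.Char8 as B
import Data.List (tails)

-- Hypothetical regex types: each newtype stands for one syntax plus one backend.
newtype PcreRE   = PcreRE String    -- PCRE syntax; would hand off to the pcre library
newtype CompatRE = CompatRE String  -- old Text.Regex syntax; would use the lazy backend

pcre :: String -> PcreRE
pcre = PcreRE

compatRE :: String -> CompatRE
compatRE = CompatRE

-- One instance per (regex type, target type) pair.
class RegexMatch regex target where
  (=~) :: target -> regex -> Bool

-- Toy engine shared by every instance below (an assumption for this sketch;
-- a real instance would call its own backend here).
toyMatch :: String -> String -> Bool
toyMatch pat s = any (matchHere pat) (tails s)

-- Match the pattern at the start of the input.
matchHere :: String -> String -> Bool
matchHere [] _ = True
matchHere (c:'*':rest) s = matchStar c rest s
matchHere (c:'+':rest) (x:xs) | matchChar c x = matchStar c rest xs
matchHere (_:'+':_) _ = False
matchHere (c:rest) (x:xs) | matchChar c x = matchHere rest xs
matchHere _ _ = False

-- '.' matches any character; anything else matches itself.
matchChar :: Char -> Char -> Bool
matchChar c x = c == '.' || c == x

-- Zero or more copies of c, followed by the rest of the pattern.
matchStar :: Char -> String -> String -> Bool
matchStar c rest s = matchHere rest s || case s of
  (x:xs) | matchChar c x -> matchStar c rest xs
  _                      -> False

instance RegexMatch PcreRE String where
  s =~ PcreRE p = toyMatch p s

instance RegexMatch PcreRE B.ByteString where
  s =~ PcreRE p = toyMatch p (B.unpack s)

instance RegexMatch CompatRE String where
  s =~ CompatRE p = toyMatch p s

instance RegexMatch CompatRE B.ByteString where
  s =~ CompatRE p = toyMatch p (B.unpack s)

-- The two examples from the text above.
example1, example2 :: Bool
example1 = "Hello" =~ pcre "el+"
example2 = B.pack "Hello" =~ compatRE "el+"
```

With a two-parameter class like this, adding a new engine means adding a new
regex newtype and its instances; the cost is one instance per regex/target
pair, and a plain Bool result here rather than the richer results that the
JRegex classes offer.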

Other answers?

-- 
Chris

