ANN: TextRegexLazy 0.44
Chris Kuklewicz
haskell at list.mightyreason.com
Fri Jul 14 12:05:50 EDT 2006
Bulat Ziganshin wrote:
> Hello Chris,
>
> Thursday, July 13, 2006, 12:17:30 PM, you wrote:
>
>>>> Question 2: Is there interest in getting this into an official release of the
>>>> base libraries? The Compat module could at least replace or sit alongside the
>>>> performance sink of the current Text.Regex code.
>>> i'm 120% want to see ByteString, regular expressions matching for
>>> String and ByteString, and JRegex (=~ operator implementation) to be
>>> included in GHC 6.6
>
>> That typeclass interface is very handy, BUT it expects the thing being matched
>> against is a list of something. This prevents making ByteString an instance of
>> RegexLike.
>
>> The answer will be to alter the type class to not make such an assumption.
>> Luckily John Meacham put JRegex under the 3 clause BSD, so I will
>> * Make a modified version of the type classes
>> * Make Text.Regex.Lazy an instance of these type classes
>> * Port JRegex to be instances of these type classes (links to PCRE!)
>> Then I or someone else can
>> * Implement an efficient instance of Bytestring being handled by PCRE.
>
> regexps support for ByteStrings already exists:
>
> ========================================================================
>> btw, what will be really useful now, imho, is the interface to
>> Text.Regex. how about working on it as next stage?
>
> This is already done actually, here:
> http://www.cse.unsw.edu.au/~dons/code/lambdabot/Lib/Regex.hsc
> http://www.cse.unsw.edu.au/~dons/code/hmp3/Regex.hsc
> ========================================================================
Thanks, I'll go take a look at that. I have pcre + JRegex installed now. And I
have a remote darcs repository with my current version imported. (URL coming
after I am sure it won't get re-organized).
>
> well, i'm just dumb user telling what i want to see in GHC 6.6:
>
> * regexp matching for Strings and ByteStrings
> * perl-like syntax for doing it
> * ability to select regexp engine for each matching operation and
> using of most efficient ones (Lazy for String, posix or pcre (?) for
> ByteString) by default
>
> i also know that Simon Marlow want to see JRegex(-like) engine
> included in 6.6 (see http://hackage.haskell.org/trac/ghc/ticket/710 )
>
> what you mentioned is just implementation details for me, the dumb user :)
From the user's side, the JRegex API can only support a single Regex type and a
single backend. But it would be really handy to be able to use different types
of regular expressions. Mainly there are going to be different regex syntax
possibilities:
* Old Text.Regex syntax, also emulated by Text.Regex.Lazy.Compat
* The "Full" syntax of Text.Regex.Lazy (close to Extended regex)
* regex.h syntax (perhaps Basic as well as Extended)
* pcre.h syntax
All of these might conceivably come in [Word8] and [Char] sources.
The backend will vary too, at the very least because we will want both a Lazy
version and a hand-off to the pcre library (if installed) or the regex library
(more likely to be installed).
And the plan is to generalize the target to be either [Char] or ByteString.
New Question: What do people think is the best way to use data/newtype/class to
allow for
1) Different regex syntax as different types
2) Different target [Char] or ByteString
3) Different engine in the back end.
My first thought is that the type of the regex encodes both which syntax is in
use and which back-end will be used. Something like

    "Hello" =~ (pcre "el+")

would use PCRE syntax and the pcre library backend against the [Char]. And

    (pack "Hello") =~ (compatRE "el+")

would use the old Text.Regex syntax and my lazy backend against the ByteString
produced by pack.
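To make that concrete, here is a minimal sketch of the idea, not a proposed API. It assumes a two-parameter type class (so it needs the MultiParamTypeClasses and FlexibleInstances extensions), and the names PcreRE, CompatRE, RegexLike, matchTest and toySearch are hypothetical. Both "backends" share one toy matcher that understands only literals, '.' and postfix '+'; a real version would hand PcreRE off to the pcre library and CompatRE to the lazy engine.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

import qualified Data.ByteString.Char8 as B
import Data.List (tails)

-- Hypothetical regex types: each newtype stands for one syntax plus
-- one backend, so the type of the regex selects both.
newtype PcreRE   = PcreRE String
newtype CompatRE = CompatRE String

pcre :: String -> PcreRE
pcre = PcreRE

compatRE :: String -> CompatRE
compatRE = CompatRE

-- The class relates a regex type to a target type and makes no
-- assumption that the target is a list of something.
class RegexLike regex source where
  matchTest :: regex -> source -> Bool

infix 4 =~
(=~) :: RegexLike regex source => source -> regex -> Bool
source =~ regex = matchTest regex source

-- Toy stand-in engine (an assumption for illustration only):
-- supports literal characters, '.', and postfix '+'.
charOk :: Char -> Char -> Bool
charOk c x = c == '.' || c == x

-- Does the pattern match at the start of the input?
toyMatchHere :: String -> String -> Bool
toyMatchHere [] _ = True
toyMatchHere (c : '+' : rest) s = case s of
  (x : xs) | charOk c x ->
    toyMatchHere rest xs || toyMatchHere (c : '+' : rest) xs
  _ -> False
toyMatchHere (c : rest) (x : xs) | charOk c x = toyMatchHere rest xs
toyMatchHere _ _ = False

-- Does the pattern match anywhere in the input?
toySearch :: String -> String -> Bool
toySearch pat = any (toyMatchHere pat) . tails

-- One instance per (regex type, target type) pair.
instance RegexLike PcreRE [Char] where
  matchTest (PcreRE p) = toySearch p

instance RegexLike PcreRE B.ByteString where
  matchTest (PcreRE p) = toySearch p . B.unpack

instance RegexLike CompatRE [Char] where
  matchTest (CompatRE p) = toySearch p

instance RegexLike CompatRE B.ByteString where
  matchTest (CompatRE p) = toySearch p . B.unpack
```

Loaded into GHCi, both "Hello" =~ pcre "el+" and B.pack "Hello" =~ compatRE "el+" type-check and are dispatched on the pair of regex type and target type. The cost of this design is one instance per combination, which is part of what the question above is asking about.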
Other answers?
--
Chris