6.6 plans and status

Tue Aug 8 08:16:28 EDT 2006

Simon Marlow wrote:
> Chris Kuklewicz wrote:
> 
>> That could work well.  It would not involve too much pulling apart.
>>
>> One small quirk is that there is the old Text.Regex API and a new
>> JRegex-style API.
> 
> Is it possible to provide both?  Perhaps deprecating the current API?

It is possible to provide both the old and the new API.  The old one was only
defined for the String type, and this probably will not change (at least at
first).

>> A "default" backend has to be dependably present. That means either 
>> keeping the current Posix backend, adding a dependency on PCRE, or 
>> using the Haskell/Parsec backend.
> 
> I'm not keen on adding a PCRE dependency.  We already include an 
> implementation of POSIX regexes in GHC itself 
> (libraries/base/cbits/regex) which tends to get used on Windows where 
> there isn't an implementation of POSIX regexes

Ah.  That is how you are doing it.

>> The problem is that String is very inefficient with Posix or PCRE and 
>> ByteString is slightly inefficient with Haskell/Parsec.
> 
> Do you have any measurements (rough measurements would be fine)?  When 
> you say "very inefficient", by what factor is the Parsec implementation 
> faster than using the Posix one for Strings?

This whole Text.Regex.Lazy project was born from the computer language shootout,
http://haskell.org/hawiki/RegexDna .  The Text.Regex(.Posix) that came with
GHC timed out (hours!).  The pure haskell/parsec version took about 2 minutes.
That is the meaning of "very inefficient" for repeated use of
Text.Regex(.Posix) on String: more than two orders of magnitude, since it is
not caching the CString that it marshals.
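
To illustrate the marshalling cost described above, here is a minimal sketch.
It is not the Text.Regex code: the C function strlen stands in for a C regex
call (an assumption made purely for illustration), and the names naive and
cached are hypothetical.  The point is only that copying a String into a C
buffer on every call is avoidable when the same input is reused.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Control.Exception (bracket)
import Foreign.C.String (CString, newCString, withCString)
import Foreign.C.Types (CSize(..))
import Foreign.Marshal.Alloc (free)

-- Stand-in for a C regex routine (assumption): any C function that takes
-- a const char* shows the same marshalling pattern.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Naive: the whole String is copied into a fresh C buffer on every call.
naive :: Int -> String -> IO Int
naive n s =
  sum <$> mapM (\_ -> fromIntegral <$> withCString s c_strlen) [1 .. n]

-- Cached: marshal once, then reuse the same CString for every call.
cached :: Int -> String -> IO Int
cached n s = bracket (newCString s) free $ \cs ->
  sum <$> mapM (\_ -> fromIntegral <$> c_strlen cs) [1 .. n]

main :: IO ()
main = do
  let input = replicate 10000 'a'
  a <- naive 100 input
  b <- cached 100 input
  print (a == b)  -- both variants compute the same total
```

Both functions return the same result; the difference is that naive performs
n copies of the input while cached performs one.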

> 
> If we were to use the Parsec implementation, that pulls in another 
> dependency. Not out of the question, but to be avoided if possible.

The only nonparsec/nonlibrary version is a simple DFA which is too simple for
many uses.  To get what people expect from regular expressions you need the
posix library, the pcre library, my parsec parser, or someone else's regex
implementation in haskell.  Or the parsec version could eventually be rewritten
to not depend on parsec by implementing its own parser monad.

To keep a Posix default backend the libraries/base/cbits/regex may need to 
become part of regex-posix.  That would be a learning curve for me as I have no 
ghc on windows experience, though I have a computer for it next to me.  So I 
might need help later for that.

>> So we could either:
>>>
>>>   - work on regex-base/regex-posix for inclusion in GHC, or
>>
>> I could prepare this for you.
> 
> Great, thanks!

The re-organization is in progress (hooray for "darcs mv").
After re-organization will come the doc/Haddock clean up to match.
After that comes the unit testing clean up (I have some HUnit and QuickCheck now).
Then, time permitting, benchmarks.

>> I'll assemble a version organized like that this week.  Important 
>> question:
>> Should I be planning to install alongside the current 
>> Text.Regex(.Posix) or planning on replacing them? (With an identical 
>> API)?
> 
> We want to replace Text.Regex.  So ideally you want to do this in a GHC 
> tree, so you can remove the old Text.Regex and replace with yours.  If 
> this is too difficult, then you could develop it separately (as 
> Text.Regex.New, or something), and I'll make the relevant changes when I 
> import it.

I will make such a Text.Regex.New that fakes the old API.  I'll make it use the 
posix backend, but that can be changed via an import statement.
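
As a rough illustration of a compatibility layer of this kind, the sketch
below wraps a backend behind the old String-only function shapes (mkRegex and
matchRegex).  Everything about the backend is hypothetical: BackendRegex,
backendCompile and backendMatch are invented names, and the toy backend only
understands literal patterns.  In a real layout the backend would live in its
own module, so swapping it would mean changing one import line.

```haskell
module Main where

import Data.List (isPrefixOf, tails)

-- Hypothetical backend.  In a real package this section would be a
-- separate module selected by an import; here it is inlined so the
-- example is self-contained.  It only handles literal patterns.
newtype BackendRegex = BackendRegex String

backendCompile :: String -> BackendRegex
backendCompile = BackendRegex

-- Return the matched text if the literal occurs anywhere in the input.
backendMatch :: BackendRegex -> String -> Maybe String
backendMatch (BackendRegex pat) input
  | any (pat `isPrefixOf`) (tails input) = Just pat
  | otherwise                            = Nothing

-- Old-style, String-only wrapper around whichever backend is in scope.
newtype Regex = Regex BackendRegex

mkRegex :: String -> Regex
mkRegex = Regex . backendCompile

-- The old matchRegex returns the list of subexpression matches on
-- success.  A literal pattern has no subexpressions, so success is
-- reported as Just [].
matchRegex :: Regex -> String -> Maybe [String]
matchRegex (Regex r) input = fmap (const []) (backendMatch r input)

main :: IO ()
main = do
  print (matchRegex (mkRegex "dna") "regexdna")  -- Just []
  print (matchRegex (mkRegex "rna") "regexdna")  -- Nothing
```

Callers only see mkRegex and matchRegex, so the backend section can be
replaced without touching code written against the old API.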

I suggest removing the old Text.Regex.Posix module.  People will be able to make 
better use of the new API for doing this.

-- 
Chris