Announcing regex-tre-0.66 and benchmarks

Chris Kuklewicz haskell at list.mightyreason.com
Wed Aug 9 14:08:26 EDT 2006


Simon Marlow wrote:
> On 09 August 2006 15:14, Chris Kuklewicz wrote:
> 
>> For 10^5 characters on String:
>> PCRE       0.077s
>> DFA        0.131s
>> TRE        0.206s
>> PosixRE    0.445s
>> Parsec     0.825s
>> Old Posix 43.760s  (Text.Regex using splitRegex)
>>
>> Old Text.Regex took 43.76 seconds on 10^5 characters to do a task
>> comparable to the one the others did (it ran splitRegex).  The new
>> PosixRE wrapping took 0.445 seconds instead.  Yes it is two orders of
>> magnitude faster, and this is because my wrapping only marshals the
>> String to CString once.  Laziness cannot be worth 2 orders of
>> magnitude of runtime.  This is why we needed a new wrapping, which has
>> grown into the new library.
> 
> Right, I see the problem with Text.Regex.splitRegex, it repeatedly packs
> the String into a CString.  But then why this result:
> 
>> BenchPCRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
>> total is 1.294s
> .. etc. ... 
>> BenchPosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
>> total is 91.435s
> 
> Was this the old Posix, or your new one?  If the new one, why is it so
> slow compared to the others?
> 
> Cheers,
> 	Simon

Your question has prompted me to go back into my PosixRE wrapping code and 
compare it to the PCRE code.  I have made some changes which ought to enhance 
the performance of the PosixRE code.  Let us see the new bechmarks on 10^6 bytes:

PosixRE
(102363,["bcdcd","cdc"],["bbccd","bcc"])

real    1m35.429s
user    1m17.862s
sys     0m1.455s

total is 79.317s

PCRE
(102363,["bcdcd","cdc"],["bbccd","bcc"])

real    0m2.570s
user    0m1.702s
sys     0m0.219s

total is 1.921s

BenchBSPosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"])

real    1m32.267s
user    1m16.494s
sys     0m1.374s

total is 77.868s

BenchBSPCRE (102363,["bcdcd","cdc"],["bbccd","bcc"])

real    0m1.245s
user    0m0.809s
sys     0m0.110s

total is 0.919s

So there was only a little improvement to the previous PosixRE speed.  If you 
want to look at the code, it is in the three Wrap.hsc for regex-posix and 
regex-tre and regex-pcre for the wrapMatchAll functions.  But it appears to be a 
library issue, not a Haskell issue.

I will tend to the Haddock cleanup next.

-- 
Chris


More information about the Glasgow-haskell-users mailing list