Announcing regex-tre-0.66 and benchmarks
Chris Kuklewicz
haskell at list.mightyreason.com
Wed Aug 9 14:08:26 EDT 2006
Simon Marlow wrote:
> On 09 August 2006 15:14, Chris Kuklewicz wrote:
>
>> For 10^5 characters on String:
>> PCRE 0.077s
>> DFA 0.131s
>> TRE 0.206s
>> PosixRE 0.445s
>> Parsec 0.825s
>> Old Posix 43.760s (Text.Regex using splitRegex)
>>
>> Old Text.Regex took 43.76 seconds on 10^5 characters to do a task
>> comparable to the one the others did (it ran splitRegex). The new
>> PosixRE wrapping took 0.445 seconds instead. Yes it is two orders of
>> magnitude faster, and this is because my wrapping only marshals the
>> String to CString once. Laziness cannot be worth 2 orders of
>> magnitude of runtime. This is why we needed a new wrapping, which has
>> grown into the new library.
>
> Right, I see the problem with Text.Regex.splitRegex, it repeatedly packs
> the String into a CString. But then why this result:
>
>> BenchPCRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
>> total is 1.294s
> .. etc. ...
>> BenchPosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
>> total is 91.435s
>
> Was this the old Posix, or your new one? If the new one, why is it so
> slow compared to the others?
>
> Cheers,
> Simon
Your question has prompted me to go back into my PosixRE wrapping code and
compare it to the PCRE code. I have made some changes which ought to enhance
the performance of the PosixRE code. Let us see the new bechmarks on 10^6 bytes:
PosixRE
(102363,["bcdcd","cdc"],["bbccd","bcc"])
real 1m35.429s
user 1m17.862s
sys 0m1.455s
total is 79.317s
PCRE
(102363,["bcdcd","cdc"],["bbccd","bcc"])
real 0m2.570s
user 0m1.702s
sys 0m0.219s
total is 1.921s
BenchBSPosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
real 1m32.267s
user 1m16.494s
sys 0m1.374s
total is 77.868s
BenchBSPCRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
real 0m1.245s
user 0m0.809s
sys 0m0.110s
total is 0.919s
So there was only a little improvement to the previous PosixRE speed. If you
want to look at the code, it is in the three Wrap.hsc for regex-posix and
regex-tre and regex-pcre for the wrapMatchAll functions. But it appears to be a
library issue, not a Haskell issue.
I will tend to the Haddock cleanup next.
--
Chris
More information about the Libraries
mailing list