Announcing regex-tre-0.66 and benchmarks

Thu Aug 10 06:41:29 EDT 2006

Donald Bruce Stewart wrote:
> simonmarhaskell:
>> Chris Kuklewicz wrote:
>>
>>> Your question has prompted me to go back into my PosixRE wrapping code 
>>> and compare it to the PCRE code.  I have made some changes which ought 
>>> to enhance the performance of the PosixRE code.  Let us see the new 
>>> benchmarks on 10^6 bytes:
>>>
>>> PosixRE
>>> (102363,["bcdcd","cdc"],["bbccd","bcc"])
>>>
>>> real    1m35.429s
>>> user    1m17.862s
>>> sys     0m1.455s
>>>
>>> total is 79.317s
>>>
>>> PCRE
>>> (102363,["bcdcd","cdc"],["bbccd","bcc"])
>>>
>>> real    0m2.570s
>>> user    0m1.702s
>>> sys     0m0.219s
>>>
>>> total is 1.921s
>> So I still don't understand why PCRE should be 40 times faster than 
>> PosixRE. Surely this can't be just due to differences in the underlying C 
>> library?
> 
> It could be. The C regex.h is pretty slow.
> 
>     http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=all
> 
> -- Don

And I notice the C++ (g++) entry gets away with using a third-party library, Boost:

> // This implementation of regexdna does not use the POSIX regex
> // included with the GNU libc. Instead it uses the Boost C++ libraries
> //
> // http://www.boost.org/libs/regex/doc/index.html
> //
> // (On Debian: apt-get install libboost-regex-dev before compiling,
> //  and then "g++ -O3 -lboost_regex regexdna.cc -o regexdna
> //  Gentoo seems to package boost as, well, 'boost')

Which is a strange precedent.

-- 
Chris