[Haskell-cafe] JRegex on "large" input sizes
Chris Kuklewicz
haskell at list.mightyreason.com
Sat Jul 1 10:58:56 EDT 2006
David House wrote:
> Hi all. I need a decent regex library and JRegex seems the perfect
> choice: simple API, yet well-featured, as well as PCRE support.
I "maintain" Text.Regex.Lazy ( http://sourceforge.net/projects/lazy-regex ) so I
should point out that it does not have full PCRE support. The module's
documentation (summarized here: http://sourceforge.net/forum/forum.php?forum_id=554104)
explains what it does have. To summarize the summary:
For simple regex usage (with capture), the Text.Regex.Lazy.Compat module replaces
Text.Regex with a better implementation.
For simple expressions where a DFA works, the CompatDFA module is fastest.
For fancier regexes (such as those using the lazy quantifiers ??, *? and +?),
Text.Regex.Lazy.Full extends Text.Regex.Lazy.Compat.
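Since Text.Regex.Lazy.Compat is described as a replacement for Text.Regex, a capture-style call through the Text.Regex API would look roughly like this. The sketch uses `mkRegex` and `matchRegex` from Text.Regex (provided today by the regex-compat package). It assumes, without having checked Compat's export list, that changing the import to Text.Regex.Lazy.Compat is all a port would need; `parseSetting` is a hypothetical helper.

```haskell
import Text.Regex (mkRegex, matchRegex)

-- Hypothetical helper: split a "name=number" line into its two captured
-- groups. matchRegex returns Just the list of subgroup matches on success.
-- Assumption: Text.Regex.Lazy.Compat could be imported here instead.
parseSetting :: String -> Maybe (String, String)
parseSetting line = case matchRegex settingRe line of
  Just [name, value] -> Just (name, value)
  _                  -> Nothing
  where
    settingRe = mkRegex "^([a-z]+)=([0-9]+)$"
```

For example, `parseSetting "width=80"` gives `Just ("width", "80")`, and a line whose right-hand side is not all digits gives `Nothing`.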
For much fancier regular expressions (e.g. PCRE) you would need to add three
hopefully simple pieces:
(1) Extend the Parsec code that parses the regex string into its meaning.
(2) Extend the code that produces the Parsec parser that implements the desired
matching semantics.
(3) Add test cases for the expanded syntax and semantics.
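The two stages in steps (1) and (2) can be illustrated with a toy engine. This is not Text.Regex.Lazy's code: a hand-written recursive-descent parser stands in for the Parsec grammar, and a list-of-successes backtracking matcher stands in for the generated Parsec parser. The AST and the names `parseRegex`, `run` and `matchPrefix` are hypothetical. It also shows that lazy quantifiers such as `*?` differ from greedy ones only in the order in which alternatives are tried.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical AST for a tiny regex subset (not Text.Regex.Lazy's types).
data Regex
  = Empty              -- matches the empty string
  | Lit Char           -- a literal character
  | AnyChar            -- '.'
  | Seq Regex Regex    -- concatenation
  | Alt Regex Regex    -- r|s
  | Star Bool Regex    -- r* (True = greedy) or r*? (False = lazy)
  | Opt Bool Regex     -- r? (True = greedy) or r?? (False = lazy)
  deriving Show

-- Stage 1: pattern string -> AST. Nothing unless the whole pattern parses.
parseRegex :: String -> Maybe Regex
parseRegex s = case pAlt s of
  Just (r, "") -> Just r
  _            -> Nothing

pAlt :: String -> Maybe (Regex, String)
pAlt s = do
  (l, rest) <- pSeq s
  case rest of
    '|' : rest' -> do
      (r, rest'') <- pAlt rest'
      pure (Alt l r, rest'')
    _ -> pure (l, rest)

-- Zero or more quantified atoms; stops where no atom parses, and any
-- leftover input is rejected by parseRegex.
pSeq :: String -> Maybe (Regex, String)
pSeq s = case pAtom s of
  Nothing        -> Just (Empty, s)
  Just (a, rest) -> do
    let (q, rest') = quantify a rest
    (r, rest'') <- pSeq rest'
    pure (Seq q r, rest'')

-- Apply an optional *, + or ? suffix; an extra trailing '?' makes it lazy.
quantify :: Regex -> String -> (Regex, String)
quantify a (q : '?' : rest) | q `elem` "*+?" = (mk q False a, rest)
quantify a (q : rest)       | q `elem` "*+?" = (mk q True a, rest)
quantify a rest                              = (a, rest)

mk :: Char -> Bool -> Regex -> Regex
mk '*' g a = Star g a
mk '+' g a = Seq a (Star g a)   -- r+ is r followed by r*
mk _   g a = Opt g a            -- '?'

pAtom :: String -> Maybe (Regex, String)
pAtom ('(' : s) = do
  (r, rest) <- pAlt s
  case rest of
    ')' : rest' -> Just (r, rest')
    _           -> Nothing
pAtom ('.' : s)      = Just (AnyChar, s)
pAtom ('\\' : c : s) = Just (Lit c, s)
pAtom (c : s) | c `notElem` "()|*+?\\" = Just (Lit c, s)
pAtom _              = Nothing

-- Stage 2: AST -> matcher. 'run r s' lists the input left over after each
-- way r can match a prefix of s, most-preferred match first.
run :: Regex -> String -> [String]
run Empty s          = [s]
run (Lit c) (x : xs)
  | c == x           = [xs]
run (Lit _) _        = []
run AnyChar (_ : xs) = [xs]
run AnyChar []       = []
run (Seq a b) s      = concatMap (run b) (run a s)
run (Alt a b) s      = run a s ++ run b s
run (Opt greedy a) s
  | greedy           = run a s ++ [s]
  | otherwise        = s : run a s
run (Star greedy a) s
  | greedy           = more ++ [s]
  | otherwise        = s : more
  where
    -- Only repeat after iterations that consumed input, so (a*)* terminates.
    more = concatMap (run (Star greedy a)) [s' | s' <- run a s, length s' < length s]

-- Preferred match of pat against a prefix of input, if any.
matchPrefix :: String -> String -> Maybe String
matchPrefix pat input = do
  r    <- parseRegex pat
  rest <- listToMaybe (run r input)
  pure (take (length input - length rest) input)
```

With this sketch, `matchPrefix "<.*>" "<a><b>"` returns `Just "<a><b>"`, while the lazy `matchPrefix "<.*?>" "<a><b>"` returns `Just "<a>"`. Backtracking through lists can be exponential in the worst case, so it only shows how the two stages fit together.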
Note that Text.Regex.Lazy is an all-Haskell solution. There are other Haskell
projects that wrap the standard regex/pcre C libraries. The problem is that
marshaling [Char] to C strings is quite slow and cannot be done lazily (the whole
String must be forced and copied before the C call), so you may want to use the
new Fast Packed String (now ByteString) library with foreign functions to call
the pcre C library.
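The ByteString route avoids building a C string character by character: the packed buffer is handed to C as a pointer plus a length, which is the form of subject string a pcre matching call takes. The sketch below does not call pcre itself; it uses the C library's `memchr` as a stand-in so it runs anywhere, and `firstIndexC` is a hypothetical name. `B.useAsCStringLen` copies the bytes once as a single block; `unsafeUseAsCStringLen` from Data.ByteString.Unsafe avoids even that copy, provided the C side only reads the buffer.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString as B
import Foreign.C.String (CString)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Ptr (minusPtr, nullPtr)

-- memchr from the C standard library, standing in for a real pcre call.
foreign import ccall unsafe "string.h memchr"
  c_memchr :: CString -> CInt -> CSize -> IO CString

-- Hypothetical helper: byte offset of the first occurrence of c in bs,
-- found by C code scanning the ByteString's bytes through a pointer and a
-- length. Assumes c is a single-byte (Latin-1) character.
firstIndexC :: Char -> B.ByteString -> IO (Maybe Int)
firstIndexC c bs = B.useAsCStringLen bs $ \(ptr, len) -> do
  hit <- c_memchr ptr (fromIntegral (fromEnum c)) (fromIntegral len)
  pure (if hit == nullPtr then Nothing else Just (hit `minusPtr` ptr))
```

For a file of a few hundred KB, `B.readFile path >>= firstIndexC '\n'` reads it into one buffer and lets C scan it with no per-character marshaling.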
> I want
> to use it on a simple project which involves input files a little
> larger than typical -- between 100KB and 500KB -- but still small
> enough so as to not present a problem.
>
> However, and I'm fairly sure JRegex is at fault here, my program
> segfaults on an input of ~230KB. Has anyone used JRegex successfully
> in this way before? If so, what tactics did you use?
>
> Thanks in advance.
>