[Haskell-cafe] Re: JRegex on "large" input sizes

John Meacham john at repetae.net
Mon Jul 3 18:30:27 EDT 2006


On Sat, Jul 01, 2006 at 02:27:17PM +0100, David House wrote:
> Hi all. I need a decent regex library and JRegex seems the perfect
> choice: simple API, yet well-featured, as well as PCRE support. I want
> to use it on a simple project which involves input files a little
> larger than typical -- between 100KB and 500KB -- but still small
> enough so as to not present a problem.
> 
> However, and I'm fairly sure JRegex is at fault here, my program
> segfaults on an input of ~230KB. Has anyone used JRegex successfully
> in this way before? If so, what tactics did you use?

This happens because JRegex builds the whole string in memory before
running the regex on it. Originally, this limitation existed because
the API provided by the regex binding in fptools was insufficient to
allow proper chunking of data. Now that JRegex uses its own FFI
binding, there is no reason this limitation should still exist. I will
happily accept any patches fixing this. There was some talk of
integrating JRegex into the main tree.
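
To illustrate what chunking could look like from the caller's side, here
is a minimal sketch that matches one line at a time, so only the current
line is ever handed to the matcher. It assumes the pattern never needs
to span a newline. The names Chunk, ChunkMatcher and matchingLines are
made up for this example and are not part of JRegex; the predicate
stands in for whatever compiled regex is actually used.

```haskell
-- Hypothetical names throughout; nothing here is JRegex API.
type Chunk = String

-- Stand-in for a compiled regex: any predicate over a single chunk.
type ChunkMatcher = Chunk -> Bool

-- Split the (possibly lazily read) input into lines and test each line
-- on its own. Returns the 1-based line number and text of every line
-- that matches. Assumes no match needs to cross a line boundary.
matchingLines :: ChunkMatcher -> String -> [(Int, Chunk)]
matchingLines matcher input =
  [ (n, l) | (n, l) <- zip [1 ..] (lines input), matcher l ]
```

This only sidesteps the problem for line-oriented patterns; a proper fix
would still have to live in the binding itself.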

It should be noted that there are two somewhat independent things
included in JRegex: a better FFI binding to POSIX regular expressions
and the PCRE library, and the general framework for providing the =~
and =~~ operators. All that is needed is a smarter FFI binding, or even
just another existing one such as Text.Regex.Lazy. It is
straightforward to add the instances needed to use the JRegex fancy
syntax.
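
As a rough sketch of why the two parts are independent: the operator can
be written once against a type class, and each backend only has to
supply an instance. The class RegexBackend, the Literal type and
matchedText below are hypothetical and much simpler than the real JRegex
classes (the real =~ is more polymorphic in its result); the toy backend
does plain substring search in place of an FFI binding.

```haskell
import Data.List (findIndex, isPrefixOf, tails)

-- Hypothetical class: a backend that can locate the first match.
class RegexBackend r where
  -- Offset and length of the first match, if there is one.
  firstMatch :: r -> String -> Maybe (Int, Int)

-- Toy backend standing in for a real binding: literal substring search.
newtype Literal = Literal String

instance RegexBackend Literal where
  firstMatch (Literal pat) s =
    fmap (\i -> (i, length pat)) (findIndex (pat `isPrefixOf`) (tails s))

infix 4 =~

-- Simplified operator: here it only reports whether anything matched.
(=~) :: RegexBackend r => String -> r -> Bool
s =~ r = maybe False (const True) (firstMatch r s)

-- Text of the first match, written once for every backend.
matchedText :: RegexBackend r => r -> String -> Maybe String
matchedText r s =
  fmap (\(off, len) -> take len (drop off s)) (firstMatch r s)
```

Plugging in a different binding would then mean writing one more
instance, with the operator code left untouched.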

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈

