[Haskell-cafe] regex and Unicode

Francesco Ariis fa-ml at ariis.it
Thu Sep 8 02:46:10 UTC 2016


On Wed, Sep 07, 2016 at 09:21:43PM -0400, Brian Sammon wrote:
> I tried to write a program using Text.Regex.PCRE to search through a UTF8-encoded document.  It appears that the presence of non-breaking-space characters (code point 160) triggers some weird behavior in my program.
> 
> This is using the Debian stable (Jessie) packages of ghc 7.6.3 and libraries.
> 
> Now I find myself at a fork in the road, not sure which direction to head in.
> 
> Do I: 
> 1) Continue looking (or get help with looking) for bugs in my code?  (I
>     have this reduced to a pretty small test case)
> 2) Assemble a bug-report against Debian?
> 3) Assemble a bug-report against Text.Regex.PCRE (or Text.Regex.Base) for
>     "upstream"
> 4) Uninstall Text.Regex.PCRE (and/or some other packages) and switch to
>     something that works with Unicode/UTF8?

I am pretty sure pcre-light has a UTF-8 mode. Is it feasible to swap
the two modules and check whether the bug persists?
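
A minimal sketch of such a check, assuming the pcre-light package
(module Text.Regex.PCRE.Light) is installed and the system libpcre was
built with UTF-8 support. The subject string and the pattern "a.b" are
made up for illustration and are not taken from your test case. The
idea is that without the utf8 option PCRE treats the subject as raw
bytes, so a non-breaking space (two bytes in UTF-8) is not a single
"." any more:

```haskell
-- Sketch only: assumes pcre-light and a libpcre with UTF-8 support.
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Maybe (isJust)
import Text.Regex.PCRE.Light (PCREOption, compile, match, utf8)

-- Hypothetical subject: "a", then U+00A0 (code point 160) encoded
-- as the UTF-8 bytes C2 A0, then "b".
subject :: B.ByteString
subject = B.pack [0x61, 0xC2, 0xA0, 0x62]

-- Match the pattern "a.b" against the subject with the given
-- compile-time options.
matchWith :: [PCREOption] -> Maybe [B.ByteString]
matchWith opts = match (compile (BC.pack "a.b") opts) subject []

matchBytes, matchUtf8 :: Maybe [B.ByteString]
matchBytes = matchWith []      -- byte mode: '.' consumes one byte
matchUtf8  = matchWith [utf8]  -- UTF-8 mode: '.' consumes one character

main :: IO ()
main = do
  putStrLn ("without utf8: " ++ show (isJust matchBytes))
  putStrLn ("with utf8:    " ++ show (isJust matchUtf8))
```

If the two lines differ (no match in byte mode, a match in UTF-8
mode), that would suggest the weird behavior comes from the regex
engine seeing bytes rather than characters, and not from a bug in
your own code.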

