[Haskell-cafe] regex and Unicode

Brian Sammon haskell-cafe at brisammon.fastmail.fm
Thu Sep 8 19:48:33 UTC 2016


On Wed, 7 Sep 2016 21:21:43 -0400
Brian Sammon <haskell-cafe at brisammon.fastmail.fm> wrote:

> I tried to write a program using Text.Regex.PCRE to search through a UTF8-encoded
> document.  It appears that the presence of non-breaking-space
> characters (code point 160) triggers some weird behavior in my program.

I'm not sure why my earlier Google searches missed it, but today I found this rather interesting thread from a few years back on haskell-cafe:
http://haskell-cafe.haskell.narkive.com/OU9UhI0y/

It describes a problem someone had with GHC 7 when passing Strings to Text.Regex.PCRE. It also includes a suggested workaround, plus an explanation that seems to match the off-by-one error I was seeing very well.
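An off-by-one like this fits a byte-versus-character mismatch. U+00A0 (code point 160) takes two bytes in UTF-8. Suppose PCRE reports match positions as byte offsets into the UTF-8 buffer while the Haskell side treats them as character indices into the String. Then each non-breaking space before a match shifts the result by one. That this is the exact mechanism inside regex-pcre is an assumption here.

The sketch below shows only the arithmetic. It uses just the Prelude and Data.Char and does not call regex-pcre. The helper names `utf8Width`, `utf8Length` and `byteToCharOffset` are hypothetical.

```haskell
import Data.Char (ord)

-- Number of bytes a single Char occupies when encoded as UTF-8,
-- chosen by code point range.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2   -- U+00A0 (non-breaking space) lands here
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Total UTF-8 byte length of a String.
utf8Length :: String -> Int
utf8Length = sum . map utf8Width

-- Hypothetical helper: convert a UTF-8 byte offset (assumed to fall on a
-- character boundary) into a character index within the String by walking
-- the characters and summing their encoded widths.
byteToCharOffset :: String -> Int -> Int
byteToCharOffset s byteOff = go s 0 0
  where
    go [] _ chars = chars
    go (c:cs) bytes chars
      | bytes >= byteOff = chars
      | otherwise        = go cs (bytes + utf8Width c) (chars + 1)
```

For example, in `"a\160b"` the `b` starts at byte offset 3 but character index 2. `byteToCharOffset "a\160b" 3` gives 2. Using the raw byte offset as a character index (`drop 3 "a\160b"`) skips past the `b` entirely, which is the one-per-NBSP drift described above.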

I can't tell (from that thread or from anything else I found on Google) whether, when, or how this bug was fixed, but based on other responses here, it sounds like it had been fixed by GHC 8.

