[Haskell-cafe] regex and Unicode
haskell-cafe at brisammon.fastmail.fm
Thu Sep 8 19:48:33 UTC 2016
On Wed, 7 Sep 2016 21:21:43 -0400
Brian Sammon <haskell-cafe at brisammon.fastmail.fm> wrote:
> I tried to write a program using Text.Regex.PCRE to search through a
> UTF8-encoded document. It appears that the presence of non-breaking-space
> characters (code point 160) triggers some weird behavior in my program.
I'm not sure why I didn't find it in my earlier Google searches, but today I found this rather interesting thread from a few years back on haskell-cafe:
It describes a problem someone was having with GHC 7 and passing strings to Text.Regex.PCRE. There is also a suggested workaround and an explanation that seems to be a very good match for the off-by-one error I was seeing.
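To illustrate how one non-breaking space could produce an off-by-one, here is a minimal sketch. It rests on an assumption that is not confirmed in this message: that the match offsets coming back from PCRE count bytes of the UTF-8 encoding while the program indexes a String by characters. The helper names (utf8Width, byteOffset, charIndex) are made up for this illustration and are not part of Text.Regex.PCRE.

```haskell
module Main where

import Data.Char (ord)

-- Number of bytes UTF-8 uses to encode a single code point.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2   -- code point 160 (non-breaking space) falls here
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Byte offset, in the UTF-8 encoding of s, of the character at index i.
byteOffset :: String -> Int -> Int
byteOffset s i = sum (map utf8Width (take i s))

-- Inverse direction: character index of the character that starts at
-- byte offset b (assumes b lies on a character boundary).
charIndex :: String -> Int -> Int
charIndex s b = length (takeWhile (< b) (scanl (+) 0 (map utf8Width s)))

main :: IO ()
main = do
  let s = "x\160y"           -- 'x', non-breaking space, 'y'
  print (length s)           -- 3 characters
  print (byteOffset s 2)     -- 3: 'y' starts at byte 3, not 2
  print (charIndex s 3)      -- 2: byte offset 3 maps back to index 2
  print (drop 3 s)           -- "": byte offset misused as a char index
  print (drop (charIndex s 3) s)  -- "y": offset converted first
```

Under that assumption, every two-byte character before a match shifts a byte offset one position past the corresponding character index, which would look like an off-by-one for text containing a single non-breaking space.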
I can't tell (from that thread or elsewhere on Google) if, when, or how this bug was fixed, but based on other responses here, it sounds like it was fixed by the time of GHC 8.