[Haskell-cafe] strange behavior in Text.Regex.Posix

John MacFarlane jgm at berkeley.edu
Mon Jan 22 13:22:15 EST 2007


Can anyone help me understand this odd behavior in Text.Regex.Posix (GHC 6.6)?

Prelude Text.Regex.Posix Text.Regex> subRegex (mkRegex "\\^") "he\350llo" "@"
"he@llo"

Why does /\^/ match \350 here? In general, Text.Regex.Posix seems to work
fine with Unicode characters. For example, \350 is treated as a single
character here:

Prelude Text.Regex.Posix Text.Regex> subRegex (mkRegex "e.l") "he\350llo" "@"
"h@lo"

The problem is specific to \350 and doesn't happen with, say, \351:

Prelude Text.Regex> subRegex (mkRegex "\\^") "he\351llo" "@"
"he\351llo"
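One guess that would fit all three results, though I haven't confirmed it in the regex-posix source: the String might be passed to the C regex library with each Char cut down to its low 8 bits. \350 is U+015E, and its low byte is 0x5E, which is '^'. \351 is U+015F, and its low byte is 0x5F, which is '_'. The sketch below models that assumed truncation. The names truncateToByte and asBytes are hypothetical helpers I made up for illustration. They are not part of Text.Regex.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: models an ASSUMED String -> C string conversion
-- that keeps only the low 8 bits of each Char. This is a guess about
-- the marshalling, not confirmed regex-posix behaviour.
truncateToByte :: Char -> Char
truncateToByte c = chr (ord c `mod` 256)

-- Hypothetical helper: what a byte-oriented C regex engine would see
-- under that assumption (one byte per Char).
asBytes :: String -> String
asBytes = map truncateToByte

main :: IO ()
main = do
  -- '\350' is U+015E; 350 = 256 + 94, and 94 is '^'
  print (asBytes "he\350llo")
  -- '\351' is U+015F; 351 = 256 + 95, and 95 is '_'
  print (asBytes "he\351llo")
  -- each Char still maps to exactly one byte, so "e.l" spans one char
  print (length (asBytes "he\350llo"))
```

If this model is right, it would explain both transcripts. "e.l" still sees a single byte in place of \350, and "\\^" matches because that byte is a literal caret.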

Is this a bug, or just something I'm not understanding?

John


