behaviour change in getDirectoryContents in GHC 7.2?
igloo at earth.li
Wed Nov 2 21:16:41 CET 2011
On Wed, Nov 02, 2011 at 07:59:21PM +0000, Max Bolingbroke wrote:
> On 2 November 2011 19:13, Ian Lynagh <igloo at earth.li> wrote:
> > They are allowed to occur in Linux/ext2 filenames, anyway, and I think
> > we ought to be able to handle them correctly if they do.
> In Python, if a filename is decoded using UTF-8 and the "surrogate
> escape" error handler, occurrences of lone surrogates are a decoding
> error because they are not allowed to occur in UTF-8 text. As a result
> the lone surrogate is put into the string escaped so it can be
> roundtripped back to a lone surrogate on output. So Python works OK.
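The Python behaviour described above can be sketched as follows. This is a minimal illustration, assuming Python 3 and its built-in `surrogateescape` error handler; the byte string is a made-up filename, not one taken from this thread.

```python
# Hypothetical filename bytes: "a", then ED B2 80, which would be the
# UTF-8 form of the lone surrogate U+DC80 if surrogates were permitted
# in UTF-8 text (they are not), then ".txt".
raw = b"a\xed\xb2\x80.txt"

# A strict decode rejects the sequence outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte is kept in the string as
# an escape code point in the range U+DC80..U+DCFF.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns the escapes back into the
# original bytes, so the filename survives the round trip.
roundtripped = name.encode("utf-8", "surrogateescape")

print("strict decode succeeded:", strict_ok)
print("round trip preserved bytes:", roundtripped == raw)
```

The escape code points can never be produced by decoding valid UTF-8, which is why this scheme is unambiguous on the way back out.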
> In GHC >= 7.2, if a filename is decoded using UTF-8 and the "Roundtrip"
> error handler, occurrences of 0xEFNN are not a decoding error because
> they are perfectly fine Unicode codepoints. As a result they get put
> into the string unescaped, and so when we try to roundtrip the string
> we get the byte 0xNN in the output rather than the UTF-8 encoding of
> 0xEFNN. So GHC does not work OK in this situation :-(
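The failure mode described above can be modelled in a few lines. This is a sketch in Python rather than Haskell, and it is not GHC's actual implementation: `decode_model`, `encode_model`, and the handler name `pua_roundtrip_model` are hypothetical, and the escape range U+EF00..U+EFFF is an assumption taken from the "0xEFNN" wording in the message.

```python
import codecs


def _pua_escape(exc):
    # Model only: map each undecodable byte NN to the code point 0xEFNN.
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(0xEF00 + b) for b in bad), exc.end
    raise exc


codecs.register_error("pua_roundtrip_model", _pua_escape)


def decode_model(raw: bytes) -> str:
    """Decode UTF-8, escaping invalid bytes into U+EFNN (hypothetical model)."""
    return raw.decode("utf-8", "pua_roundtrip_model")


def encode_model(text: str) -> bytes:
    """Encode to UTF-8, turning every U+EFNN back into the single byte NN."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xEF00 <= cp <= 0xEFFF:
            out.append(cp - 0xEF00)  # assumed to be an escape, even if it was not
        else:
            out += ch.encode("utf-8")
    return bytes(out)


# An invalid byte round-trips as intended.
invalid = b"\xff.txt"
invalid_ok = encode_model(decode_model(invalid)) == invalid

# A filename that genuinely contains U+EF80 is valid UTF-8 (EE BE 80), so
# it decodes unescaped and is then mistaken for an escape on output.
genuine = "\uef80.txt".encode("utf-8")
genuine_out = encode_model(decode_model(genuine))

print("invalid byte preserved:", invalid_ok)
print("genuine U+EF80 preserved:", genuine_out == genuine)
```

In this model the second filename comes back with a different byte sequence, because the escape range overlaps with code points that valid UTF-8 input can contain.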
Are you saying there's a bug that should be fixed?