behaviour change in getDirectoryContents in GHC 7.2?

Ian Lynagh igloo at earth.li
Wed Nov 9 14:11:08 CET 2011


On Wed, Nov 09, 2011 at 11:02:54AM +0000, Simon Marlow wrote:
> 
> I would be happy with the surrogate approach I think.  Arguably, if
> you try to treat a string with lone surrogates as Unicode and it
> fails, then that is a feature: the original string wasn't Unicode.
> All you can do with an invalid Unicode string is use it as a
> FilePath again, and the right thing will happen.

If we aren't going to guarantee that the encoded string is Unicode, then
is there any benefit to encoding it in the first place?

> Alternatively if we stick with the private char approach, it should
> be possible to have an escaping scheme for 0xEFxx characters in the
> input that would enable us to roundtrip correctly.  That is, escape
> 0xEFxx into a sequence 0xYYEF 0xYYxx for some suitable YY.

Why not encode into private chars, i.e. encode U+EF00 (which in UTF-8 is
0xEE 0xBC 0x80) as U+EFEE U+EFBC U+EF80, etc.?
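
To make the idea concrete, here is a minimal sketch of such a scheme. It is
not GHC's actual implementation: the names escapeByte, isEscape, decodeOne,
encodeChar, decodePath and encodePath are made up for illustration, and the
escape range U+EF00..U+EFFF is assumed from the discussion above. Any byte
that is not part of a well-formed UTF-8 sequence, and every byte of a
sequence that decodes into the escape range itself, becomes one private
char U+EF00 + byte.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumed escape range: one private-use char per raw byte.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xEF00 + fromIntegral b)

isEscape :: Char -> Bool
isEscape c = c >= '\xEF00' && c <= '\xEFFF'

-- Decode one well-formed UTF-8 sequence from the front of the input,
-- rejecting overlong forms, surrogates and out-of-range code points.
decodeOne :: [Word8] -> Maybe (Char, [Word8])
decodeOne (b0 : rest)
  | b0 < 0x80 = Just (chr (fromIntegral b0), rest)
  | b0 .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b0 .&. 0x1F) 0x80
  | b0 .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b0 .&. 0x0F) 0x800
  | b0 .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b0 .&. 0x07) 0x10000
  where
    multi n lead minCp =
      let (conts, rest') = splitAt n rest
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                     lead conts
          ok = length conts == n
            && all (\c -> c .&. 0xC0 == 0x80) conts
            && cp >= minCp && cp <= 0x10FFFF
            && not (cp >= 0xD800 && cp <= 0xDFFF)
      in if ok then Just (chr cp, rest') else Nothing
decodeOne _ = Nothing

-- Plain UTF-8 encoding of a single character.
encodeChar :: Char -> [Word8]
encodeChar c
  | cp < 0x80 = [fromIntegral cp]
  | cp < 0x800 = [0xC0 .|. hi 6, cont 0]
  | cp < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    cp = ord c
    hi s = fromIntegral (cp `shiftR` s)
    cont s = 0x80 .|. (fromIntegral (cp `shiftR` s) .&. 0x3F)

-- Bytes to String: keep well-formed characters outside the escape range;
-- escape everything else one byte at a time.
decodePath :: [Word8] -> String
decodePath [] = []
decodePath bs@(b : rest) =
  case decodeOne bs of
    Just (c, rest') | not (isEscape c) -> c : decodePath rest'
    _ -> escapeByte b : decodePath rest

-- String to bytes: escape chars turn back into their single raw byte.
encodePath :: String -> [Word8]
encodePath = concatMap enc
  where
    enc c
      | isEscape c = [fromIntegral (ord c - 0xEF00)]
      | otherwise = encodeChar c
```

With these definitions, decodePath [0xEE, 0xBC, 0x80] gives the three chars
U+EFEE U+EFBC U+EF80, and encodePath maps them back to the original bytes.
The sketch only aims at the bytes -> String -> bytes direction; it makes no
claim about Strings that already contain chars in the escape range.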

(Max gave some reasons earlier in this thread, but I'd need examples of
what goes wrong to understand them).


Thanks
Ian



