behaviour change in getDirectoryContents in GHC 7.2?
Simon Marlow
marlowsd at gmail.com
Thu Nov 10 15:35:32 CET 2011
On 10/11/2011 09:28, Max Bolingbroke wrote:
> Is there any consensus about what to do here? My take is that we
> should move back to lone surrogates. This:
> 1. Recovers the roundtrip property, which we appear to believe is essential
> 2. Removes all the weird problems I outlined earlier that can occur
> if your byte strings happen to contain some bytes that decode to
> U+EFxx
> 3. DOES break software that expects Strings not to contain surrogate
> codepoints, but (I agree with you) this is arguably a feature
>
> This is also exactly what Python does, so it has the advantage of being
> battle-tested.
>
> Agreed?
Agreed.
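A minimal sketch of the lone-surrogate scheme agreed on here, assuming UTF-8 as the filesystem encoding. Each byte that cannot be decoded is mapped to a lone low surrogate in U+DC80..U+DCFF and mapped back on encoding, which is how Python's "surrogateescape" error handler (PEP 383) works. The names `decodeRoundtrip` and `encodeRoundtrip` are hypothetical; this is not GHC's implementation.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Map an undecodable byte (always >= 0x80) to a lone low surrogate.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xDC00 + fromIntegral b)

isCont :: Word8 -> Bool
isCont c = c .&. 0xC0 == 0x80

isSurrogate :: Int -> Bool
isSurrogate cp = cp >= 0xD800 && cp <= 0xDFFF

-- Try to decode one well-formed multi-byte UTF-8 sequence.
-- Overlong forms, values above U+10FFFF and encoded surrogates are rejected.
multi :: [Word8] -> Maybe (Char, [Word8])
multi [] = Nothing
multi (b0 : rest)
  | b0 >= 0xC2 && b0 <= 0xDF = go 1 (b0 .&. 0x1F) 0x80
  | b0 >= 0xE0 && b0 <= 0xEF = go 2 (b0 .&. 0x0F) 0x800
  | b0 >= 0xF0 && b0 <= 0xF4 = go 3 (b0 .&. 0x07) 0x10000
  | otherwise = Nothing
  where
    go n lead minCp =
      let (conts, rest') = splitAt n rest
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                     (fromIntegral lead) conts
      in if length conts == n && all isCont conts
              && cp >= minCp && cp <= 0x10FFFF && not (isSurrogate cp)
           then Just (chr cp, rest')
           else Nothing

-- Decode bytes; never fails, every invalid byte becomes U+DC80..U+DCFF.
decodeRoundtrip :: [Word8] -> String
decodeRoundtrip [] = []
decodeRoundtrip bs@(b0 : rest)
  | b0 < 0x80 = chr (fromIntegral b0) : decodeRoundtrip rest
  | otherwise = case multi bs of
      Just (c, rest') -> c : decodeRoundtrip rest'
      Nothing -> escapeByte b0 : decodeRoundtrip rest

-- Encode back to bytes, turning escaped surrogates into the original bytes.
encodeRoundtrip :: String -> [Word8]
encodeRoundtrip = concatMap enc
  where
    enc c
      | cp >= 0xDC80 && cp <= 0xDCFF = [fromIntegral (cp - 0xDC00)]
      | isSurrogate cp = error "encodeRoundtrip: lone surrogate not produced by decoding"
      | cp < 0x80 = [fromIntegral cp]
      | cp < 0x800 = [0xC0 .|. hi 6, cont 0]
      | cp < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
      | otherwise = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
      where
        cp = ord c
        hi s = fromIntegral (cp `shiftR` s)
        cont s = 0x80 .|. (fromIntegral (cp `shiftR` s) .&. 0x3F)
```

The decoder must refuse UTF-8-encoded surrogates (e.g. the bytes ED B2 80, which would otherwise decode to U+DC80): if it accepted them, re-encoding would produce the single byte 0x80 and the roundtrip property would break. Load the definitions in GHCi and try `encodeRoundtrip (decodeRoundtrip bytes) == bytes` on arbitrary byte lists.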
> We can additionally:
> * Provide your layer in the "unix" package where FilePath =
> ByteString, for people who for some reason care about the performance of
> their FilePath encoding/decoding, OR who don't want to rely on the
> roundtripping property being implemented correctly
I think I'll do this anyway.
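A short sketch of what a ByteString-based layer allows: listing a directory without decoding any names at all. It assumes the unix package's `System.Posix.Directory.ByteString` and `System.Posix.ByteString.FilePath` modules (with `RawFilePath = ByteString`), which is how this layer appears to have shipped in later unix releases; `listDirectoryRaw` is a hypothetical helper name.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as B
import System.Posix.ByteString.FilePath (RawFilePath)
import System.Posix.Directory.ByteString (closeDirStream, openDirStream, readDirStream)

-- List a directory's entries as raw bytes. No decoding happens, so the
-- result depends neither on the locale nor on round-tripping being correct.
-- (Hypothetical helper, not part of the unix package.)
listDirectoryRaw :: RawFilePath -> IO [RawFilePath]
listDirectoryRaw dir = bracket (openDirStream dir) closeDirStream loop
  where
    -- readDirStream returns an empty ByteString once the stream is exhausted.
    loop ds = do
      name <- readDirStream ds
      if B.null name
        then return []
        else fmap (name :) (loop ds)
```

Because every entry comes back as the exact bytes the kernel returned, a name that is invalid in the current locale is still returned unchanged and can be passed straight back to other `RawFilePath` functions.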
> * Perhaps provide a layer in the "win32" package where FilePath =
> ByteString but where that ByteString is guaranteed to be UTF-16
> encoded (I'm less sure about this, because we can always unambiguously
> decode this without doing any escaping. It's still useful if you care
> about performance.)
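Why no escaping is needed on Windows can be seen in a short sketch, assuming that Windows file names are arbitrary sequences of 16-bit code units that may contain unpaired surrogates. Well-formed surrogate pairs decode to one `Char`; an unpaired surrogate decodes to a `Char` with the same code point, which Haskell's `Char` can hold. The names `decodeUtf16` and `encodeUtf16` are hypothetical.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

isHigh, isLow :: Word16 -> Bool
isHigh w = w >= 0xD800 && w <= 0xDBFF
isLow w = w >= 0xDC00 && w <= 0xDFFF

-- Decode any sequence of UTF-16 code units; this never fails.
decodeUtf16 :: [Word16] -> String
decodeUtf16 (h : l : rest)
  | isHigh h && isLow l =
      chr (0x10000 + ((fromIntegral h - 0xD800) `shiftL` 10) + (fromIntegral l - 0xDC00))
        : decodeUtf16 rest
-- Any other unit, including an unpaired surrogate, maps to the same code point.
decodeUtf16 (w : rest) = chr (fromIntegral w) : decodeUtf16 rest
decodeUtf16 [] = []

-- Encode back: supplementary characters become a pair, everything else one unit.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap enc
  where
    enc c
      | cp < 0x10000 = [fromIntegral cp]
      | otherwise =
          let v = cp - 0x10000
          in [fromIntegral (0xD800 + (v `shiftR` 10)), fromIntegral (0xDC00 + (v .&. 0x3FF))]
      where
        cp = ord c
```

`encodeUtf16 (decodeUtf16 ws) == ws` holds for every `ws`, because decoding never yields a lone high surrogate `Char` immediately followed by a lone low surrogate `Char` (those two units would have been paired).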
>
> I'm wondering if we should also have hSetLocaleEncoding,
> hSetFileSystemEncoding :: TextEncoding -> IO () and change
> localeEncoding, fileSystemEncoding :: IO TextEncoding.
> hSetFileSystemEncoding in particular would let people opt out of
> escapes entirely, as long as they issued it right at the start of their
> program, before the fileSystemEncoding had been used.
Ok by me.
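A minimal sketch of the opt-out described above. It assumes a base library in which `GHC.IO.Encoding` exports `setFileSystemEncoding` and `getFileSystemEncoding` (without the `h` prefix, which appears to be how this proposal later landed in base); `optOutOfEscapes` is a hypothetical helper. `char8` maps each byte to the `Char` with the same value (U+0000..U+00FF) and back, so no escaping is ever needed.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding, setFileSystemEncoding)
import System.IO (TextEncoding, char8)

-- Switch the file system encoding to char8 and return the encoding now in
-- effect. Call this first in main, before any FilePath has been decoded.
optOutOfEscapes :: IO TextEncoding
optOutOfEscapes = do
  setFileSystemEncoding char8
  getFileSystemEncoding
```

The trade-off is that non-ASCII file names then appear as their individual UTF-8 bytes rather than as the intended characters.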
Cheers,
Simon
More information about the Glasgow-haskell-users mailing list