behaviour change in getDirectoryContents in GHC 7.2?

Simon Marlow marlowsd at
Thu Nov 10 15:35:32 CET 2011

On 10/11/2011 09:28, Max Bolingbroke wrote:

> Is there any consensus about what to do here? My take is that we
> should move back to lone surrogates. This:
>    1. Recovers the roundtrip property, which we appear to believe is essential
>    2. Removes all the weird problems I outlined earlier that can occur
> if your byte strings happen to contain some bytes that decode to
> U+EFxx
>    3. DOES break software that expects Strings not to contain surrogate
> codepoints, but (I agree with you) this is arguably a feature
>
> This is also exactly what Python does, so it has the advantage of being
> battle-tested.
>
> Agreed?
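
For readers unfamiliar with the lone-surrogate scheme, here is a minimal sketch of the roundtrip idea. It assumes the Python-style mapping in which an undecodable byte b (0x80..0xFF) becomes the code point U+DC00 + b. The names escapeByte, unescapeChar, decodeToy and encodeToy are hypothetical, and the "decoder" is a toy that treats every non-ASCII byte as undecodable; it is not GHC's actual implementation.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Toy decoder step: ASCII bytes pass through unchanged; every other
-- byte is treated as undecodable and escaped to a lone low surrogate
-- in the range U+DC80..U+DCFF (assumed mapping: U+DC00 + byte).
escapeByte :: Word8 -> Char
escapeByte b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = chr (0xDC00 + fromIntegral b)

-- Inverse step: ASCII characters and escape surrogates map back to the
-- original byte; anything else is rejected by this toy encoder.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n < 0x80                   = Just (fromIntegral n)
  | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c

decodeToy :: [Word8] -> String
decodeToy = map escapeByte

encodeToy :: String -> Maybe [Word8]
encodeToy = mapM unescapeChar

main :: IO ()
main = do
  -- "foo" followed by two bytes that are not valid UTF-8 on their own
  let raw = [0x66, 0x6F, 0x6F, 0xFF, 0x80] :: [Word8]
  -- The roundtrip property: encoding the decoded name gives back the bytes
  print (encodeToy (decodeToy raw) == Just raw)
```

Because a well-formed byte sequence never decodes to a lone surrogate, the escape characters cannot collide with legitimately decoded text, which is what point 2 above relies on.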


> We can additionally:
>   * Provide your layer in the "unix" package where FilePath =
> ByteString, for people who for some reason care about performance of
> their FilePath encoding/decoding, OR who don't want to rely on the
> roundtripping property being implemented correctly

I think I'll do this anyway.

>   * Perhaps provide a layer in the "win32" package where FilePath =
> ByteString but where that ByteString is guaranteed to be UTF-16
> encoded (I'm less sure about this, because we can always unambiguously
> decode this without doing any escaping. It's still useful if you care
> about performance.)
>
> I'm wondering if we should also have hSetLocaleEncoding,
> hSetFileSystemEncoding :: TextEncoding -> IO () and change
> localeEncoding, fileSystemEncoding :: IO TextEncoding.
> hSetFileSystemEncoding in particular would let people opt-out of
> escapes entirely as long as they issued it right at the start of their
> program before the fileSystemEncoding had been used.

Ok by me.
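
A rough sketch of what the opt-out could look like from a program's point of view. It assumes the setters as they are exported by GHC.IO.Encoding in later versions of base, where they are named setFileSystemEncoding and getFileSystemEncoding rather than the hSet... names proposed above. Choosing char8 here is only an illustration: it decodes each byte to the code point of the same value, so no escaping is ever needed.

```haskell
import GHC.IO.Encoding
  ( char8
  , getFileSystemEncoding
  , setFileSystemEncoding
  )

main :: IO ()
main = do
  -- Do this first, before any FilePath has been decoded or encoded,
  -- so that the default file system encoding is never used.
  setFileSystemEncoding char8

  -- Reading the setting back is an IO action, matching the proposed
  -- change of fileSystemEncoding to type IO TextEncoding.
  enc <- getFileSystemEncoding
  putStrLn ("file system encoding is now: " ++ show enc)
```

The setting is global to the process, which is why it has to be issued at the very start of the program.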


More information about the Glasgow-haskell-users mailing list