behaviour change in getDirectoryContents in GHC 7.2?

Max Bolingbroke batterseapower at hotmail.com
Thu Nov 10 10:28:52 CET 2011


On 9 November 2011 16:29, Simon Marlow <marlowsd at gmail.com> wrote:
> Ok, so since we need something like
>
>  makePrintable :: FilePath -> String
>
> arguably we might as well make that do the locale decoding.  That's
> certainly a good point...

You could, but getArgs has type IO [String], not IO [FilePath], and
locale-decoding command-line arguments is the Right Thing To Do. So
this doesn't really avoid the need to roundtrip, does it?

Is there any consensus about what to do here? My take is that we
should move back to lone surrogates. This:
  1. Recovers the roundtrip property, which we appear to believe is
     essential
  2. Removes all the weird problems I outlined earlier that can occur
     if your byte strings happen to contain some bytes that decode to
     U+EFxx
  3. DOES break software that expects Strings not to contain surrogate
     codepoints, but (I agree with you) this is arguably a feature

This is also exactly what Python does, so it has the advantage of being
battle-tested.
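
To make the roundtrip property concrete, here is a minimal sketch of
the lone-surrogate idea. It is not GHC's actual codec: it assumes the
encoding is plain ASCII for simplicity, and the names decodeEscaping
and encodeEscaping are hypothetical. Any byte >= 0x80 is treated as
undecodable and escaped as the lone low surrogate U+DC00 + byte, which
is the same range Python uses.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper, assuming an ASCII "locale" for simplicity.
-- Bytes below 0x80 decode to themselves; every other byte is escaped
-- as a lone low surrogate in the range U+DC80..U+DCFF.
decodeEscaping :: [Word8] -> String
decodeEscaping = map step
  where
    step b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = chr (0xDC00 + fromIntegral b)

-- Inverse of decodeEscaping. ASCII characters encode to themselves and
-- escape surrogates are turned back into the original byte. Anything
-- else cannot be represented in this toy encoding, so we return Nothing.
encodeEscaping :: String -> Maybe [Word8]
encodeEscaping = mapM step
  where
    step c
      | n < 0x80                   = Just (fromIntegral n)
      | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
      | otherwise                  = Nothing
      where
        n = ord c
```

With these definitions, encodeEscaping (decodeEscaping bs) gives back
Just bs for every byte list bs, because no validly decoded character
can land in the escape range. That is exactly what fails with the
U+EFxx scheme, where a legitimately decoded private-use character is
indistinguishable from an escape.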

Agreed?

We can additionally:
 * Provide your layer in the "unix" package where FilePath =
   ByteString, for people who for some reason care about performance of
   their FilePath encoding/decoding, OR who don't want to rely on the
   roundtripping property being implemented correctly
 * Perhaps provide a layer in the "win32" package where FilePath =
   ByteString but where that ByteString is guaranteed to be UTF-16
   encoded (I'm less sure about this, because we can always
   unambiguously decode this without doing any escaping. It's still
   useful if you care about performance.)

I'm wondering if we should also have hSetLocaleEncoding,
hSetFileSystemEncoding :: TextEncoding -> IO () and change
localeEncoding, fileSystemEncoding to have type IO TextEncoding.
hSetFileSystemEncoding in particular would let people opt out of
escapes entirely, as long as they call it right at the start of their
program, before the fileSystemEncoding has been used.
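
As a rough illustration of the opt-out, the sketch below assumes a
version of base whose GHC.IO.Encoding module exports functions of this
shape under the names setFileSystemEncoding and getFileSystemEncoding
(rather than the hSet... names proposed above). The helper name
optOutOfEscapes is hypothetical. It switches the file system encoding
to char8, which maps each byte to the Char with the same code point and
so never needs an escape, and then reports the encoding now in effect.

```haskell
import GHC.IO.Encoding (char8, getFileSystemEncoding, setFileSystemEncoding)

-- Hypothetical helper: opt out of escaping by decoding file paths
-- byte-for-byte. Assumes base exports setFileSystemEncoding and
-- getFileSystemEncoding from GHC.IO.Encoding.
optOutOfEscapes :: IO String
optOutOfEscapes = do
  -- Must run before any file path or argument has been decoded.
  setFileSystemEncoding char8
  -- Read the setting back and return its name via the Show instance.
  enc <- getFileSystemEncoding
  return (show enc)
```

A program would call optOutOfEscapes as the first action in main.
Paths decoded afterwards are then plain byte-per-Char Strings, at the
cost of no longer being human-readable for non-ASCII names.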

What do you think?

Max


