behaviour change in getDirectoryContents in GHC 7.2?

Max Bolingbroke batterseapower at hotmail.com
Wed Nov 9 11:55:50 CET 2011


On 7 November 2011 17:32, John Millikin <jmillikin at gmail.com> wrote:
> I am also not convinced that it is possible to correctly implement
> either of these functions if their behavior is dependent on the user's
> locale.

FWIW it's only dependent on the user's locale because whether glibc
iconv detects errors in the *from* sequence depends on what the *to*
locale is. Clearly an invalid *from* sequence should be reported as
invalid regardless of *to*. I know this isn't much comfort to you,
though, since you do have to worry about broken behaviour in 7.2, and
possible future breakage with changes in iconv.

I understand your point that it would be better from a complexity
point of view to just roundtrip the bytes as *bytes* without relying
on all this escaping/unescaping code.

> Please understand, I am not arguing against the existence of this
> encoding layer in general. It's a fine idea for a simplistic
> high-level filesystem interaction library. But it should be
> *optional*, not part of the compiler or "base".

The problem is that I *really really want* getArgs to decode the
command line arguments. That's almost the whole point of this change,
and it is what most users seem to expect. Given this constraint, the
code has to be part of "base", and if getArgs has this behaviour then
any file system function we ship that takes a FilePath (i.e. all the
functions in base, directory, win32 and unix) must be prepared to
handle these escape characters for consistency.

I *would* be happy to expose an alternative file system API from the
posix package that operates with ByteString paths. This package could
provide a function :: FilePath -> ByteString that encodes the string
with the fileSystemEncoding (removing escapes in the process) for
interoperability with file names arriving via getArgs, and at that
point the decision about whether to use the escaping/unescaping code
would be (mostly) in the hands of the user. We could even have posix
expose APIs to get command line arguments/environment variables as
ByteStrings, and then you could avoid escape/unescape entirely.
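
As a rough sketch of what such a conversion function might look like,
assuming a recent "base" in which the encoding is obtained with
getFileSystemEncoding (an IO action) rather than the pure
fileSystemEncoding value mentioned above. The names encodeFilePath and
decodeFilePath are hypothetical, and the result lives in IO here rather
than being the pure FilePath -> ByteString function described:

```haskell
import qualified Data.ByteString as B
import GHC.Foreign (peekCString, withCString)
import GHC.IO.Encoding (getFileSystemEncoding)

-- Hypothetical helper: encode a FilePath with the file system encoding.
-- Any escape characters in the String are turned back into raw bytes
-- by the encoding itself.
encodeFilePath :: FilePath -> IO B.ByteString
encodeFilePath path = do
  enc <- getFileSystemEncoding
  -- Marshal the String to a NUL-terminated C string, then copy its bytes.
  withCString enc path B.packCString

-- Hypothetical inverse: decode raw bytes into a FilePath, escaping
-- whatever the file system encoding cannot decode.
decodeFilePath :: B.ByteString -> IO FilePath
decodeFilePath bytes = do
  enc <- getFileSystemEncoding
  B.useAsCString bytes (peekCString enc)
```

Both helpers go through NUL-terminated C strings, so they are only
suitable for data without embedded NUL bytes, which holds for POSIX
file names.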

Which of these solutions (if any) would satisfy you?
 1. The current situation, plus an alternative API exposed from
"posix" along the lines described above
 2. The current situation but with the escape/unescape modified so it
allows true roundtripping (at the cost of weird "surrogate" Char
values popping up now and again). If you have this you can reliably
implement the alternative API on top of the String based one, assuming
we got our escape/unescape code right
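
To illustrate what "true roundtripping" in option 2 means, here is a toy
sketch, not the code in "base". It assumes a PEP 383 style mapping in
which an undecodable byte b becomes the lone surrogate U+DC00 + b; that
mapping and all of the names below are assumptions made for the
example, and the "decoder" only understands ASCII:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumed mapping: a byte in 0x80..0xFF becomes the lone surrogate
-- U+DC80..U+DCFF. Only meant to be called on bytes >= 0x80.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xDC00 + fromIntegral b)

-- Recover the original byte from an escape character, if it is one.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c

-- Toy decoder: ASCII bytes decode as themselves, all others are escaped.
decodeLenient :: [Word8] -> String
decodeLenient = map step
  where
    step b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = escapeByte b

-- Toy encoder: inverse of decodeLenient. Returns Nothing for any
-- non-ASCII Char that is not one of the escape characters.
encodeLenient :: String -> Maybe [Word8]
encodeLenient = mapM step
  where
    step c
      | ord c < 0x80 = Just (fromIntegral (ord c))
      | otherwise    = unescapeChar c
```

With this scheme encodeLenient (decodeLenient bs) gives back Just bs for
every byte sequence bs, which is the property an alternative ByteString
API built on top of the String based one would rely on.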

I hope we can work together to find a solution here.

Cheers,
Max


