behaviour change in getDirectoryContents in GHC 7.2?

John Millikin jmillikin at gmail.com
Sun Nov 6 05:14:27 CET 2011


FYI: I just released new versions of system-filepath and
system-fileio, which attempt to work around the changes in GHC 7.2.

On Wed, Nov 2, 2011 at 11:55, Max Bolingbroke
<batterseapower at hotmail.com> wrote:
>> Maybe I'm misunderstanding, but it sounds like you're still trying to
>> treat posix file paths as text. There should not be any iconv or
>> locales or anything involved in looking up a posix file path.
>
> The thing is that on every non-Unix OS paths *can* be interpreted as
> text, and people expect them to be. In fact, even on Unix most
> programs/frameworks interpret them as text - e.g. IIRC QT's QString
> class is used for filenames in that framework, and if you view
> filenames in an end-user app like Nautilus it obviously decodes them
> in the current locale for presentation.

There is a difference between how paths are rendered to users, and how
they are handled by applications.

Applications *must* use whatever the operating system says a path is.
If a path is bytes, they must use bytes. If a path is text, they must
use text.

How they present paths to the user is a matter of user interface design.

For what it's worth, on my Ubuntu system, Nautilus ignores the locale
and just treats all paths as either UTF-8 or invalid. To me, this seems
like the most reasonable option; the concept of "locale encoding" is
entirely vestigial, and should only be used in certain specialized
cases.
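
As a rough illustration of that split between handling and
presentation, here is a minimal sketch (not how Nautilus is actually
implemented): the application keeps the raw bytes, and only attempts a
UTF-8 decode when it needs something to show the user. The names
"Rendered" and "renderPath" are made up for this example, and it
assumes the "bytestring" and "text" packages are available.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | How a byte path might be presented to a user (hypothetical type).
data Rendered
  = Utf8Name T.Text           -- the bytes were valid UTF-8
  | InvalidName B.ByteString  -- not UTF-8; the raw bytes are kept untouched
  deriving (Eq, Show)

-- | Decode for display only. The locale is never consulted, and the
-- original bytes remain the thing that gets passed back to the OS.
renderPath :: B.ByteString -> Rendered
renderPath raw =
  case TE.decodeUtf8' raw of
    Right txt -> Utf8Name txt
    Left _    -> InvalidName raw

main :: IO ()
main = do
  -- The ASCII bytes for "foo", which are also valid UTF-8.
  print (renderPath (B.pack [0x66, 0x6f, 0x6f]))
  -- The byte 0xff can never appear in well-formed UTF-8.
  print (renderPath (B.pack [0x66, 0xff, 0x6f]))
```

The point of the design is that "renderPath" is one-way: nothing in
the application ever converts the displayed text back into a path.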

> Paths as text is just what people expect, and is grandfathered into
> the Haskell libraries itself as "type FilePath = String". PEP-383
> behaviour is (I think) a good way to satisfy this expectation while
> still not sacrificing the ability to deal with files that have names
> encoded in some way other than the locale encoding.

Paths as text is what *Windows* programmers expect. Paths as bytes is
what's expected by programmers on non-Windows OSes, including Linux
and OS X.

I'm not saying one is inherently better than the other, but
considering that various UNIX and UNIX-like operating systems have
been using byte-based paths for near on forty years now, trying to
abolish them by redefining the type is not a useful action.

> (Perhaps if Haskell had an abstract FilePath data type rather than
> FilePath = String we could do something different.

This is the general purpose of my system-filepath package, which
provides a set of generic modifications, applicable to paths from
various OS families.

> But it's not clear
> if we could, without also having ugliness like getArgs :: IO [Byte])

We *ought* to have getArgs :: IO [ByteString], at least on POSIX systems.

It's totally OK if high-level packages like "directory" want to hide
details behind some nice abstractions. But the low-level libraries,
like "base" and "unix" and "Win32", must must must provide direct
low-level access to the operating system's APIs.

The only other option is to re-implement half of the standard library
using FFI bindings, which is ugly (for file/directory manipulation) or
nearly impossible (for opening handles).

If you're going to make all the System.IO stuff use text, at least
give us an escape hatch. The "unix" package is ideally suited, as it's
already inherently OS-specific. Something like this would be perfect:

------------------
System.Posix.File.openHandle :: CString -> IOMode -> IO Handle

System.Posix.File.rename :: CString -> CString -> IO ()
------------------
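
To make the idea concrete, here is a minimal sketch of what the rename
half could look like if written by hand against the C library. It is
only an illustration, not an existing "unix" API: "renameRaw" and
"renameBytes" are hypothetical names, the file names in "main" are
arbitrary, and it assumes a POSIX system plus the "bytestring" package.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Foreign.C.Error (throwErrnoIfMinus1_)
import Foreign.C.String (CString)
import Foreign.C.Types (CInt(..))

-- The C library's rename(), which takes NUL-terminated byte strings.
foreign import ccall "rename"
  c_rename :: CString -> CString -> IO CInt

-- | Hypothetical escape hatch with the signature suggested above.
-- No decoding or encoding happens; the bytes go straight to the OS.
renameRaw :: CString -> CString -> IO ()
renameRaw old new = throwErrnoIfMinus1_ "renameRaw" (c_rename old new)

-- | Convenience wrapper for callers that hold the path as a ByteString.
-- useAsCString copies the bytes and appends the terminating NUL.
renameBytes :: B.ByteString -> B.ByteString -> IO ()
renameBytes old new =
  B.useAsCString old $ \o ->
    B.useAsCString new $ \n ->
      renameRaw o n

main :: IO ()
main = do
  writeFile "escape-hatch-old.tmp" "demo"
  renameBytes (BC.pack "escape-hatch-old.tmp")
              (BC.pack "escape-hatch-new.tmp")
  readFile "escape-hatch-new.tmp" >>= putStrLn
```

Doing this for one function is tolerable; doing it for every file and
directory operation is the re-implementation burden described above.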


