[Haskell] System.FilePath survey

Ben Rudiak-Gould Benjamin.Rudiak-Gould at cl.cam.ac.uk
Wed Feb 8 16:10:37 EST 2006


John Meacham wrote:
> On Tue, Feb 07, 2006 at 04:25:35PM +0000, Ben Rudiak-Gould wrote:
>>                  Posix       NT             Win9x
>>
>> pathnames        bytes       UTF-16         locale
>> command line     bytes       UTF-16         locale
>> file contents    bytes       bytes          bytes
>> pipes/sockets    bytes       bytes          bytes
> 
> actually, Posix systems should be the following
> 
>> pathnames        locale      UTF-16         locale
>> command line     locale      UTF-16         locale
>> file contents    *           bytes          bytes
>> pipes/sockets    *           bytes          bytes
> 
> Although the Posix interface is in terms of bytes, the strings should
> always be interpreted via the locale specified in $LANG or $LC_CTYPE
> also, for file contents and pipes/sockets, if you are passing text, and
> in the absence of some overriding standard or protocol, you should be
> using the encoding specified in the locale too.

But that's an application-level convention; the kernel only knows about 
bytes. On Windows the encoding of pathnames and the command line is a 
requirement imposed by the kernel. I think assuming the locale encoding for 
the command line on Posix is a bad idea. Users are unlikely to type a 
misencoded command line by hand, but I want my-haskell-util `find .` to 
work even on a mounted volume whose file names are not in the locale's 
encoding. (And I also want your-haskell-util to work, even if you didn't 
write it with this situation in mind.)
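
As a rough illustration of the failure mode, here is a minimal sketch that
assumes a UTF-8 locale and a file name stored in Latin-1. The name
`latin1Name` and the helpers `validInLocale` and `lossyRoundTrip` are
hypothetical, made up for this example; only the `bytestring` and `text`
library calls are real. It shows that the bytes `find .` would print for
such a file do not decode in the locale, and that a lenient decode followed
by a re-encode yields different bytes, so the result no longer names the
file on disk.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical file name as stored on the volume: "caf" followed by the
-- single byte 0xE9 (Latin-1 e-acute), which is not valid UTF-8 on its own.
latin1Name :: B.ByteString
latin1Name = B.pack [0x63, 0x61, 0x66, 0xE9]

-- Does the name decode under the (assumed) UTF-8 locale encoding?
validInLocale :: B.ByteString -> Bool
validInLocale = either (const False) (const True) . TE.decodeUtf8'

-- Decode leniently (invalid input becomes U+FFFD), then encode again.
-- This models a tool that turns argv into text and back before calling open().
lossyRoundTrip :: B.ByteString -> B.ByteString
lossyRoundTrip = TE.encodeUtf8 . TE.decodeUtf8With lenientDecode

main :: IO ()
main = do
  putStrLn ("valid in locale:     " ++ show (validInLocale latin1Name))
  putStrLn ("survives round trip: " ++ show (lossyRoundTrip latin1Name == latin1Name))
```

A tool that keeps the argument as a ByteString from start to finish never
hits either problem, which is the behaviour argued for above.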

-- Ben


