[Haskell-cafe] Re: Unicode workaround for getDirectoryContents
marlowsd at gmail.com
Wed Jun 17 08:46:55 EDT 2009
On 17/06/2009 13:21, Yitzchak Gale wrote:
> I wrote:
>>> I think the most important use cases that should not break are:
>>> o open/read/write a FilePath from getArgs
>>> o open/read/write a FilePath from getDirectoryContents
> Simon Marlow wrote:
>> The following cases are currently broken:
>> * Calling openFile on a literal Unicode FilePath (note, not
>> ACP-encoded, just Unicode).
>> * Reading a Unicode FilePath from a text file and then calling
>> openFile on it
>> I propose to fix these (on Windows). It will mean that your second case
>> above will be broken, until someone fixes getDirectoryContents.
> Why only on Windows?
Just because it's a lot easier on Windows - all the OS APIs take Unicode
file paths, so it's obvious what to do. In contrast on Unix I don't
have a clear idea of how to proceed.
On Unix, all file APIs take [Word8] rather than [Char]. By convention,
the [Word8] is usually assumed to be a string in the locale encoding,
but that's only a user-space convention.
So we should probably be converting from FilePath to [Word8] by encoding
using the current locale. This raises various complications (what about
encoding errors, and what if encode.decode is not the identity due to
But you don't have to wait for me to fix this stuff (I'm feeling a bit
Unicoded-out right now :) If someone else has a good understanding of
what needs to be done, please wade in.
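To make the locale-encoding concern above concrete, here is a minimal sketch of the FilePath to [Word8] conversion, assuming the locale encoding is UTF-8. The names encodePath and decodePath are hypothetical and do not exist in any library. The decoder is deliberately lenient: it substitutes U+FFFD for invalid bytes and does not reject overlong forms or surrogates. It is only meant to show how encode . decode can fail to be the identity.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one Char as UTF-8 bytes.
-- Assumption: the current locale is UTF-8; surrogates are not rejected.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: FilePath to the bytes a Unix file API would receive.
encodePath :: FilePath -> [Word8]
encodePath = concatMap encodeChar

-- Hypothetical helper: lenient decoding of bytes into a FilePath.
-- Any byte that does not start a well-formed sequence becomes U+FFFD.
decodePath :: [Word8] -> FilePath
decodePath [] = []
decodePath (b:bs)
  | b < 0x80              = chr (fromIntegral b) : decodePath bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F)
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07)
  | otherwise             = '\xFFFD' : decodePath bs
  where
    isCont x = x .&. 0xC0 == 0x80
    multi :: Int -> Int -> String
    multi k lead =
      let (cs, rest) = splitAt k bs
          n = foldl (\acc x -> (acc `shiftL` 6) .|. (fromIntegral x .&. 0x3F)) lead cs
      in if length cs == k && all isCont cs && n <= 0x10FFFF
           then chr n : decodePath rest
           else '\xFFFD' : decodePath bs
```

With these definitions, decodePath (encodePath s) returns s for ordinary text, but a Latin-1 file name such as the bytes [0x66, 0x6F, 0xFF] decodes to a string ending in U+FFFD and re-encodes to different bytes, so the original file could no longer be opened by that name. A real fix would need some way to preserve undecodable bytes.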
>> I don't know how getArgs fits in here - should we be decoding argv using the
> And why not also on Unix? On any platform, the expected behavior should
> be that you type a file path at the command line, read it using getArgs,
> and open the file using that.
Right. On Unix it works at the moment because we neither decode argv
nor encode FilePaths, so the bytes get passed through unchanged. Same
But I agree it's broken and needs to be fixed.
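The pass-through behaviour described above can be modelled roughly as follows. This is an illustrative sketch, not the actual runtime code: bytesToChars and charsToBytes are hypothetical names, and the truncation to the low 8 bits is an assumption about what "not encoding" amounts to.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical model of "not decoding argv": each byte becomes the Char
-- with the same code point, so multi-byte sequences are not combined.
bytesToChars :: [Word8] -> String
bytesToChars = map (chr . fromIntegral)

-- Hypothetical model of "not encoding FilePaths": each Char is cut down
-- to its low 8 bits (assumption), which is lossless only below U+0100.
charsToBytes :: String -> [Word8]
charsToBytes = map (fromIntegral . ord)
```

Under this model, charsToBytes (bytesToChars bs) gives back bs for any bytes, which is why a path typed at the command line still opens the right file. The String in between is not the text the user typed, though: the UTF-8 bytes for one accented character show up as two Chars. In the other direction, a genuine Unicode literal such as a lambda (U+03BB) loses information when cut down to a single byte, which matches the broken literal FilePath case listed earlier.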