[Haskell-cafe] Re: Unicode workaround for getDirectoryContents under Windows?

Simon Marlow marlowsd at gmail.com
Wed Jun 17 08:46:55 EDT 2009

On 17/06/2009 13:21, Yitzchak Gale wrote:
> I wrote:
>>> I think the most important use cases that should not break are:
>>> o open/read/write a FilePath from getArgs
>>> o open/read/write a FilePath from getDirectoryContents
> Simon Marlow wrote:
>> The following cases are currently broken:
>>   * Calling openFile on a literal Unicode FilePath (note, not
>>    ACP-encoded, just Unicode).
>>   * Reading a Unicode FilePath from a text file and then calling
>>    openFile on it
>> I propose to fix these (on Windows).  It will mean that your second case
>> above will be broken, until someone fixes getDirectoryContents.
> Why only on Windows?

Just because it's a lot easier on Windows - all the OS APIs take Unicode 
file paths, so it's obvious what to do.  In contrast on Unix I don't 
have a clear idea of how to proceed.

On Unix, all file APIs take byte strings, i.e. [Word8] rather than 
[Char].  By convention, the [Word8] is usually assumed to be a string in 
the locale encoding, but that's only a user-space convention; the 
kernel doesn't enforce it.

So we should probably be converting from FilePath to [Word8] by encoding 
using the current locale.  This raises various complications (what about 
encoding errors, and what if encode.decode is not the identity due to 
normalisation, etc.).
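
A minimal sketch of that encoding step, assuming purely for illustration 
that the locale encoding is UTF-8.  The names encodeChar and 
encodeFilePath are hypothetical, not an existing library API; a real 
implementation would ask the locale which encoding to use.  Returning 
Nothing is just one possible answer to the "what about encoding errors" 
question.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one Char as UTF-8 (assumed locale encoding).
-- A lone surrogate has no UTF-8 encoding, so it yields Nothing.
encodeChar :: Char -> Maybe [Word8]
encodeChar c
  | n < 0x80                   = Just [fromIntegral n]
  | n < 0x800                  = Just [0xC0 .|. hi 6, cont 0]
  | n >= 0xD800 && n <= 0xDFFF = Nothing
  | n < 0x10000                = Just [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise                  = Just [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading-byte payload: the bits above position s.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits starting at position s.
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: encode a whole FilePath, failing if any Char fails.
encodeFilePath :: FilePath -> Maybe [Word8]
encodeFilePath = fmap concat . mapM encodeChar
```

The normalisation problem shows up even here: encodeFilePath "\xE9" and 
encodeFilePath "e\x301" both denote an e with an acute accent, but they 
produce different byte sequences, so they name different files unless 
something normalises them first.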

But you don't have to wait for me to fix this stuff (I'm feeling a bit 
Unicoded-out right now :).  If someone else has a good understanding of 
what needs to be done, please wade in.

>> I don't know how getArgs fits in here - should we be decoding argv using the
>> ACP?
> And why not also on Unix? On any platform, the expected behavior should
> be that you type a file path at the command line, read it using getArgs,
> and open the file using that.

Right.  On Unix it works at the moment because we neither decode argv 
nor encode FilePaths, so the bytes get passed through unchanged.  Same 
with getDirectoryContents.
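
To illustrate the pass-through, here is a rough model of it.  The two 
function names are hypothetical, and treating "no encoding" as "keep the 
low 8 bits of each Char" is an assumption about the current behaviour 
rather than a description of the actual library code.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical model of "no decoding": each byte from argv or a directory
-- entry becomes the Char with the same code point (0..255).
bytesToFilePath :: [Word8] -> FilePath
bytesToFilePath = map (chr . fromIntegral)

-- Hypothetical model of "no encoding": each Char is cut down to its low
-- 8 bits before being handed to the OS (an assumption, see above).
filePathToBytes :: FilePath -> [Word8]
filePathToBytes = map (fromIntegral . ord)
```

Under this model filePathToBytes . bytesToFilePath is the identity, which 
is why a name from getArgs or getDirectoryContents can be opened again.  
But the FilePath in between is not Unicode text (a UTF-8 encoded 
e-acute shows up as two Chars), and a genuine Unicode FilePath such as 
"\x20AC" loses information on the way out.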

But I agree it's broken and needs to be fixed.

