[Haskell-cafe] Re: Unicode workaround for getDirectoryContents under Windows?

Simon Marlow marlowsd at gmail.com
Tue Jun 16 09:02:43 EDT 2009

On 16/06/2009 13:46, Yitzchak Gale wrote:
> Simon Marlow wrote:
>>>> Care to submit a patch to put this in System.Directory, or better still
>>>> put the relevant functionality in System.Win32 and use it in
>>>> System.Directory?
> Bulat Ziganshin wrote:
>>> Currently getDirectoryContents returns ACP (ANSI code page) names, so
>>> openFile works for files 1) and 2).
>>> With such a change getDirectoryContents will return correct Unicode
>>> names, so openFile will work only with names in the first group.
>>> The right way is to fix ALL string-related calls in System.IO,
>>> System.Posix.Internals, System.Environment.
>> You're right in that we really ought to fix everything.  However, I'm happy
>> to just fix some of these things, even if it introduces some inconsistencies
>> in the meantime.  We already have much of System.Directory working with
>> Unicode FilePaths, so there are already inconsistencies here.
> +1 for integrating Unicode file paths. Thanks, Bulat!

Excuse my ignorance, but... what Unicode file paths?

> I think the most important use cases that should not break are:
> o open/read/write a FilePath from getArgs
> o open/read/write a FilePath from getDirectoryContents
> There's not much we can do about non-Latin-1 ACP file paths
> hard coded in Strings. I hope there aren't too many
> of those in the wild.

The following cases are currently broken:

  * Calling openFile on a literal Unicode FilePath (note, not
    ACP-encoded, just Unicode).

  * Reading a Unicode FilePath from a text file and then calling
    openFile on it.
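
To make "Unicode, not ACP-encoded" concrete, here is a minimal sketch. It assumes that the affected calls marshal a FilePath by keeping only the low 8 bits of each Char; that model is an assumption for illustration, not something established in this thread. The names fitsInOneByte and truncateToBytes are hypothetical helpers, not library functions.

```haskell
import Data.Char (ord)

-- | True when every character of the path has a code point <= 255,
-- i.e. truncating each Char to one byte would leave the name unchanged.
fitsInOneByte :: FilePath -> Bool
fitsInOneByte = all ((<= 255) . ord)

-- | The byte values that would result if each Char were truncated to its
-- low 8 bits (the ASSUMED marshalling model, for illustration only).
truncateToBytes :: FilePath -> [Int]
truncateToBytes = map ((`mod` 256) . ord)
```

Under that assumption, a literal such as "\1092\1072\1081\1083.txt" (Cyrillic letters) does not survive the truncation, whereas a name like "caf\233.txt" does, which is roughly the dividing line between the cases discussed above.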

I propose to fix these (on Windows).  This means that your second case
above (a FilePath from getDirectoryContents) will be broken until someone
fixes getDirectoryContents.

Also currently broken:

  * Calling removeFile, amongst other System.Directory operations, on a
    FilePath you get from getDirectoryContents.

Fixing getDirectoryContents will fix these.
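
For reference, the pattern in question looks roughly like the sketch below. The function name removeFilesIn is hypothetical; the calls themselves are the ordinary System.Directory and System.FilePath ones. Whether it succeeds for non-ASCII names on Windows depends on getDirectoryContents and removeFile agreeing on how names are encoded, which is the inconsistency described above.

```haskell
import Control.Monad (filterM, forM_)
import System.Directory
import System.FilePath ((</>))

-- | Remove every regular file directly inside a directory, using the
-- names returned by getDirectoryContents. Returns the paths removed.
removeFilesIn :: FilePath -> IO [FilePath]
removeFilesIn dir = do
  names <- getDirectoryContents dir
  -- getDirectoryContents also returns "." and "..", so skip those.
  let paths = [dir </> n | n <- names, n `notElem` [".", ".."]]
  -- Keep only regular files; subdirectories are left alone.
  files <- filterM doesFileExist paths
  forM_ files removeFile
  return files
```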

I don't know how getArgs fits in here.  Should we be decoding argv using
the ACP?

