[Haskell-cafe] Re: Unicode workaround for getDirectoryContents under Windows?

Bulat Ziganshin bulat.ziganshin at gmail.com
Tue Jun 16 09:56:30 EDT 2009


Hello Simon,

Tuesday, June 16, 2009, 5:02:43 PM, you wrote:

> Also currently broken:

>   * calling removeFile on a FilePath you get from getDirectoryContents,
>     amongst other System.Directory operations

> Fixing getDirectoryContents will fix these.

No. removeFile, like everything else, also uses the ACP-based API.

> I don't know how getArgs fits in here - should we be decoding argv using
> the ACP?

Well, the whole story: Windows internally uses Unicode for handling
strings. Externally, it provides two API families:

1) A-family (such as CreateFileA) uses 8-bit char-based strings.
These strings are encoded in the ANSI code page (ACP) of the current
locale. The first 128 chars are common to all code pages and provide
the ASCII char set; the upper 128 chars are locale-specific. Say, for
the German locale they provide chars with umlauts, for the Russian
locale, Cyrillic chars.

2) W-family (such as CreateFileW) uses UTF-16 encoded 16-bit
wchar-based strings, which are locale-independent
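
To make the difference between the two families concrete, here is a
minimal sketch in plain Haskell (base only, no Windows calls). The code
page tables are partial, hypothetical models that cover only ASCII plus
the ranges I am sure of: U+00A0..U+00FF for CP1252 and the basic
Cyrillic letters U+0410..U+044F for CP1251. Replacing unmappable chars
with '?' is also an assumption; real Windows may apply best-fit
substitutions instead. The names encodeAnsi, encodeUtf16, cp1252 and
cp1251 are made up for this illustration.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Maybe (fromMaybe)
import Data.Word (Word16, Word8)

-- | A hypothetical, partial model of an 8-bit code page:
-- 'Nothing' means the character has no byte in that code page.
type CodePage = Char -> Maybe Word8

-- | Partial model of CP1252 (Western European): ASCII, plus
-- U+00A0..U+00FF stored at the same byte values.
cp1252 :: CodePage
cp1252 c
  | n < 0x80               = Just (fromIntegral n)
  | n >= 0xA0 && n <= 0xFF = Just (fromIntegral n)
  | otherwise              = Nothing
  where n = ord c

-- | Partial model of CP1251 (Cyrillic): ASCII, plus
-- U+0410..U+044F stored at bytes 0xC0..0xFF.
cp1251 :: CodePage
cp1251 c
  | n < 0x80                 = Just (fromIntegral n)
  | n >= 0x410 && n <= 0x44F = Just (fromIntegral (n - 0x410 + 0xC0))
  | otherwise                = Nothing
  where n = ord c

-- | Bytes as an A-family call would receive them. Unmappable
-- characters become '?' (0x3F), which is an assumption of this sketch.
encodeAnsi :: CodePage -> String -> [Word8]
encodeAnsi cp = map (fromMaybe 0x3F . cp)

-- | UTF-16 code units as a W-family call would receive them.
-- Characters above U+FFFF become a surrogate pair.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap unit
  where
    unit c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

main :: IO ()
main = do
  let name = "f\252r.txt"  -- "für.txt"
  print (encodeAnsi cp1252 name)  -- [102,252,114,46,116,120,116]
  print (encodeAnsi cp1251 name)  -- [102,63,114,46,116,120,116]
  print (encodeUtf16 name)        -- [102,252,114,46,116,120,116]
```

The same name gives different bytes depending on the code page, and
the umlaut is lost under the Cyrillic one, while the UTF-16 form does
not depend on the locale at all.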


Windows libraries emulate the POSIX API (open, opendir, stat and so on)
by translating these (char-based) calls into the A-family. GHC libs are
written the Unix way, so they are effectively bound to the A-family of
the Win API.

Windows libraries also provide w* variants of the POSIX API (_wopen,
_wstat and so on in the MS C runtime; MinGW adds _wopendir) that use
UTF-16 encoded 16-bit wchar-based strings, so for proper handling of
Unicode strings (filenames, cmdline arguments) we should use these APIs.
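
A rough illustration of why the char-based route breaks operations like
removeFile on names returned by getDirectoryContents: once a name has
passed through the ACP, it may no longer be the name of the file on
disk. The sketch below only simulates this in pure Haskell; it calls no
Windows API. toAcp is a partial, hypothetical model of a Western
European code page (CP1252), and the '?' replacement is an assumption.
viaAnsi and viaWide are made-up names for "the name as seen through an
A-family call" and "through a W-family call".

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16, Word8)

-- | Partial, hypothetical model of CP1252: ASCII and U+00A0..U+00FF
-- map to the same byte value, everything else becomes '?' (0x3F).
toAcp :: Char -> Word8
toAcp c
  | n < 0x80 || (n >= 0xA0 && n <= 0xFF) = fromIntegral n
  | otherwise                            = 0x3F
  where n = ord c

-- | Inverse of 'toAcp' for the bytes 'toAcp' can produce.
-- (Real CP1252 maps 0x80..0x9F differently; those never occur here.)
fromAcp :: Word8 -> Char
fromAcp = chr . fromIntegral

encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap unit
  where
    unit c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

decodeUtf16 :: [Word16] -> String
decodeUtf16 (hi : lo : rest)
  | hi >= 0xD800 && hi < 0xDC00 && lo >= 0xDC00 && lo < 0xE000 =
      chr (0x10000 + ((fromIntegral hi - 0xD800) `shiftL` 10)
                   + (fromIntegral lo - 0xDC00))
        : decodeUtf16 rest
decodeUtf16 (u : rest) = chr (fromIntegral u) : decodeUtf16 rest
decodeUtf16 []         = []

-- | The name a program sees after a round trip through the ACP.
viaAnsi :: String -> String
viaAnsi = map (fromAcp . toAcp)

-- | The name a program sees after a round trip through UTF-16.
viaWide :: String -> String
viaWide = decodeUtf16 . encodeUtf16

main :: IO ()
main = do
  let german  = "gr\252n.txt"                -- "grün.txt"
      russian = "\1092\1072\1081\1083.txt"   -- "файл.txt"
  print (viaAnsi german == german)    -- True
  print (viaAnsi russian == russian)  -- False
  putStrLn (viaAnsi russian)          -- ????.txt
  print (viaWide russian == russian)  -- True
```

With a German ACP the Russian file name comes back as "????.txt", and
no later call, char-based or not, can find the original file under that
name. The wide round trip returns the name unchanged.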


my old proposal: http://haskell.org/haskellwiki/Library/IO



-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com


