[Haskell-cafe] Re: Unicode workaround for getDirectoryContents under Windows?
Bulat Ziganshin
bulat.ziganshin at gmail.com
Tue Jun 16 09:56:30 EDT 2009
Hello Simon,
Tuesday, June 16, 2009, 5:02:43 PM, you wrote:
> Also currently broken:
> * calling removeFile on a FilePath you get from getDirectoryContents,
> amongst other System.Directory operations
> Fixing getDirectoryContents will fix these.
no. removeFile, like everything else, also uses the ACP-based (ANSI
code page) API, so fixing getDirectoryContents alone won't fix it
> I don't know how getArgs fits in here - should we be decoding argv using
> the ACP?
well, the whole story: Windows internally uses Unicode for handling
strings. externally, it provides two API families:
1) A-family (such as CreateFileA) uses 8-bit char-based strings.
these strings are encoded using the current ANSI code page (ACP),
which depends on the locale. the first 128 chars are common to all
codepages and provide the ASCII char set; the upper 128 chars are
locale-specific. say, for a German locale they include chars with
umlauts, for a Russian locale, Cyrillic chars
2) W-family (such as CreateFileW) uses UTF-16 encoded 16-bit
wchar-based strings, which are locale-independent
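
To make the difference between the two families concrete, here is a
minimal sketch in pure Haskell. It is an illustration only, not what
Windows or GHC actually runs: encodeUtf16 shows the kind of data a
W-family call receives, and encodeCodePage shows what happens when a
name has to fit into an 8-bit code page. The names encodeUtf16,
encodeCodePage and toLatin1 are hypothetical, and Latin-1 is only a
stand-in for a real Windows code page.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Maybe (fromMaybe)
import Data.Word (Word16, Word8)

-- | Encode a String as UTF-16 code units, the form a W-family call takes.
-- Code points above U+FFFF become a surrogate pair.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap unit
  where
    unit c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

-- | Hypothetical stand-in for a code page table: Latin-1, where code
-- points below 256 map to themselves and nothing else is representable.
-- Real Windows code pages use different tables for the upper 128 values.
toLatin1 :: Char -> Maybe Word8
toLatin1 c
  | ord c < 256 = Just (fromIntegral (ord c))
  | otherwise   = Nothing

-- | Encode a String through an 8-bit table, as an A-family call would
-- need. Characters missing from the table are replaced by '?' (63), so
-- the original name cannot be recovered from the result.
encodeCodePage :: (Char -> Maybe Word8) -> String -> [Word8]
encodeCodePage table = map (fromMaybe 63 . table)
```

With this stand-in table, a name containing a Cyrillic letter survives
encodeUtf16 unchanged but collapses to '?' under encodeCodePage, which
is the kind of loss that makes a later removeFile on that name fail.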
Windows libraries emulate the POSIX API (open, opendir, stat and so on)
by translating these (char-based) calls into the A-family. GHC libs are
written the Unix way, so they are effectively bound to the A-family of
the Win API
Windows libraries also provide a w* variant of the POSIX API (wopen,
wopendir, wstat...) that uses UTF-16 encoded 16-bit wchar-based
strings, so for proper handling of Unicode strings (filenames, cmdline
arguments) we should use these APIs
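
Any binding to the wide-character APIs needs the filename marshalled as
a wchar_t string first. A minimal sketch of that step, using
withCWString and peekCWString from Foreign.C.String in base; the
helper name roundTripWide is hypothetical, and no Windows call is made
here, the string is only converted and read back:

```haskell
import Foreign.C.String (peekCWString, withCWString)

-- | Marshal a String to a NUL-terminated wchar_t array and read it back.
-- A real binding would pass the CWString pointer to a wide-character
-- function at the point where peekCWString is used here.
roundTripWide :: String -> IO String
roundTripWide path = withCWString path peekCWString
```

The round trip should preserve non-ASCII names, which is the property
the char-based calls cannot guarantee once the ACP gets involved.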
my old proposal: http://haskell.org/haskellwiki/Library/IO
--
Best regards,
Bulat mailto:Bulat.Ziganshin at gmail.com