encoding and paths, again

Simon Marlow marlowsd at gmail.com
Wed Nov 16 10:23:40 CET 2011


On 16/11/2011 08:09, Ganesh Sittampalam wrote:
> On 14/11/2011 14:47, Simon Marlow wrote:
>
>>> (1) Does Win32 need similar additions? I can't spot any substantial
>>> changes to it for Max's PEP383, but I'm not sure if any lower-level
>>> library changes might have affected it.
>>
>> No - Win32 file paths cannot by definition contain invalid Unicode.  The
>> existing Win32 library is fine.
>>
>>> (2) What's the recommended way of doing the equivalent of
>>> getDirectoryContents for RawFilePath? Do we also need to add "raw"
>>> versions to the directory package?
>>
>> getDirectoryContents :: RawFilePath -> IO [RawFilePath]
> [...]
>
> Thanks. One followup - in the Win32 case (where I guess we can still use
> the normal getDirectoryContents and get a FilePath), is it still
> necessary to re-encode the results to guarantee independence from the
> current settings (e.g. as proposed by Max in
> http://www.haskell.org/pipermail/glasgow-haskell-users/2011-November/021116.html),
> or do we just always get the original filename properly because of the
> way Windows handles paths?

I think Max's answer applies only when you know that file paths on disk
are stored in an encoding different from the locale's.  That is not the
case on Win32, where file paths are always UTF-16 and the encoding and
decoding are handled by the Win32 layer, so no re-encoding is needed
there.

In fact, if Max goes ahead and adds setFilesystemEncoding and 
setLocaleEncoding as he suggested, then this will get easier: you can 
just set the encoding to whatever you want before doing any file system 
operations.
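
As a rough illustration only, here is a minimal sketch of what that could
look like.  It assumes the setters are exported from GHC.IO.Encoding under
the names getFileSystemEncoding and setFileSystemEncoding;
withFileSystemEncoding is a hypothetical helper made up for this example,
not an existing library function.

```haskell
import Control.Exception (bracket)
import GHC.IO.Encoding
  ( TextEncoding, getFileSystemEncoding, setFileSystemEncoding, utf8 )
import System.Directory (getDirectoryContents)

-- Hypothetical helper (not part of any library): run an action with the
-- file system encoding temporarily set to 'enc', then restore whatever
-- encoding was in effect before, even if the action throws.
withFileSystemEncoding :: TextEncoding -> IO a -> IO a
withFileSystemEncoding enc action =
  bracket getFileSystemEncoding setFileSystemEncoding $ \_ -> do
    setFileSystemEncoding enc
    action

main :: IO ()
main = do
  -- Assumption for this example: file names on disk are UTF-8,
  -- regardless of what the current locale says.
  names <- withFileSystemEncoding utf8 (getDirectoryContents ".")
  -- 'print' escapes non-ASCII characters, so the output does not depend
  -- on the encoding of stdout.
  mapM_ print names
```

The setting is global to the process, which is why the sketch restores the
previous value afterwards.  On Win32 none of this should be necessary, for
the reason given above.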

> Sorry if all this is obvious but every time I think I understand Unicode
> I get proven wrong!

I know the feeling :-(

Cheers,
	Simon




More information about the Glasgow-haskell-users mailing list