[Haskell-cafe] Encoding-aware System.Directory functions

Tako Schotanus tako at codejive.org
Wed Mar 30 11:20:30 CEST 2011


On Wed, Mar 30, 2011 at 11:01, Alistair Bayley <alistair at abayley.org> wrote:

> On 30 March 2011 20:53, Max Bolingbroke <batterseapower at hotmail.com> wrote:
>
>> On 30 March 2011 07:52, Michael Snoyman <michael at snoyman.com> wrote:
>> > I could
>> > manually do something like (utf8Decode . S8.pack), but that presumes
>> > that the character encoding on the system in question is UTF-8. So two
>> > questions:
>>
>> Funnily enough, I have been thinking about this quite hard recently,
>> and the situation is kind of a mess; short of implementing PEP 383
>> (http://www.python.org/dev/peps/pep-0383/) in GHC, I can't see how to
>> make it easier on the programmer. As Jason points out, the best you can
>> really do is probably:
>>
>>  1. Treat Strings that represent filenames as raw byte sequences, even
>> though they claim to be strings
>>
>>  2. When presenting such Strings to the user, re-decode them by using
>> the current locale encoding (which will typically be UTF-8). You
>> probably want to have some means of avoiding decoding errors here too
>> -- ignoring or replacing undecodable bytes -- but presently this is
>> not so straightforward. If you happen to be on a system with GNU iconv,
>> however, you can use its "C//TRANSLIT//IGNORE" encoding to achieve this.
>>
>
>
> http://www.haskell.org/pipermail/libraries/2009-August/012493.html
>
> I took from this discussion that FilePath really should be a pair of the
> actual filename ByteString, and the printable String (decoded from the
> ByteString, with encoding specified by the user's locale). The conversion
> from ByteString to String (and vice versa) is not guaranteed to be lossless,
> so you need to remember both.
>
>
I'm not sure that I agree with that. Why does it have to be lossless?
The problem, more likely, is the fact that FilePath is just a simple string.
Maybe we should go the way of Java, where cross-platform file access is based
upon a File (or the new Path) type? That way the internal representation
could use whatever is necessary to ensure a unique reference to a file or
directory, while at the same time providing a way to get a human-readable
representation.
Going from strings to file/path values would then require knowing the correct
encoding.
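A minimal sketch of such a type, assuming the `bytestring` and `text` packages and UTF-8 as the chosen encoding. Every name here (`Path`, `PathEncoding`, `utf8Path`, `pathFromText`, `pathFromBytes`, `displayPath`) is hypothetical and not an existing library API. Identity is the exact byte sequence the OS sees; the human-readable form is derived on demand and may be lossy; constructing a path from text forces the caller to supply an encoding.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With, encodeUtf8)
import Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical opaque path: equality and ordering use the raw bytes.
-- A real module would export Path abstractly, hiding the constructor.
newtype Path = Path B.ByteString
  deriving (Eq, Ord)

-- | Hypothetical encoding pair the caller must supply when converting
-- between text and paths (standing in for "the correct encoding").
data PathEncoding = PathEncoding
  { encodePath :: T.Text -> B.ByteString
  , decodePath :: B.ByteString -> T.Text
  }

-- | One possible choice: UTF-8, with undecodable bytes shown as U+FFFD.
utf8Path :: PathEncoding
utf8Path = PathEncoding encodeUtf8 (decodeUtf8With lenientDecode)

-- | Text -> Path needs an encoding.
pathFromText :: PathEncoding -> T.Text -> Path
pathFromText enc = Path . encodePath enc

-- | Paths obtained from the OS keep their exact bytes.
pathFromBytes :: B.ByteString -> Path
pathFromBytes = Path

-- | Human-readable rendering; possibly lossy, never used for identity.
displayPath :: PathEncoding -> Path -> T.Text
displayPath enc (Path raw) = decodePath enc raw

main :: IO ()
main = do
  let a = pathFromBytes (B.pack [0x66, 0xFF]) -- 'f' then an invalid byte
      b = pathFromBytes (B.pack [0x66, 0xFE]) -- 'f' then another invalid byte
  print (a == b)                                           -- False
  print (displayPath utf8Path a == displayPath utf8Path b) -- True
  print (pathFromText utf8Path (T.pack "f") == pathFromBytes (B.pack [0x66])) -- True
```

The two paths stay distinct even though they render identically, so a lossy display form does no harm as long as it is never fed back to the file system.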

Cheers,
 -Tako

PS: I'm just lurking here most of the time because I'm still a total Haskell
noob, so you can ignore me without risk.
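The two steps Max describes in the quoted text (treat filenames as raw bytes; decode them only for display, replacing undecodable bytes) can be sketched as below. This assumes, as Michael's `S8.pack` trick implies, that each `Char` of the `FilePath` carries exactly one raw byte, and that the locale encoding is UTF-8. It swaps in `decodeUtf8With lenientDecode` from the `text` package in place of Michael's `utf8Decode`; `rawBytes` and `forDisplay` are hypothetical names.

```haskell
import qualified Data.ByteString.Char8 as S8
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- | Step 1: recover the raw bytes (assumes one byte per Char; S8.pack
-- keeps only the low 8 bits of each Char).
rawBytes :: FilePath -> S8.ByteString
rawBytes = S8.pack

-- | Step 2: decode for display only, assuming a UTF-8 locale.
-- Invalid bytes become U+FFFD instead of raising an error.
forDisplay :: FilePath -> T.Text
forDisplay = decodeUtf8With lenientDecode . rawBytes

main :: IO ()
main = do
  -- "caf\xC3\xA9" holds the UTF-8 bytes of "café", one Char per byte
  print (forDisplay "caf\xC3\xA9" == T.pack "caf\xE9") -- True
  -- Different raw names can collapse to the same display text...
  print (forDisplay "\xFF" == forDisplay "\xFE")       -- True
  -- ...while the underlying bytes still differ
  print (rawBytes "\xFF" == rawBytes "\xFE")           -- False
```

The last two lines show why Alistair suggests keeping both the bytes and the decoded string: the decoded form alone cannot always recover which file was meant.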


More information about the Haskell-Cafe mailing list