[Haskell-cafe] Encoding-aware System.Directory functions

Wed Mar 30 09:52:33 CEST 2011

On Wed, Mar 30, 2011 at 09:26, Jason Dagit <dagitj at gmail.com> wrote:

>
>
> On Tue, Mar 29, 2011 at 11:52 PM, Michael Snoyman <michael at snoyman.com> wrote:
>
>> Hi all,
>>
>> I think this is a well-known issue: it seems that there is no
>> character decoding performed on the values returned from the functions
>> in System.Directory (getDirectoryContents specifically). I could
>> manually do something like (utf8Decode . S8.pack), but that presumes
>> that the character encoding on the system in question is UTF8. So two
>> questions:
>>
>> * Is there a package out there that handles all the gory details for
>> me automatically, and simply returns a properly decoded String (or
>> Text)?
>> * If not, is there a standard way to determine the character encoding
>> used by the filesystem, short of hard-coding in character encodings
>> used by the major ones?
>>
>
> I started to write a thoughtful reply, but I found that the answers here
> sum up everything I was going to say:
>
> http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux
>
> This same issue comes up from time to time for darcs and, if I recall
> correctly, the solution has been to treat unix file paths as arbitrary bytes
> whenever possible and to escape non-ascii compatible bytes when they occur.
>  Otherwise it can be hard to encode them in textual patch descriptions or
> xml (where an encoding is required and I believe utf8 is a standard
> default).
>
> I wish you luck.  It's not an easy problem, at least on unix.  I've heard
> that windows has a much easier time here as MS has provided a standard for
> it.
>

All the more reason, it seems, to make this available in the standard package,
so people don't have to figure out how to do the conversions each time (for
all the different OSes, with which they might not have any experience, etc.).

Most modern Linux distributions default to a UTF-8 locale anyway, so in the
beginning one could assume UTF-8 and later change the system to make more
intelligent decisions (like checking environment variables for per-user
settings). A way to override the assumptions made would be necessary too, I
guess.
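
A rough sketch of what that decision could look like, using only base. The
names codesetFromLocale, chooseEncoding and detectEncoding are made up for
illustration and are not part of System.Directory or any existing package.
It assumes POSIX-style locale names (language_TERRITORY.codeset@modifier)
and the usual LC_ALL, LC_CTYPE, LANG precedence, and falls back to UTF-8
when no codeset can be found:

```haskell
import Control.Applicative ((<|>))
import Data.Char (toUpper)
import Data.Maybe (fromMaybe, listToMaybe)
import System.Environment (getEnvironment)

-- | Extract the codeset from a POSIX-style locale name such as
-- "en_US.UTF-8" or "de_DE.ISO-8859-1@euro". Returns Nothing when the
-- name carries no codeset (e.g. "C" or "en_US").
-- Hypothetical helper, not an existing library function.
codesetFromLocale :: String -> Maybe String
codesetFromLocale locale =
  case break (== '.') locale of
    (_, '.' : rest) ->
      let codeset = takeWhile (/= '@') rest
      in if null codeset then Nothing else Just (map toUpper codeset)
    _ -> Nothing

-- | Choose an encoding name. An explicit override wins; otherwise the
-- first locale variable that is set and non-empty is consulted; if that
-- yields no codeset, UTF-8 is assumed (an assumption, not a guarantee).
chooseEncoding :: Maybe String -> [(String, String)] -> String
chooseEncoding override env =
  fromMaybe "UTF-8" (override <|> (firstSet >>= codesetFromLocale))
  where
    firstSet =
      listToMaybe
        [ value
        | name <- ["LC_ALL", "LC_CTYPE", "LANG"]
        , Just value <- [lookup name env]
        , not (null value)
        ]

-- | Same decision, reading the current process environment.
detectEncoding :: Maybe String -> IO String
detectEncoding override = chooseEncoding override <$> getEnvironment
```

Calling detectEncoding Nothing would give the per-user guess, and
detectEncoding (Just "ISO-8859-1") would force a specific encoding. This
only picks a name; the actual decoding of the bytes is not shown, and
filenames written under a different locale can still fail to decode.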

-Tako