[Haskell-cafe] Encoding-aware System.Directory functions

Jason Dagit dagitj at gmail.com
Wed Mar 30 09:26:18 CEST 2011


On Tue, Mar 29, 2011 at 11:52 PM, Michael Snoyman <michael at snoyman.com> wrote:

> Hi all,
>
> I think this is a well-known issue: it seems that there is no
> character decoding performed on the values returned from the functions
> in System.Directory (getDirectoryContents specifically). I could
> manually do something like (utf8Decode . S8.pack), but that presumes
> that the character encoding on the system in question is UTF8. So two
> questions:
>
> * Is there a package out there that handles all the gory details for
> me automatically, and simply returns a properly decoded String (or
> Text)?
> * If not, is there a standard way to determine the character encoding
> used by the filesystem, short of hard-coding in character encodings
> used by the major ones?
>

I started to write a thoughtful reply, but I found that the answers here sum
up everything I was going to say:
http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux

This same issue comes up from time to time for darcs and, if I recall
correctly, the solution has been to treat Unix file paths as arbitrary bytes
whenever possible and to escape any bytes that are not ASCII-compatible when
they occur. Otherwise it can be hard to embed them in textual patch
descriptions or XML (where an encoding is required and, I believe, UTF-8 is
the standard default).
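
To make the "arbitrary bytes plus escaping" idea concrete, here is a minimal
sketch. It is not what darcs actually does: the names escapePath and
unescapePath and the %XX escape format are my own assumptions, chosen only
for illustration. The point is that every byte of the path survives a round
trip through plain ASCII text, whatever encoding the file name was written in.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr, digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

-- | Render a raw path as ASCII text. Printable ASCII bytes pass through;
-- every other byte, and the escape character '%' itself, becomes %XX.
-- (Hypothetical scheme for illustration, not the darcs format.)
escapePath :: B.ByteString -> String
escapePath = concatMap escapeByte . B.unpack
  where
    escapeByte :: Word8 -> String
    escapeByte w
      | w >= 0x20 && w < 0x7F && w /= 0x25 = [chr (fromIntegral w)]
      | otherwise = ['%', hex (w `div` 16), hex (w `mod` 16)]

    hex :: Word8 -> Char
    hex = intToDigit . fromIntegral

-- | Inverse of 'escapePath'. Returns Nothing on a malformed escape or on
-- any character outside ASCII.
unescapePath :: String -> Maybe B.ByteString
unescapePath = fmap B.pack . go
  where
    go :: String -> Maybe [Word8]
    go [] = Just []
    go ('%' : a : b : rest)
      | isHexDigit a && isHexDigit b =
          (fromIntegral (digitToInt a * 16 + digitToInt b) :) <$> go rest
    go ('%' : _) = Nothing
    go (c : rest)
      | c < '\x80' = (fromIntegral (fromEnum c) :) <$> go rest
      | otherwise = Nothing
```

For example, the Latin-1 bytes for "café" (0x63 0x61 0x66 0xE9) come out as
"caf%e9", and unescapePath gives back the original four bytes. No guess
about the file system's encoding is needed at any point.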

I wish you luck.  It's not an easy problem, at least on Unix.  I've heard
that Windows has a much easier time here, as MS has provided a standard for
it.

Jason