behaviour change in getDirectoryContents in GHC 7.2?

Max Bolingbroke batterseapower at hotmail.com
Tue Nov 1 11:23:14 CET 2011


Hi Ganesh,

On 1 November 2011 07:16, Ganesh Sittampalam <ganesh at earth.li> wrote:
> Can anyone point me at the rationale and details of the change and/or
> suggest workarounds?

This is my implementation of Python's PEP 383 [1] for Haskell.

IMHO this behaviour is much closer to what users expect. For example,
getDirectoryContents "." >>= print shows Unicode filenames properly.
As a result of this change we were able to close quite a few
outstanding GHC bugs.

PEP 383 behaviour always does the right thing on setups with a
consistent text encoding for filenames, command line arguments and the
like (Windows, or *nix where the system locale is e.g. UTF-8 and all
filenames are encoded in that locale). However, there are legitimate
use cases where the program has more information about how something
is encoded than just the system locale, and in those cases you should
*encode* the String from getDirectoryContents using
GHC.IO.Encoding.fileSystemEncoding and then *decode* it with your
preferred TextEncoding. In your case I think you want
GHC.IO.Encoding.latin1.

You can use a helper function like this to make this easier:

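import Foreign.Marshal.Unsafe (unsafeLocalState)
import GHC.IO.Encoding (TextEncoding)
import qualified GHC.Foreign

-- Encode the String with from_enc, then decode the bytes with to_enc.
reencode :: TextEncoding -> TextEncoding -> String -> String
reencode from_enc to_enc from = unsafeLocalState $
    GHC.Foreign.withCStringLen from_enc from (GHC.Foreign.peekCStringLen to_enc)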
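
As a rough sketch of how the helper might be applied to the directory
listing case, the snippet below repeats reencode so that it is
self-contained and wraps getDirectoryContents. The name
getDirectoryContentsLatin1 is made up for illustration. It also assumes
a newer base (GHC 7.4 or later), where the file system encoding is
obtained with the IO action getFileSystemEncoding rather than the plain
fileSystemEncoding value available in 7.2.

```haskell
import Foreign.Marshal.Unsafe (unsafeLocalState)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, getFileSystemEncoding, latin1)
import System.Directory (getDirectoryContents)

-- Encode the String to bytes with fromEnc, then decode those bytes
-- with toEnc.
reencode :: TextEncoding -> TextEncoding -> String -> String
reencode fromEnc toEnc s = unsafeLocalState $
    GHC.withCStringLen fromEnc s (GHC.peekCStringLen toEnc)

-- Hypothetical wrapper (not part of any library): list a directory and
-- reinterpret each name's underlying bytes as Latin-1.
getDirectoryContentsLatin1 :: FilePath -> IO [FilePath]
getDirectoryContentsLatin1 dir = do
    -- Assumption: base >= 4.5; on GHC 7.2 use fileSystemEncoding directly.
    fsEnc <- getFileSystemEncoding
    names <- getDirectoryContents dir
    return (map (reencode fsEnc latin1) names)
```

Something like getDirectoryContentsLatin1 "." >>= mapM_ putStrLn would
then print the names as decoded from Latin-1. Names that are plain
ASCII come out unchanged either way.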

Hope that helps,

Max


[1] http://www.python.org/dev/peps/pep-0383/


