[Haskell-cafe] Filename encoding error (was: Perform a research a la Unix 'find')

Daniel Fischer daniel.is.fischer at web.de
Sun Aug 22 15:00:29 EDT 2010

On Sunday 22 August 2010 19:23:03, Yves Parès wrote:
> In fact the encoding problem is more general.
> When I simply do 'readFile "bar/fooé"', then I'm told:
> *** Exception: bar/fooé: openFile: does not exist (No such file or
> directory)


ghci> readFile (Data.ByteString.Char8.unpack (Data.ByteString.UTF8.fromString "fooé"))

(same trick for find; Data.ByteString.UTF8 comes from the utf8-string 
package).

The problem is probably that readFile truncates each character of its 
filePath argument to 8 bits, whereas file names on your system are UTF-8 
encoded. So you have to give readFile a pseudo-UTF-8 encoded filepath, 
i.e. a String in which each Char holds one byte of the UTF-8 encoding.
At least, that's how it works here, inconvenient though it is.
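
To make "pseudo-UTF-8 encoded" concrete, here is a minimal sketch that 
needs only base. The names encodeChar and pseudoUtf8 are made up for 
illustration, and it assumes the truncating behaviour described above. On 
a GHC that already encodes file paths according to the locale, applying it 
would double-encode the name.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Turn one character into the Chars whose code points are the bytes
--   of its UTF-8 encoding (hypothetical helper, not a library function).
encodeChar :: Char -> String
encodeChar c
  | n < 0x80    = [c]
  | n < 0x800   = map chr [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = map chr [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = map chr [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                          , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- continuation byte: 10xxxxxx carrying the low six bits of x
    cont x = 0x80 .|. (x .&. 0x3F)

-- | Pseudo-UTF-8 encode a whole path, so that truncating every Char to
--   8 bits yields the UTF-8 bytes of the original name.
pseudoUtf8 :: FilePath -> FilePath
pseudoUtf8 = concatMap encodeChar
```

With this loaded in ghci, readFile (pseudoUtf8 "bar/fooé") should be 
equivalent to the unpack/fromString trick, since 'é' ('\233') becomes the 
two Chars '\195' and '\169'.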

> How am I supposed to read files whose names contains non-ASCII
> characters? (I use GHC 6.12.3 under Ubuntu 10.04 32bits)

While the inconvenience lasts (people are thinking about how to handle the 
situation correctly), avoid non-ASCII characters in filepaths if possible.

> My locale is fr_FR.utf8
> For instance, with HSH:
> I have a 'bar' directory, containing a file 'fooé'
> run $ "find bar" :: IO [String]
> returns me : ["bar", "bar/foo*\233*"]

That one is okay, 'é' is '\233' and the Show instance for Char escapes all 
characters > '\127'.
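
A small sketch of the difference between the string itself and what show 
produces for it (the names name, shownName and codePoints are just for 
illustration):

```haskell
import Data.Char (ord)

-- The path as HSH returned it: eight characters, the last one is 'é'.
name :: String
name = "bar/foo\233"

-- What ghci prints for the value: show wraps it in quotes and replaces
-- 'é' by the four characters \233 (a decimal escape).
shownName :: String
shownName = show name

-- The code points of the actual characters; the last one is 233.
codePoints :: [Int]
codePoints = map ord name
```

So the escape is only an artefact of displaying the value with show; the 
string still contains a single 'é'.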

> and run $ "find bar -name fooé"
> returns []

Maybe the same issue, try

run $ "find bar -name foo\195\169"

> When I provoke an error by running:
> run $ "find fooé"
> it says :
> find: "foo*\351*": No file or directory

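On the other hand, if it now says \351, which read as a decimal escape is 
ş, there seems to be something else amiss. (Unless find prints octal 
escapes: octal 351 is decimal 233, which would be 'é' truncated to a 
single byte again.)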
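
One possibility worth checking is that the \351 in find's message is an 
octal escape rather than a decimal one. That find escapes this way is an 
assumption here, not something verified. The sketch below, with the 
made-up name octalEscapeToChar, only does the arithmetic:

```haskell
import Data.Char (chr)
import Numeric (readOct)

-- | Interpret the digits of an escape such as "351" as octal and return
--   the corresponding character, or Nothing if they are not valid octal.
octalEscapeToChar :: String -> Maybe Char
octalEscapeToChar digits = case readOct digits of
  [(n, "")] -> Just (chr n)
  _         -> Nothing
```

octalEscapeToChar "351" gives Just '\233', i.e. 'é', which would mean 
find received the single byte 233 instead of the UTF-8 pair 195 169.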

> So it is not the same encoding!
