[Haskell-cafe] File name encodings

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Wed Dec 10 07:39:00 EST 2008


On Tue, 2008-12-09 at 18:17 -0800, Don Stewart wrote:
> Oh, perhaps you want to 'decode' the string that 
> dirOpenDialog returns.
> 
> redcom:
> > Hi Don,
> > 
> > must be doing something wrong.
> > 
> > The messed up string originates from calling Graphics.UI.WX.dirOpenDialog  
> > and selecting a directory with Umlauts.

This is such a huge can of worms.

The Gtk open dialog has two functions for returning the selected file
name. One returns a string suitable to use with operating system
functions like readFile, while the other returns a Unicode string
suitable to display in the user interface.

These need not be the same, or even inter-convertible. On Windows they
are identical because Windows uses Unicode for file names. Unix, however,
uses byte strings, and people sometimes encode them as UTF-8 and
sometimes in some other locale encoding.

So it's not safe to convert a file name to a Unicode string and then
back again and expect to be saving the same file. Document editor
programs typically remember both strings so that they can save the file
again even if displaying the file name was lossy (e.g. due to locale
conversion errors like invalid UTF-8).
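
As a rough illustration of "remember both strings", here is a minimal
sketch. The FileName record and the fromRaw and roundTrips helpers are
hypothetical names made up for this example, not part of Gtk2Hs or
wxHaskell, and it assumes the display string is obtained by decoding the
bytes as UTF-8 (a real program would consult the locale). It uses only
the bytestring and text packages.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With, encodeUtf8)
import Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical record: the raw bytes go to the OS, the text goes to the UI.
data FileName = FileName
  { rawName     :: B.ByteString  -- exact bytes as the file system stores them
  , displayName :: T.Text        -- possibly lossy, for display only
  }

-- | Build a FileName from raw bytes. Assumes UTF-8 for display;
-- invalid byte sequences are replaced with U+FFFD.
fromRaw :: B.ByteString -> FileName
fromRaw bytes = FileName bytes (decodeUtf8With lenientDecode bytes)

-- | True when re-encoding the display string reproduces the original bytes.
roundTrips :: FileName -> Bool
roundTrips fn = encodeUtf8 (displayName fn) == rawName fn

main :: IO ()
main = do
  let good = fromRaw (B.pack [0x66, 0xC3, 0xBC, 0x72])  -- "für" in UTF-8
      bad  = fromRaw (B.pack [0x66, 0xFC, 0x72])        -- "für" in Latin-1
  print (roundTrips good)  -- True
  print (roundTrips bad)   -- False
```

The Latin-1 name still displays (with a replacement character), but only
rawName can be used to open or save that file again, which is why the
two strings are kept side by side.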

Yet another reason why FilePath /= String (except on Windows where it
does).

Duncan


