Raw filenames vs locales

David Roundy droundy at abridgegame.org
Sat Jul 30 14:40:23 EDT 2005


On Sat, Jul 30, 2005 at 06:13:21PM +0200, Udo Stenzel wrote:
> Ian Lynagh wrote:
> > With its closer adherence to the Haskell 98 report, it is no longer
> > possible with hugs to manipulate files using the standard IO functions
> > if the filenames are not representable in your locale.
> 
> Note that this basically means your filesystem is broken.  This
> situation can only occur if a filesystem is written in one and then read
> in another locale. [...]

That is true, but on any multiuser system it's quite a reasonable scenario
to have different users using different locales.  It's an embarrassing
scenario that I can't write a tool in Haskell that recursively deletes a
directory in which there are files that aren't representable in my current
locale... or display the contents of such files, or anything else.

> This "problem" cannot really be fixed, only worked around.

On the contrary, the problem *can* be fixed, by requiring that filenames be
converted to Unicode only when necessary.  For many purposes (possibly even
*most* purposes), knowledge of the character encoding is completely
unnecessary.

More to the point, the "problem" is inherent in the language, not the
filesystem--or perhaps you'd prefer to say that it's a problem with writing
portable code.  The point is that it would seem best to present an API
which makes it possible to write portable code.  On POSIX filesystems
filenames are sequences of bytes, not sequences of Unicode characters, and
treating them as characters causes trouble.
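
To make that concrete, here is a rough sketch of a filename type that keeps
the raw bytes and converts to characters only on request.  The names RawName,
(</>) and toAscii are made up for illustration and are not part of any
existing library; the conversion handles only ASCII to keep the example short.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical opaque filename: the bytes exactly as the OS reports them.
newtype RawName = RawName [Word8]
  deriving (Eq, Ord, Show)

-- Joining path components needs only the '/' byte (0x2F), never the encoding.
(</>) :: RawName -> RawName -> RawName
RawName dir </> RawName file = RawName (dir ++ [0x2F] ++ file)

-- Conversion to characters happens only when asked for, and it may fail.
-- Only pure-ASCII names are converted in this sketch.
toAscii :: RawName -> Maybe String
toAscii (RawName bytes)
  | all (< 0x80) bytes = Just (map (chr . fromIntegral) bytes)
  | otherwise = Nothing
```

A recursive delete would need only the first two definitions; toAscii (or a
real decoder) matters only when a name has to be shown to the user.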

> > UTF-8:       65533 = U+FFFD = "replacement character"
> > 
> > =================
> > Proposed solution
> > =================
> 
> I have a simpler proposal: allocate 128 "replacement characters" in the
> "Vendor Zone" of Unicode.  Their purpose is as placeholders for
> incorrect UTF-8.  Then use these replacement characters when decoding
> UTF-8 and reproduce the original, broken, code when re-encoding.  Under
> ordinary circumstances these codes should never occur in strings.

I guess you'd then want a couple of functions in the IO monad to convert
between FilePath and CString (or something we could actually use)?

While your suggestion would solve the problem of being unable to access
some files, it would also result in FilePaths themselves (without
conversion routines) being useless for purposes other than actually
accessing the same files.
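
Here is a minimal sketch of how I understand the proposal, so that we are
talking about the same thing.  The range U+EF80..U+EFFF and the names
escapeBase, decodeName and encodeName are my own assumptions, not something
anyone has agreed on.  Every byte that is not part of a valid UTF-8 sequence
becomes one private-use character, and re-encoding turns it back into that
byte.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumed escape range: a stray byte b (0x80..0xFF) maps to U+EF00 + b.
escapeBase :: Int
escapeBase = 0xEF00

isEscape :: Int -> Bool
isEscape c = c >= escapeBase + 0x80 && c <= escapeBase + 0xFF

-- Decode UTF-8; any byte outside a valid sequence becomes an escape character.
decodeName :: [Word8] -> String
decodeName [] = []
decodeName (b : bs)
  | b < 0x80 = chr (fromIntegral b) : decodeName bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise = bad
  where
    -- Escape only the lead byte and carry on with the byte after it.
    bad = chr (escapeBase + fromIntegral b) : decodeName bs
    multi n lead minCp =
      let (cont, rest) = splitAt n bs
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) lead cont
          ok = length cont == n
            && all (\c -> c .&. 0xC0 == 0x80) cont
            && cp >= minCp && cp <= 0x10FFFF      -- no overlong forms, no overflow
            && not (cp >= 0xD800 && cp <= 0xDFFF) -- no surrogates
            && not (isEscape cp)                  -- escapes themselves must round-trip
      in if ok then chr cp : decodeName rest else bad

-- Encode to UTF-8, turning each escape character back into its original byte.
encodeName :: String -> [Word8]
encodeName = concatMap enc
  where
    enc ch
      | isEscape c = [fromIntegral (c - escapeBase)]
      | c < 0x80 = [fromIntegral c]
      | c < 0x800 = [0xC0 .|. hi 6, lo 0]
      | c < 0x10000 = [0xE0 .|. hi 12, lo 6, lo 0]
      | otherwise = [0xF0 .|. hi 18, lo 12, lo 6, lo 0]
      where
        c = ord ch
        hi s = fromIntegral (c `shiftR` s)
        lo s = fromIntegral (0x80 .|. ((c `shiftR` s) .&. 0x3F))
```

With these, encodeName (decodeName bs) gives back bs for any byte string, so
every file stays reachable.  The decoded String, though, may contain
private-use characters that mean nothing when printed or compared against
text from elsewhere, which is the limitation I describe above.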
-- 
David Roundy
http://www.darcs.net