[Haskell] System.FilePath survey
Einar Karttunen
ekarttun at cs.helsinki.fi
Wed Feb 8 18:39:37 EST 2006
On 08.02 14:03, Wolfgang Thaller wrote:
> 1) Widely used languages and libraries like Java and GTK+ assume that
> all file names and command lines are encoded in the system locale, or
> at least that they can all be converted to unicode strings.
Which causes much annoyance to users having to define various
environment variables just to get them to open a file.
> 2) Command lines are usually entered as TEXT on a terminal and are
> therefore encoded in whatever encoding the terminal uses.
Actually I like the ability to delete/copy files even if they
happen to have filenames in weird Chinese encodings too.
Users just use wildcards or tab completion to get around
filenames that are hard to type.
> 3) None of the recent linux distributions I have installed did
> anything but set up a UTF-8 based system.
Very many people who need to use their own language still use
other encodings and will continue to do so for the foreseeable future.
> So I think we should try hard to avoid introducing any additional
> complexity, like filename ADTs used for program arguments, to deal
> with the small minority of systems where file names cannot be
> converted to unicode. Maybe it's possible to use some user-defined
> unicode code points to achieve a lossless conversion of arbitrary
> byte strings to unicode? I mean, byte strings that are valid in the
> system encoding would get transcoded correctly, and invalid bytes
> would get mapped to some extra code points so that they can be
> converted back if necessary.
What would happen if you tried to output such a String? Would you
get the raw bytes or the escaped versions? Also this would mean that
Haskell unicode != unicode (isn't Java's broken handling
enough?).
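
To make the question concrete, here is a minimal sketch of the escaping
scheme proposed above. It is only an illustration under stated assumptions:
the "system encoding" is taken to be plain ASCII to keep it short, and the
names escapeBase, decodeLossless and encodeLossless, as well as the choice
of U+F700 as the base of the escape range, are made up for this example.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical base of the escape range, inside the Unicode
-- Private Use Area (U+E000..U+F8FF).
escapeBase :: Int
escapeBase = 0xF700

-- Decode bytes to a String. Assumption: the system encoding is ASCII,
-- so every byte >= 0x80 counts as invalid and is mapped to
-- escapeBase + byte instead of being rejected.
decodeLossless :: [Word8] -> String
decodeLossless = map toChar
  where
    toChar b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = chr (escapeBase + fromIntegral b)

-- Encode a String back to bytes, undoing the escape. Returns Nothing
-- for characters that are neither ASCII nor in the escape range.
encodeLossless :: String -> Maybe [Word8]
encodeLossless = traverse toByte
  where
    toByte c
      | n < 0x80                          = Just (fromIntegral n)
      | n >= lo && n <= escapeBase + 0xFF = Just (fromIntegral (n - escapeBase))
      | otherwise                         = Nothing
      where
        n  = ord c
        lo = escapeBase + 0x80
```

With this sketch, encodeLossless (decodeLossless bs) gives back Just bs for
any byte list, so the conversion itself is lossless. The String in between
is a different matter: it contains private use code points, so writing it
out with an ordinary Unicode encoder would produce the escaped versions and
not the raw bytes, unless the output layer also knows about the escape. With
a real system encoding such as UTF-8, a filename that legitimately contains
one of those code points would presumably become ambiguous as well.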
- Einar Karttunen