[Haskell-cafe] File path programme
Marcin 'Qrczak' Kowalczyk
qrczak at knm.org.pl
Sun Jan 30 15:30:18 EST 2005
Glynn Clements <glynn at gclements.plus.com> writes:
> And it isn't a theoretical issue. E.g. in an environment where EUC-JP
> is used, filenames may begin with <ESC>$)B (designate JISX0208 to G1),
> or they may not (because G1 is assumed to contain JISX0208 initially).
I think such encodings are never used as default encodings of a Unix
locale.
>> The various UTF encodings do not have this particular problem; if a UTF
>> string is valid, then it is a unique representation of a unicode string.
The BOM is a problem. Unfortunately, Unicode mandates that U+FEFF at
the start of a UTF-8 text stream is a mark which doesn't belong to the
text. It provides variants of UTF-16/32 with and without a BOM, but
UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful
encoding: a decoder has to treat U+FEFF at the start of a stream
differently from U+FEFF anywhere else.
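
To illustrate the uniqueness point, here is a minimal sketch in Python.
It assumes a decoder that follows the reading of Unicode described
above. Python's "utf-8-sig" codec stands in for such a BOM-aware
decoder, and the name "readme.txt" is a made-up example. Under that
decoder, two different byte strings yield the same text, so a valid
UTF-8 byte string is no longer a unique representation of it.

```python
# Two different byte strings: one starts with the UTF-8 BOM (EF BB BF),
# the other does not. "readme.txt" is a hypothetical filename.
with_bom = b"\xef\xbb\xbfreadme.txt"
without_bom = b"readme.txt"

# A BOM-aware decoder (Python's "utf-8-sig" codec) drops a leading U+FEFF.
bom_aware = [raw.decode("utf-8-sig") for raw in (with_bom, without_bom)]

# A plain decoder keeps U+FEFF as an ordinary character of the text.
plain = [raw.decode("utf-8") for raw in (with_bom, without_bom)]

# Under the BOM-aware decoder the two byte strings collapse to one text.
same_under_bom_aware = bom_aware[0] == bom_aware[1]

# Under the plain decoder they stay distinct.
same_under_plain = plain[0] == plain[1]
```

With the plain decoder, which matches how Unix treats filenames, the
two byte strings remain different names.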
Unix ignores this: it doesn't use a BOM in UTF-8, except in individual
applications for individual file formats. iconv() on Linux and in
libiconv doesn't process a BOM in UTF-8 (although in libiconv this is
because it's old, being based on an old RFC with 31-bit code points
which didn't include a BOM).
--
__("< Marcin Kowalczyk
\__/ qrczak at knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/