[Haskell-cafe] File path programme

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Sun Jan 30 15:30:18 EST 2005


Glynn Clements <glynn at gclements.plus.com> writes:

> And it isn't a theoretical issue. E.g. in an environment where EUC-JP
> is used, filenames may begin with <ESC>$)B (designate JISX0208 to G1),
> or they may not (because G1 is assumed to contain JISX0208 initially).

I think such encodings are never used as default encodings of a Unix
locale.

>> The various UTF encodings do not have this particular problem; if a UTF
>> string is valid, then it is a unique representation of a Unicode string.

The BOM is a problem. Unfortunately Unicode mandates that U+FEFF at
the start of a UTF-8 text stream is a mark which doesn't belong to the
text. It provides variants of UTF-16/32 with and without a BOM, but
UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful
encoding.
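
To make the point concrete, here is a small sketch in Python (my
choice of language for illustration; the file name is made up). It
relies on Python's two standard codecs, "utf-8", which leaves a
leading U+FEFF in the text, and "utf-8-sig", which drops it. It only
shows that the same text ends up with two byte representations once
a BOM is treated as a mark outside the text.

```python
# U+FEFF encoded as UTF-8; these three bytes are the UTF-8 "BOM".
UTF8_BOM = "\ufeff".encode("utf-8")

text = "file.txt"                # illustrative name, not from the thread
plain = text.encode("utf-8")     # bytes without a BOM
marked = UTF8_BOM + plain        # same text, prefixed with a BOM

# Two different byte strings ...
different_bytes = plain != marked

# ... which a BOM-aware decoder maps to the same text.
same_text = plain.decode("utf-8-sig") == marked.decode("utf-8-sig")

# A decoder that ignores the BOM keeps U+FEFF as an ordinary character.
kept = marked.decode("utf-8")

print(different_bytes, same_text, kept == "\ufeff" + text)
```

So whether two byte strings name the same text depends on which of
the two decoding conventions the program follows.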

Unix ignores this: it doesn't use a BOM in UTF-8, except in individual
applications for individual file formats. iconv() on Linux and in
libiconv doesn't process a BOM in UTF-8 (although in libiconv this
is because it's old, being based on an old RFC with 31-bit code points
which didn't include a BOM).

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

