[Haskell-cafe] Re: File path programme

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Sun Jan 30 17:39:49 EST 2005


Aaron Denney <wnoise at ofb.net> writes:

>> It provides variants of UTF-16/32 with and without a BOM, but
>> UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful
>> encoding.
>
> I think you mean "UTF-8 only has the variant without a BOM".

No, unfortunately. Section 3.10 of the Unicode Standard defines these
encoding schemes:

- UTF-8    (with    a BOM)
- UTF-16BE (without a BOM)
- UTF-16LE (without a BOM)
- UTF-16   (with    a BOM)
- UTF-32BE (without a BOM)
- UTF-32LE (without a BOM)
- UTF-32   (with    a BOM)
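To make the list concrete: a BOM is U+FEFF written in the scheme's own
byte form, so a reader of the BOM-bearing schemes can sniff the first
few bytes to learn the byte order that follows. A rough Haskell sketch
of that sniffing over a plain [Word8] list (Scheme and sniffBom are
made-up names for illustration, not from any library):

```haskell
import Data.Word (Word8)

-- Labels for the byte order a BOM indicates (hypothetical type; the
-- constructor names just reuse the scheme names from the list above).
data Scheme = UTF8 | UTF16BE | UTF16LE | UTF32BE | UTF32LE
  deriving (Eq, Show)

-- | Look for a BOM at the start of a byte stream and return the byte
-- order it indicates together with the number of BOM bytes to skip.
-- The UTF-32 patterns are tried first, because FF FE 00 00 would
-- otherwise be read as a UTF-16LE BOM followed by U+0000.
sniffBom :: [Word8] -> Maybe (Scheme, Int)
sniffBom bs = case bs of
  (0x00 : 0x00 : 0xFE : 0xFF : _) -> Just (UTF32BE, 4)
  (0xFF : 0xFE : 0x00 : 0x00 : _) -> Just (UTF32LE, 4)
  (0xEF : 0xBB : 0xBF : _)        -> Just (UTF8, 3)
  (0xFE : 0xFF : _)               -> Just (UTF16BE, 2)
  (0xFF : 0xFE : _)               -> Just (UTF16LE, 2)
  _                               -> Nothing  -- no BOM found
```

When this returns Nothing for the "UTF-16" or "UTF-32" scheme, the
standard says to assume big-endian unless a higher-level protocol says
otherwise.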

About the UTF-8 BOM it says: "Its usage at the beginning of a UTF-8
data stream is neither required nor recommended by the Unicode
Standard, but its presence does not affect conformance to the UTF-8
encoding scheme."

IMHO it would be fair if it defined two variants of the UTF-8 encoding
scheme, just as it defines three variants each of UTF-16 and UTF-32, so
that it would be unambiguous whether "UTF-8" in a particular context
allows a BOM or not.
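The two-variant idea can be sketched as a small front end that runs
before the actual UTF-8 decoding. Utf8Variant and stripUtf8Bom are
hypothetical names, not part of any standard or library; the point is
only that the BOM-permitting variant must know whether it is looking at
the start of the stream:

```haskell
import Data.Word (Word8)

-- Hypothetical names for the two proposed UTF-8 variants.
data Utf8Variant
  = Utf8NoBom     -- a leading EF BB BF is data: U+FEFF ZERO WIDTH NO-BREAK SPACE
  | Utf8MaybeBom  -- a leading EF BB BF is a signature and is dropped
  deriving (Eq, Show)

-- | Drop a leading UTF-8 BOM when the variant permits one. Only the
-- first three bytes of the stream are inspected; an EF BB BF anywhere
-- later is left alone as ordinary data.
stripUtf8Bom :: Utf8Variant -> [Word8] -> [Word8]
stripUtf8Bom Utf8MaybeBom (0xEF : 0xBB : 0xBF : rest) = rest
stripUtf8Bom _            bs                         = bs
```

This also shows why the BOM-permitting UTF-8 is stateful: processing a
concatenation is not the same as concatenating the results. If two
chunks each begin with EF BB BF, processing the whole stream keeps the
second BOM as data, while processing chunk by chunk drops both.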

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

