UTF-8 BOM, really!? (was: [Haskell-cafe] Re: File path programme)

Graham Klyne GK at ninebynine.org
Mon Jan 31 04:56:14 EST 2005

At 23:39 30/01/05 +0100, Marcin 'Qrczak' Kowalczyk wrote:
>Aaron Denney <wnoise at ofb.net> writes:
> >> It provides variants of UTF-16/32 with and without a BOM, but
> >> UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful
> >> encoding.
> >
> > I think you mean "UTF-8 only has the variant without a BOM".
>No, unfortunately. Unicode standard section 3.10 defines these
>encoding schemes:
>- UTF-8    (with    a BOM)
>- UTF-16BE (without a BOM)
>- UTF-16LE (without a BOM)
>- UTF-16   (with    a BOM)
>- UTF-32BE (without a BOM)
>- UTF-32LE (without a BOM)
>- UTF-32   (with    a BOM)
>It says about UTF-8 BOM: "Its usage at the beginning of a UTF-8 data
>stream is neither required nor recommended by the Unicode Standard,
>but its presence does not affect conformance to the UTF-8 encoding
>scheme."
>
>IMHO it would be fair if it had two variants of UTF-8 encoding scheme,
>just like it has three variants of UTF-16/32, so it would be unambiguous
>whether "UTF-8" in a particular context allows BOM or not.

I haven't been following this thread in detail, so I may be missing 
something, but...

How can it make sense to have a BOM in UTF-8?  UTF-8 is a sequence of 
octets (bytes);  what ordering is there here that can sensibly be varied?
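
For reference while reading the list quoted above, here is a minimal sketch
of what U+FEFF looks like in each encoding. It is in Python and uses only
the standard codecs; the helper name bom_bytes is made up for illustration.
In UTF-16/32 the bytes differ with byte order, while in UTF-8 there is only
one possible byte sequence, so there it can at most act as a signature.

```python
# U+FEFF is the code point used as the byte order mark (BOM).
BOM = "\ufeff"

def bom_bytes(encoding: str) -> str:
    """Return the bytes that U+FEFF encodes to, as space-separated hex."""
    return BOM.encode(encoding).hex(" ")

for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{name:10} {bom_bytes(name)}")
# utf-8      ef bb bf
# utf-16-be  fe ff
# utf-16-le  ff fe
# utf-32-be  00 00 fe ff
# utf-32-le  ff fe 00 00

data = b"\xef\xbb\xbfhi"
# The plain "utf-8" codec keeps a leading U+FEFF as an ordinary character;
# Python's separate "utf-8-sig" codec strips it if present.
print(repr(data.decode("utf-8")))      # '\ufeffhi'
print(repr(data.decode("utf-8-sig")))  # 'hi'
```

Python's two codec names ("utf-8" and "utf-8-sig") happen to resemble the
two-variant split suggested in the quoted message, though that is a library
convention and not something the Unicode Standard defines.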


Graham Klyne
For email:
