30 Sep 2001 22:28:52 +0900
Hamilton Richards <email@example.com> writes:
> At 12:20 PM -0500 9/29/01, Colin Paul Adams wrote:
> >I have just been reading through the Haskell report to refresh my
> >memory of the language. I was surprised to see this:
> >The character type Char is an enumeration and consists of 16 bit values,
> >conforming to
> >the Unicode standard.
> >Unicode uses 24-bit values to identify characters.
> According to the official Unicode web site,
> The Unicode Standard defines three encoding forms
> that allow the same data to be transmitted in a byte,
> word or double word oriented format (i.e. in 8, 16 or
> 32-bits per code unit).
>  http://www.unicode.org/unicode/standard/principles.html
You have to distinguish between encodings (you refer to
UTF-8, UTF-16 and UTF-32) and the Unicode (ISO 10646) tables
of code points themselves.
16 bits is enough to describe the Basic Multilingual Plane,
and I think 24 bits is enough for all the currently defined
extended planes. So I guess the report just refers to the BMP.
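
To make the code point versus encoding distinction concrete, here is a
minimal sketch. It assumes an implementation whose Char covers the whole
Unicode range up to U+10FFFF (as current GHC does), which is not what the
report text quoted above describes. The names inBMP and toUTF16 are
made up for illustration and are not from any library.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | True if the code point lies in the Basic Multilingual Plane
-- (U+0000..U+FFFF), i.e. it fits in 16 bits.
inBMP :: Char -> Bool
inBMP c = ord c <= 0xFFFF

-- | Encode one code point as UTF-16 code units.
-- A BMP code point becomes a single 16-bit unit; anything above
-- U+FFFF becomes a surrogate pair. Assumption: the input is not
-- itself a surrogate (U+D800..U+DFFF); that case is not checked.
toUTF16 :: Char -> [Word16]
toUTF16 c
  | n <= 0xFFFF = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF))    -- low surrogate
                  ]
  where
    n = ord c
    m = n - 0x10000  -- 20-bit offset into the supplementary planes
```

So 'A' is one 16-bit unit, while a code point outside the BMP such as
U+1D11E needs two, even though it is still a single code point in the
tables.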