Unicode support

Jens Petersen petersen@redhat.com
30 Sep 2001 22:28:52 +0900


Hamilton Richards <hrichrds@swbell.net> writes:

> At 12:20 PM -0500 9/29/01, Colin Paul Adams wrote:
> >I have just been reading through the Haskell report to refresh my
> >memory of the language. I was surprised to see this:
> >
> >The character type Char is an enumeration and consists of 16 bit values,
> >conforming to
> >the Unicode standard [10].
> >
> >Unicode uses 24-bit values to identify characters.
> 
> According to the official Unicode web site [0],
> 
> 	The Unicode Standard defines three encoding forms
> 	that allow the same data to be transmitted in a byte,
> 	word or double word oriented format (i.e. in 8, 16 or
> 	32-bits per code unit).
> 
> [0] http://www.unicode.org/unicode/standard/principles.html

You have to distinguish between the encoding forms (you are
referring to UTF-8, UTF-16 and UTF-32) and the Unicode
(ISO 10646) tables of code points themselves.
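
To make the distinction concrete, here is a rough sketch of how many
code units each encoding form spends on a single code point.  The
function names are made up for illustration, and it assumes a Haskell
implementation whose Char can hold code points above U+FFFF (current
GHC does; the report text quoted above does not promise it).

```haskell
import Data.Char (ord)

-- Number of 8-bit code units UTF-8 uses for one code point.
utf8Units :: Char -> Int
utf8Units c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Number of 16-bit code units UTF-16 uses for one code point.
utf16Units :: Char -> Int
utf16Units c
  | ord c < 0x10000 = 1   -- inside the BMP
  | otherwise       = 2   -- outside the BMP: a surrogate pair

-- UTF-32 always uses exactly one 32-bit code unit.
utf32Units :: Char -> Int
utf32Units _ = 1
```

The code point itself (ord c) is the same number in all three cases;
only the number and width of the code units differ.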

16 bits is enough to describe the Basic Multilingual Plane,
and I think 24 bits covers all the currently defined extended
planes.  So I guess the report just refers to the BMP.
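
The bit counts can be checked with a small sketch.  The helper name
bitsNeeded is invented here, and the upper limit U+10FFFF is taken
from the current Unicode Standard rather than from anything in this
thread.

```haskell
-- Smallest number of bits that can represent a non-negative value.
bitsNeeded :: Int -> Int
bitsNeeded 0 = 1
bitsNeeded n = length (takeWhile (> 0) (iterate (`div` 2) n))

-- Last code point of the Basic Multilingual Plane.
bmpMax :: Int
bmpMax = 0xFFFF

-- Last code point of the whole code space (assumption: the limit
-- fixed by the current Unicode Standard).
unicodeMax :: Int
unicodeMax = 0x10FFFF
```

Under those assumptions bitsNeeded bmpMax is 16 and bitsNeeded
unicodeMax is 21, so 24 bits is more than enough for every plane.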

Jens