Unicode support

Ashley Yakeley ashley@semantic.org
Sun, 30 Sep 2001 15:36:51 -0700


At 2001-09-30 07:29, Marcin 'Qrczak' Kowalczyk wrote:

>Some time ago the Unicode Consortium slowly began switching to the
>point of view that abstract characters are denoted by numbers in the
>range U+0000..10FFFF.

It's worth mentioning that these are 'codepoints', not 'characters'. 
Sometimes a character will be made up of two codepoints, for instance an 
'a' with a dot above is a single character that can be made from the 
codepoints LATIN SMALL LETTER A and COMBINING DOT ABOVE. Perhaps this 
makes the UTF-16 'surrogate' problem a bit less serious, since there 
never was a one-to-one correspondence between any kind of n-bit unit and 
displayed characters.
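
To make the distinction concrete, here is a small Python sketch (my own illustration, not anything from the thread; the variable names are made up). It assumes Python 3.3 or later, where strings are sequences of codepoints, and uses only the standard unicodedata module. It shows one displayed character built from two codepoints, and one codepoint that needs two 16-bit units in UTF-16.

```python
import unicodedata

# 'a' followed by COMBINING DOT ABOVE: one displayed character, two codepoints.
decomposed = "a\u0307"
names = [unicodedata.name(c) for c in decomposed]

# NFC normalization composes this particular pair into the single
# precomposed codepoint U+0227. Not every combining sequence has one.
composed = unicodedata.normalize("NFC", decomposed)

# A codepoint outside the 16-bit range (U+1D11E MUSICAL SYMBOL G CLEF)
# is a single codepoint but takes a surrogate pair in UTF-16.
clef = "\U0001D11E"
utf16_units = len(clef.encode("utf-16-be")) // 2

print(len(decomposed), names)
print(len(composed), unicodedata.name(composed))
print(len(clef), utf16_units)
```

Neither the codepoint count nor the 16-bit unit count tells you how many characters a reader sees, which is the sense in which surrogates only add one more layer to a correspondence that was never one-to-one.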

-- 
Ashley Yakeley, Seattle WA