Unicode support
Ashley Yakeley
ashley@semantic.org
Sun, 30 Sep 2001 15:36:51 -0700
At 2001-09-30 07:29, Marcin 'Qrczak' Kowalczyk wrote:
>Some time ago the Unicode Consortium slowly began switching to the
>point of view that abstract characters are denoted by numbers in the
>range U+0000..10FFFF.
It's worth mentioning that these are 'codepoints', not 'characters'.
Sometimes a character will be made up of two codepoints. For instance, an
'a' with a dot above is a single character that can be made from the
codepoints LATIN SMALL LETTER A and COMBINING DOT ABOVE. Perhaps this
makes the UTF-16 'surrogate' problem (codepoints above U+FFFF needing a
pair of 16-bit units) a bit less serious, since there never was a
one-to-one correspondence between any kind of n-bit unit and displayed
characters.
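To make the point concrete, here is a small sketch in Python (my choice of
language, using only the standard `unicodedata` module and the built-in
UTF-16 codec). It shows the same displayed 'a with dot above' as two
codepoints or, after NFC normalization, as the single precomposed U+0227.
It also shows one codepoint that takes two UTF-16 units. The G clef
character (U+1D11E) is just an illustrative example I picked. It is not
from the original discussion.

```python
import unicodedata

# 'a' followed by COMBINING DOT ABOVE: one displayed character, two codepoints.
decomposed = "a\u0307"
# NFC composes it into LATIN SMALL LETTER A WITH DOT ABOVE (U+0227).
precomposed = unicodedata.normalize("NFC", decomposed)

# Illustrative codepoint outside the BMP: MUSICAL SYMBOL G CLEF (U+1D11E).
clef = "\U0001D11E"


def codepoints(s):
    """Return the codepoints of s formatted as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s]


def utf16_units(s):
    """Return the 16-bit UTF-16 code units of s as integers."""
    data = s.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


print(codepoints(decomposed))               # ['U+0061', 'U+0307']
print(codepoints(precomposed))              # ['U+0227']
print([hex(u) for u in utf16_units(clef)])  # ['0xd834', '0xdd1e'], a surrogate pair
```

So "one character" can be two codepoints or one, and one codepoint can be
one 16-bit unit or two. Counting n-bit units was never the same as counting
what the user sees.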
--
Ashley Yakeley, Seattle WA