Unicode support in Hugs - alpha-patch available
Richard A. O'Keefe
ok@cs.otago.ac.nz
Mon, 25 Aug 2003 14:38:33 +1200 (NZST)
| Anyone interested in Unicode support in Hugs (what it lacks so far)
| please check out this URL:
|
| http://www.golubovsky.org/software/hugs-patch/article.html
There was a serious error on that page, which I can no longer reach.
The number of defined characters in Unicode was grossly underestimated,
because the number of *lines* in the Unicode database was counted,
instead of the number of *characters*.
Here's a summary of UnicodeData.txt as it stood in January 2001.
   6582 <CJK Ideograph Extension A>
  20902 <CJK Ideograph>
  11172 <Hangul Syllable>
    896 <Non Private Use High Surrogate>
    128 <Private Use High Surrogate>
   1024 <Low Surrogate>
   6400 <Private Use>
  65534 <Plane 15 Private Use>
  65534 <Plane 16 Private Use>
  10603 singletons
  10621 records, all told.
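The number of *lines* is 10,621 (10,603 singletons plus a First/Last
pair of lines for each of the 9 ranges); the number of *characters*
(excluding surrogates and private use areas) is 49,259
(6,582 + 20,902 + 11,172 + 10,603).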
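The difference between counting lines and counting characters can be sketched in Python. This is only an illustration, not the script behind the figures above: the name `count_unicode_data` and the inline `SAMPLE` are hypothetical, and it assumes the usual UnicodeData.txt layout, in which fields are separated by semicolons, the first field is a hex code point, the third is the general category, and a range is written as two lines whose names end in `, First>` and `, Last>`.

```python
def count_unicode_data(text, skip_categories=("Cs", "Co")):
    """Return (lines, characters) for text in UnicodeData.txt format.

    Surrogates (Cs) and private use (Co) are skipped by default.
    """
    lines = 0
    characters = 0
    range_start = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        lines += 1
        fields = raw.split(";")
        code_point = int(fields[0], 16)
        name, category = fields[1], fields[2]
        if name.endswith(", First>"):
            # Remember where the range begins; it is counted at its Last line.
            range_start = code_point
            continue
        if name.endswith(", Last>"):
            size = code_point - range_start + 1
            range_start = None
        else:
            size = 1
        if category not in skip_categories:
            characters += size
    return lines, characters


# Hypothetical excerpt: one singleton and four ranges (two lines each).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;
"""

lines, characters = count_unicode_data(SAMPLE)
print(lines, characters)  # 9 38657
```

Nine lines stand for 38,657 characters here (1 + 6,582 + 20,902 + 11,172), which is the same effect, on a small scale, that made the line count so misleading.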
Since then, Unicode has got a lot bigger.
The current version of Unicode is 4.0.0. According to
http://www.unicode.org/versions/Unicode4.0.0/
it has 96,248{%} graphic characters plus 265 format, control, and
"noncharacter" characters (not counting private use, surrogate,
and reserved codes).
This is a _lot_ more than the www.golubovsky.org article envisaged.
{%} Yes, this does mean that "Unicode is 16 bits" is dead. That idea
has been dead for quite some time now, although the rotting corpse
"lives" on in Java.
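To make the footnote concrete, here is a small Python sketch showing that a code point above U+FFFF cannot fit in one 16-bit unit and needs a surrogate pair in UTF-16. The helper names `utf16_code_units` and `surrogate_pair` are hypothetical; U+20000, the first code point of CJK Ideograph Extension B, is used as the example.

```python
def utf16_code_units(ch):
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


def surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(utf16_code_units("\u4e00"))      # 1 (fits in 16 bits)
print(utf16_code_units("\U00020000"))  # 2 (needs a surrogate pair)
print([hex(u) for u in surrogate_pair(0x20000)])  # ['0xd840', '0xdc00']
```

Any representation that treats one 16-bit unit as one character will therefore see U+20000 as two "characters".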