Unicode support in Hugs - alpha-patch available

Richard A. O'Keefe ok@cs.otago.ac.nz
Mon, 25 Aug 2003 14:38:33 +1200 (NZST)


	| Anyone interested in Unicode support in Hugs (what it lacks so far)
	| please check out this URL:
	|
	| http://www.golubovsky.org/software/hugs-patch/article.html
	
There was a serious error on that page (which I can no longer reach).
The number of defined characters in Unicode was grossly underestimated,
because the number of *lines* in the Unicode database was counted
instead of the number of *characters*.
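The two counts differ because UnicodeData.txt does not give every character a line of its own. Each record is a semicolon-separated line with the code point (hex) in field 0 and the name in field 1, and large blocks such as the CJK ideographs and Hangul syllables are recorded as just two lines whose names end in ", First>" and ", Last>". What follows is a minimal Python sketch of counting both ways under that assumption; the function name and the exclude_private_and_surrogates flag are hypothetical, chosen for illustration, and the exclusion simply matches "Surrogate" or "Private Use" in the range name.

```python
def count_records_and_characters(lines, exclude_private_and_surrogates=False):
    """Count records (lines) and characters in UnicodeData.txt content.

    A range is stored as two records whose names end in ", First>" and
    ", Last>"; together they stand for every code point between the two
    bounds, inclusive. Any other record is a singleton: one line, one character.
    """
    records = 0
    characters = 0
    range_start = None  # code point of the pending "<..., First>" record
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        records += 1
        fields = line.split(";")
        code_point = int(fields[0], 16)  # field 0: code point in hex
        name = fields[1]                 # field 1: character name
        if name.endswith(", First>"):
            range_start = code_point
        elif name.endswith(", Last>"):
            # Hypothetical filter: drop surrogate and private-use ranges.
            excluded = "Surrogate" in name or "Private Use" in name
            if not (exclude_private_and_surrogates and excluded):
                characters += code_point - range_start + 1
            range_start = None
        else:
            characters += 1
    return records, characters
```

Typical use would be `with open("UnicodeData.txt") as f: print(count_records_and_characters(f, exclude_private_and_surrogates=True))`. Counting lines alone treats the whole Hangul block as two "characters"; expanding the First/Last pair gives the 11,172 it actually contains.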

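Here's a summary of UnicodeData.txt as it stood in January 2001: the
number of code points in each range, the number of individually listed
characters ("singletons"), and the total number of records (lines).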

 6582 <CJK Ideograph Extension A>
20902 <CJK Ideograph>
11172 <Hangul Syllable>
  896 <Non Private Use High Surrogate>
  128 <Private Use High Surrogate>
 1024 <Low Surrogate>
 6400 <Private Use>
65534 <Plane 15 Private Use>
65534 <Plane 16 Private Use>
10603 singletons
10621 records, all told.

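The number of *lines* is 10,621 (the 10,603 singletons plus a
"<..., First>" and a "<..., Last>" line for each of the 9 ranges); the
number of *characters* (excluding surrogates and private use areas) is
6,582 + 20,902 + 11,172 + 10,603 = 49,259.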
Since then, Unicode has got a lot bigger.
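Both totals can be rechecked from the range bounds. The bounds below are not in the summary above; they are an added assumption, taken to be the First/Last code points the Unicode 3.0.x file used for these ranges, and each one reproduces the count in the table. A small Python sketch of the arithmetic:

```python
# (name, first, last) for each range stored as a First/Last pair.
# Bounds are an assumption: the Unicode 3.0.x values for these ranges.
RANGES = [
    ("CJK Ideograph Extension A", 0x3400, 0x4DB5),
    ("CJK Ideograph", 0x4E00, 0x9FA5),
    ("Hangul Syllable", 0xAC00, 0xD7A3),
    ("Non Private Use High Surrogate", 0xD800, 0xDB7F),
    ("Private Use High Surrogate", 0xDB80, 0xDBFF),
    ("Low Surrogate", 0xDC00, 0xDFFF),
    ("Private Use", 0xE000, 0xF8FF),
    ("Plane 15 Private Use", 0xF0000, 0xFFFFD),
    ("Plane 16 Private Use", 0x100000, 0x10FFFD),
]
SINGLETONS = 10603  # individually listed characters, from the summary


def is_excluded(name):
    """Surrogate and private-use ranges are left out of the character count."""
    return "Surrogate" in name or "Private Use" in name


# Inclusive bounds, so each range holds last - first + 1 code points.
sizes = {name: last - first + 1 for name, first, last in RANGES}

# Every range costs two lines in the file, however many code points it spans.
lines = SINGLETONS + 2 * len(RANGES)
characters = SINGLETONS + sum(
    size for name, size in sizes.items() if not is_excluded(name)
)

print(lines)       # 10621
print(characters)  # 49259
```

The line count grows by two per range no matter how big the range is, which is why counting lines says almost nothing about how many characters there are.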

The current version of Unicode is 4.0.0.  According to
http://www.unicode.org/versions/Unicode4.0.0/
it has 96,248{%} graphic characters plus 265 format, control, and
"noncharacter" characters (not counting private use, surrogate,
and reserved codes).

This is a _lot_ more than the www.golubovsky.org article envisaged.

{%} Yes, this does mean that "Unicode is 16 bits" is dead.  That idea
has been dead for quite some time now, although the rotting corpse
"lives" on in Java.
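The surrogate ranges in the summary above (2,048 code points, U+D800 to U+DFFF) are the patch that keeps 16-bit code units working: one high surrogate (1,024 choices) followed by one low surrogate (1,024 choices) addresses 1,048,576 further code points, planes 1 to 16. Java's char is a single UTF-16 code unit, so any character above U+FFFF occupies two chars. A minimal Python sketch of that encoding step, cross-checked against Python's built-in "utf-16-be" codec; the function name is hypothetical.

```python
def to_utf16_code_units(code_point):
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0xFFFF:
        return [code_point]  # Basic Multilingual Plane: one 16-bit unit
    offset = code_point - 0x10000    # 20 bits: 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return [high, low]


# Cross-check against Python's own UTF-16 codec for a plane-2 code point.
ch = "\U00020000"
units = to_utf16_code_units(ord(ch))
encoded = ch.encode("utf-16-be")
codec_units = [int.from_bytes(encoded[i:i + 2], "big")
               for i in range(0, len(encoded), 2)]
assert codec_units == units
print([hex(u) for u in units])  # ['0xd840', '0xdc00']
```

One code point, two code units: any program that equates "16-bit unit" with "character" miscounts and can split such a character in half.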