Unicode support

Kent Karlsson kentk@md.chalmers.se
Mon, 8 Oct 2001 11:48:59 +0200


----- Original Message -----
From: "Dylan Thurston" <dpt@math.harvard.edu>
To: "John Meacham" <john@repetae.net>; <haskell@haskell.org>
Sent: Friday, October 05, 2001 5:47 PM
Subject: Re: Unicode support


> On Sun, Sep 30, 2001 at 11:01:38AM -0700, John Meacham wrote:
> > seeing as how the Haskell standard is horribly vague when it comes to
> > character set encodings anyway, I would recommend that we just omit any
> > reference to the bit size of Char, and just say abstractly that each
> > Char represents one Unicode character, but the entire range of Unicode
> > is not guaranteed to be expressible, which must be true, since Haskell
> > 98 implementations can be written now, but Unicode can change in the
> > future. The only range guaranteed to be expressible in any
> > representation is the values 0-127 US ASCII (or perhaps Latin-1)
>
> I agree about the vagueness, but I believe the Unicode consortium has
> explicitly limited itself to 21 bits; if they turn out to have been

In some sense yes, but not quite.  It's better to say that the code space
runs from 0000 to 10FFFF (hexadecimal); the encoding forms then handle the bits.
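
To make the distinction concrete, here is a minimal Haskell sketch. The
names maxCodePoint, bitsNeeded and inCodeSpace are made up for this
illustration and come from no library. It shows that 10FFFF does need 21
bits, but that not every 21-bit value lies in the code space.

```haskell
-- Hypothetical helper names, for illustration only.

-- Upper end of the code space (0000 to 10FFFF).
maxCodePoint :: Int
maxCodePoint = 0x10FFFF

-- Number of binary digits needed to write a non-negative integer.
bitsNeeded :: Int -> Int
bitsNeeded 0 = 0
bitsNeeded n = 1 + bitsNeeded (n `div` 2)

-- True when a value lies inside the code space.
inCodeSpace :: Int -> Bool
inCodeSpace n = n >= 0 && n <= maxCodePoint
```

For instance, bitsNeeded maxCodePoint is 21, while inCodeSpace 0x1FFFFF is
False even though 1FFFFF also fits in 21 bits.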

> lying about that (which seems unlikely in this millennium), we can

The original guesstimate of fewer than half a million "things" to encode
as characters has been stable for over a decade. Even though some
try to argue that Unicode had to go from 16 bits to more in order to
handle more characters, that was really known from the beginning.
There was a big bump recently, adding 41000 Hàn characters, but those
were collected over a long time; though some more Hàn are expected,
no such big bump is.  If you're interested, it's gone beyond a guesstimate
now, see the roadmap:
http://www.evertype.com/standards/iso10646/ucs-roadmap.html
(the official version is at the DKUUG site, but the reference is through a
cryptic document number).  You will see how plane 1 is planned for
a number of historical scripts (mostly). Disregarding the private use
planes (15 and 16) there is nothing planned for planes 3-14, except for
some crap in 14 (what is there is there for political reasons only, DO NOT
USE), and that plane 2 may spill over into plane 3. That leaves ten planes
(of 64K code positions each) completely empty, with nothing planned for them.
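
The plane arithmetic above can be sketched in Haskell as follows. The
names planeSize, planeOf, planeCount and emptyPlanes are hypothetical,
and the list of empty planes only encodes my reading of the roadmap as
described in this message, not anything normative.

```haskell
-- Hypothetical helper names, for illustration only.

-- Each plane holds 64K (hexadecimal 10000) code positions.
planeSize :: Int
planeSize = 0x10000

-- Plane number of a code position.
planeOf :: Int -> Int
planeOf cp = cp `div` planeSize

-- Planes 0 through 16 cover the code space 0000 to 10FFFF.
planeCount :: Int
planeCount = planeOf 0x10FFFF + 1

-- Assumption taken from the text above: of planes 3-14, plane 14 is in
-- use and plane 3 may receive spill-over from plane 2; the rest are empty.
emptyPlanes :: [Int]
emptyPlanes = [p | p <- [3 .. 14], p /= 3, p /= 14]
```

With these definitions planeCount is 17 and emptyPlanes has ten entries,
planes 4 through 13.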

        Kind regards
        /kent k


> hardly be blamed for believing them.  I think all that should be
> required of implementations is that they support 21 bits.
>
> Best,
> Dylan Thurston