Unicode support

John Meacham john@repetae.net
Sun, 30 Sep 2001 11:01:38 -0700

sorry for the me too post, but this has been a major pet peeve of mine
for a long time. 16 bit unicode should be gotten rid of, being the worst
of both worlds: not backwards compatible with ASCII, endianness issues,
and no constant-length encoding. utf8 externally and utf32 when
working with individual characters is the way to go.
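To make the "backwards compatible with ascii" point concrete, here is a minimal sketch of how a single code point is serialized to UTF-8 bytes (not from the original post; `utf8Bytes` is a hypothetical helper, though `ord` and the bit operations are standard Haskell):

```haskell
import Data.Char (ord)
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Encode one Unicode code point as its UTF-8 byte sequence.
-- Code points 0-127 stay a single identical byte, which is why
-- UTF-8 is backwards compatible with ASCII; larger code points
-- take 2 to 4 bytes, each starting with a distinguishable prefix.
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. b 6, cont 0]
  | n < 0x10000 = [0xE0 .|. b 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. b 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    b s    = fromIntegral (n `shiftR` s)        -- high bits of n
    cont s = 0x80 .|. (b s .&. 0x3F)            -- continuation byte
```

Since the byte stream is self-synchronizing and endianness-free, none of the UTF-16 problems above apply; the cost is only that in-memory indexing by character wants the fixed-width UTF-32 form instead.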

seeing as how the haskell standard is horribly vague when it comes to
character set encodings anyway, I would recommend that we just omit any
reference to the bit size of Char, and just say abstractly that each
Char represents one unicode character, but that the entire range of
unicode is not guaranteed to be expressible. this must be true, since
haskell 98 implementations can be written now, but unicode can change in
the future. the only range guaranteed to be expressible in any
representation would be the values 0-127, US ASCII (or perhaps latin1).
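Under that proposal, the only portable check a program could rely on is whether its text stays inside the guaranteed range. A sketch of what that looks like (`portableChars` is a hypothetical helper, not part of any standard; `ord` is from the standard `Data.Char`):

```haskell
import Data.Char (ord)

-- Under the proposal, every conforming Char representation must
-- cover at least code points 0-127 (US ASCII); anything beyond
-- that range might not be expressible in some implementation.
portableChars :: String -> Bool
portableChars = all (\c -> ord c <= 127)
```

(For what it's worth, GHC's Char today covers the full U+0000 to U+10FFFF range, but the point of the proposal is that the report itself would promise only ASCII.)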

On Sun, Sep 30, 2001 at 02:29:40PM +0000, Marcin 'Qrczak' Kowalczyk wrote:
> IMHO it would have been better to not invent UTF-16 at all and use
> UTF-8 in parallel with UTF-32. But Unicode used to promote UTF-16 as
> the real Unicode, and now it causes so many threads on Unicode list
> to clear the confusion about the nature of characters above U+FFFF.

John Meacham - California Institute of Technology, Alum. - john@repetae.net