[Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a
Unicode ByteString
David Menendez
zednenem at psualum.com
Mon Feb 5 23:05:39 EST 2007
Alistair Bayley writes:
> On 05/02/07, Chris Kuklewicz <haskell at list.mightyreason.com> wrote:
> > UTF-8 is a 4 byte encoding. There is no valid UTF-8 5 or 6 byte
> > encoding.
>
> Chris is right here, in that Takusen's decoder is incorrect w.r.t. the
> standard, in allowing up to 6 bytes to encode a single char.
<snip>
> There's nothing stopping the Unicode consortium from expanding the
> range of codepoints, is there? Or have they said that'll never happen?
I believe they have. In particular, UTF-16 can only represent code
points up to U+10FFFF, so any expansion beyond that would break it.
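To see where that limit comes from, here is a minimal sketch of the surrogate-pair arithmetic. The function names are made up for illustration and come from no library; the constants (lead surrogates D800..DBFF, trail surrogates DC00..DFFF, offset 10000) are the standard UTF-16 ones.

```haskell
import Data.Bits (shiftL)

-- Hypothetical helper: combine a lead (high) and trail (low) surrogate
-- into the code point they encode. Assumes both are in range.
surrogatePairToCodePoint :: Int -> Int -> Int
surrogatePairToCodePoint hi lo =
  0x10000 + ((hi - 0xD800) `shiftL` 10) + (lo - 0xDC00)

-- The largest lead and trail surrogates give the largest code point
-- that UTF-16 can represent.
maxUtf16CodePoint :: Int
maxUtf16CodePoint = surrogatePairToCodePoint 0xDBFF 0xDFFF
```

Each surrogate carries 10 bits, so a pair addresses 2^20 code points above the 16-bit range, and maxUtf16CodePoint works out to 0x10FFFF.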
From <http://en.wikipedia.org/wiki/Universal_Character_Set>:
> the UCS stops at 10FFFF and ISO/IEC 10646 has stated that all future
> assignments of characters will also take place in that range
[...]
> ISO 10646 was limited to contain as many characters as could be
> encoded by UTF-16 and no more, that is, a little over a million
> characters instead of over 2,000 million
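For what it's worth, the numbers can be checked with a small sketch. The names below are hypothetical and this is not how Takusen or Data.CompactString do it; it only assumes the standard UTF-8 length boundaries and a code space that ends at 10FFFF.

```haskell
-- Hypothetical helper: number of bytes UTF-8 needs for a code point,
-- or Nothing if the value is not an encodable code point.
utf8Length :: Int -> Maybe Int
utf8Length cp
  | cp < 0                       = Nothing
  | cp >= 0xD800 && cp <= 0xDFFF = Nothing  -- surrogates are not encodable
  | cp <= 0x7F                   = Just 1
  | cp <= 0x7FF                  = Just 2
  | cp <= 0xFFFF                 = Just 3
  | cp <= 0x10FFFF               = Just 4
  | otherwise                    = Nothing  -- beyond the end of the code space

-- Size of the code space 0..10FFFF ("a little over a million").
codeSpaceSize :: Integer
codeSpaceSize = 0x10FFFF + 1

-- Size of the original 31-bit UCS code space ("over 2,000 million").
originalUcsSize :: Integer
originalUcsSize = 2 ^ (31 :: Int)
```

Since the four-byte form already covers everything up to 10FFFF, a decoder restricted to that range never needs the five- and six-byte forms.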
--
David Menendez <zednenem at psualum.com> | "In this house, we obey the laws
<http://www.eyrie.org/~zednenem> | of thermodynamics!"