[Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

David Menendez zednenem at psualum.com
Mon Feb 5 23:05:39 EST 2007


Alistair Bayley writes:

> On 05/02/07, Chris Kuklewicz <haskell at list.mightyreason.com> wrote:

> > UTF-8 is a 4 byte encoding.  There is no valid UTF-8 5 or 6 byte
> > encoding.
> 
> Chris is right here, in that Takusen's decoder is incorrect w.r.t. the
> standard, in allowing up to 6 bytes to encode a single char.

<snip> 

> There's nothing stopping the Unicode consortium from expanding the
> range of codepoints, is there? Or have they said that'll never happen?

I believe they have. In particular, UTF-16 can only represent code
points up to U+10FFFF.
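
To make that ceiling concrete, here is a minimal sketch of the
surrogate-pair arithmetic. The names decodeSurrogatePair,
maxUtf16CodePoint and charCeiling are made up for illustration and do
not come from any library; only ord and maxBound are standard.

```haskell
import Data.Char (ord)

-- Illustrative helper (hypothetical name): combine a UTF-16 high
-- surrogate (0xD800..0xDBFF) and low surrogate (0xDC00..0xDFFF)
-- into the code point they encode. No range checking is done.
decodeSurrogatePair :: Int -> Int -> Int
decodeSurrogatePair hi lo =
  0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)

-- The largest pair, (0xDBFF, 0xDFFF), gives the highest code point
-- that UTF-16 can express.
maxUtf16CodePoint :: Int
maxUtf16CodePoint = decodeSurrogatePair 0xDBFF 0xDFFF

-- Haskell's own Char type stops at the same value.
charCeiling :: Int
charCeiling = ord (maxBound :: Char)
```

Each surrogate carries 10 bits, so a pair covers 2^20 code points above
the 16-bit range, which is where the 10FFFF limit comes from.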

From <http://en.wikipedia.org/wiki/Universal_Character_Set>:

> the UCS stops at 10FFFF and ISO/IEC 10646 has stated that all future
> assignments of characters will also take place in that range
[...]
> ISO 10646 was limited to contain as many characters as could be
> encoded by UTF-16 and no more, that is, a little over a million
> characters instead of over 2,000 million
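
Given that limit, a decoder that follows the standard only has to
accept lead bytes for sequences of 1 to 4 bytes. The following is a
rough sketch of such a check, not Takusen's or CompactString's actual
code; utf8SeqLength is a hypothetical name, and the byte ranges are the
ones I understand RFC 3629 to allow.

```haskell
import Data.Word (Word8)

-- Hypothetical helper: length of the UTF-8 sequence introduced by a
-- lead byte, or Nothing if the byte cannot start a valid sequence.
utf8SeqLength :: Word8 -> Maybe Int
utf8SeqLength b
  | b <= 0x7F              = Just 1   -- ASCII
  | b >= 0xC2 && b <= 0xDF = Just 2   -- 0xC0 and 0xC1 would be overlong
  | b >= 0xE0 && b <= 0xEF = Just 3
  | b >= 0xF0 && b <= 0xF4 = Just 4   -- 0xF4 still needs a second byte <= 0x8F
  | otherwise              = Nothing  -- continuation bytes, and 0xF5..0xFF,
                                      -- which include the old 5- and 6-byte leads
```

For example, utf8SeqLength 0xF8 and utf8SeqLength 0xFC (the former
5- and 6-byte lead bytes) both give Nothing. A full decoder would also
have to validate the continuation bytes and reject surrogates.
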
-- 
David Menendez <zednenem at psualum.com> | "In this house, we obey the laws
<http://www.eyrie.org/~zednenem>      |        of thermodynamics!"

