String != [Char]

Gabriel Dos Reis gdr at
Mon Mar 26 15:22:23 CEST 2012

On Mon, Mar 26, 2012 at 7:29 AM, Christian Siefkes
<christian at> wrote:
> On 03/26/2012 01:26 PM, Gabriel Dos Reis wrote:
>> It is not the precision of Char or char that is the issue here.
>> It has been clarified at several points that Char is not a Unicode character,
>> but a Unicode code point.  Not every Unicode code point represents a
>> Unicode character, and not every sequence of Unicode code points
>> represents a character or a sequence of Unicode characters.
> What do you mean? Every Unicode character corresponds to one code point,

Yes, but this correspondence is not a bijection -- a great source of
confusion that permeates a lot of discussions about Unicode characters
and texts, including this one (and a previous one regarding the Haskell
Report).  Very much heartbreaking :-(
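
One way to see the mismatch is a minimal sketch, assuming GHC and its
Data.Char module. The names precomposed, decomposed, codePoints,
loneSurrogate and isSurrogate are hypothetical, chosen only for
illustration. The same character can be spelled with different code
point sequences, and a surrogate code point is a legal Char value
without being a character:

```haskell
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- "é" as a single precomposed code point, U+00E9.
precomposed :: String
precomposed = "\x00E9"

-- "é" as two code points: U+0065 (e) followed by U+0301
-- (combining acute accent).
decomposed :: String
decomposed = "e\x0301"

-- The code point values behind each Char of a String.
codePoints :: String -> [Int]
codePoints = map ord

-- GHC accepts a surrogate code point as a Char value, although it
-- does not represent any character.
loneSurrogate :: Char
loneSurrogate = '\xD800'

-- True when the code point lies in the surrogate range.
isSurrogate :: Char -> Bool
isSurrogate c = generalCategory c == Surrogate
```

Here precomposed and decomposed compare unequal as Strings and have
lengths 1 and 2, although a reader would see the same single character
in both cases.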

> and
> every code point in the range 0 to 0x10FFFF (excluding the range 0xD800 to
> 0xDFFF which is reserved for surrogate pairs in UTF-16, and a handful of
> "noncharacters", see
> ) corresponds to one character.
> Maybe your criticism is that Char does not explicitly prevent these special
> code points from being assigned? While true, that seems a relatively minor
> matter. Moreover, a future revision of the Haskell standard could easily
> declare that assigning a "forbidden" character results in an error/bottom
> if that is so desired.

It is not just a matter of clarification that certain things are
forbidden.  I believe it would be a great mistake to qualify it as
minor.  How do you handle
if you expose the texts as a sequence of unrelated code points that can
be freely taken apart and combined?
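
A small sketch of what "freely taken apart and combined" can do, using
only Prelude list functions on String. The names noel, reversedNoel,
splitNoel and accented are hypothetical, and the sample words are an
assumption chosen for illustration:

```haskell
-- "noël" spelled with a combining mark:
-- n, o, e, U+0308 (combining diaeresis), l.
noel :: String
noel = "noe\x0308l"

-- Reversing code point by code point puts the diaeresis after the 'l'
-- instead of after the 'e'.
reversedNoel :: String
reversedNoel = reverse noel

-- Splitting after three code points separates the 'e' from its
-- diaeresis; the second half starts with an orphaned combining mark.
splitNoel :: (String, String)
splitNoel = splitAt 3 noel

-- Appending a string that begins with a combining mark alters the
-- last character of the left operand.
accented :: String
accented = "cafe" ++ "\x0301"
```

Each operation is a perfectly ordinary list operation on [Char], yet
none of the results preserves the characters of its inputs.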

- Gaby

More information about the Haskell-prime mailing list