String != [Char]
Thomas Schilling
nominolo at googlemail.com
Mon Mar 26 14:40:20 CEST 2012
On 26 March 2012 13:29, Christian Siefkes <christian at siefkes.net> wrote:
> On 03/26/2012 01:26 PM, Gabriel Dos Reis wrote:
>> It is not the precision of Char or char that is the issue here.
>> It has been clarified at several points that Char is not a Unicode character,
>> but a Unicode code point. Not every Unicode code point represents a
>> Unicode character, and not every sequence of Unicode code points
>> represents a character or a sequence of Unicode characters.
>
> What do you mean? Every Unicode character corresponds to one code point, and
> every code point in the range 0 to 0x10FFFF (excluding the range 0xD800 to
> 0xDFFF which is reserved for surrogate pairs in UTF-16, and a handful of
> "noncharacters", see
> http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Special_code_points
> ) corresponds to one character.
I think it's best not to use the term "Unicode character" at all, since
it is highly ambiguous. To quote from
http://www.icu-project.org/docs/papers/forms_of_unicode/:
"""
We have seen that characters, glyphs, code points, and code units are
all different. Unfortunately the term character is vastly overloaded.
At various times people can use it to mean any of these things:
- An image on paper (glyph)
- What an end-user thinks of as a character (grapheme)
- What a character encoding standard encodes (code point)
- A memory storage unit in a character encoding (code unit)
Because of this, ironically, it is best to avoid the use of the term
character entirely when discussing character encodings, and stick to
the term code point.
"""
The section http://www.icu-project.org/docs/papers/forms_of_unicode/#h0
is also important to keep in mind.
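To make the distinction concrete, here is a minimal sketch (my own
illustration, not taken from the paper) that counts the same text in
three different ways. It assumes GHC, where each Char holds one code
point; the names codePoints, utf16Units, utf8Units, eAcute and gClef
are made up for this example.

```haskell
import Data.Char (ord)

-- One Char is one code point, so the list length counts code points.
codePoints :: String -> Int
codePoints = length

-- UTF-16 code units: code points above U+FFFF need a surrogate pair.
utf16Units :: String -> Int
utf16Units = sum . map (\c -> if ord c > 0xFFFF then 2 else 1)

-- UTF-8 code units (bytes), by the standard length ranges.
-- Surrogate code points are not treated specially here.
utf8Units :: String -> Int
utf8Units = sum . map width
  where
    width c
      | n < 0x80    = 1
      | n < 0x800   = 2
      | n < 0x10000 = 3
      | otherwise   = 4
      where
        n = ord c

-- "e" followed by U+0301 COMBINING ACUTE ACCENT:
-- one user-perceived character (grapheme), but two code points.
eAcute :: String
eAcute = "e\x0301"

-- U+1D11E MUSICAL SYMBOL G CLEF: one code point outside the BMP.
gClef :: String
gClef = "\x1D11E"
```

For eAcute the three counts are 2, 2 and 3, although a reader sees a
single character. For gClef they are 1, 2 and 4. None of the three
functions counts graphemes; that needs the segmentation rules of
Unicode Standard Annex #29, which base does not provide.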
>
> Maybe your criticism is that Char does not explicitly prevent these special
> code points from being assigned? While true, that seems a relatively minor
> matter. Moreover, a future revision of the Haskell standard could easily
> declare that assigning a "forbidden" character results in an error/bottom
> if that is so desired.
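As a rough illustration of this point, the sketch below shows that Char
in GHC accepts a lone surrogate today, and what a checked constructor
could look like. It assumes GHC's base library; loneSurrogate,
isSurrogateCP and mkScalar are hypothetical names, not part of any
standard or proposal.

```haskell
import Data.Char (GeneralCategory (Surrogate), chr, generalCategory)

-- GHC's chr only rejects values outside 0..0x10FFFF,
-- so a lone surrogate is a perfectly valid Char.
loneSurrogate :: Char
loneSurrogate = chr 0xD800

-- True for code points in the surrogate range U+D800..U+DFFF.
isSurrogateCP :: Char -> Bool
isSurrogateCP c = generalCategory c == Surrogate

-- Hypothetical checked constructor: accepts only Unicode scalar
-- values, i.e. code points that are not surrogates.
-- Noncharacters such as U+FFFE are still accepted.
mkScalar :: Int -> Maybe Char
mkScalar n
  | n < 0 || n > 0x10FFFF      = Nothing
  | n >= 0xD800 && n <= 0xDFFF = Nothing
  | otherwise                  = Just (chr n)
```

Returning Maybe rather than bottom is just one possible design; the
quoted suggestion of an error/bottom would correspond to a partial
function instead.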
>
> Best regards
> Christian
--
Push the envelope. Watch it bend.