String != [Char]
Gabriel Dos Reis
gdr at integrable-solutions.net
Mon Mar 26 13:26:47 CEST 2012
On Mon, Mar 26, 2012 at 5:08 AM, Christian Siefkes
<christian at siefkes.net> wrote:
> On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
>> True, but should the language definition default to a string type
>> that is one of the most unsuited for text processing in the 21st
>> century where global multilingualism abounds? Even C has qualms
>> about that.
>> I have no doubt that if all the texts my students have to
>> process are US ASCII, [Char] is more than sufficient. So, I have
>> sympathy for your position. However, I doubt [Char] would be
>> adequate if I ask them to share texts from their diverse cultures.
> Uh, while a C char is (usually) just a byte (8 bits of information, like
> Word8 in Haskell), a Haskell Char is a Unicode character (21 bits of
It is not the precision of Char or char that is the issue here.
It has been clarified at several points that Char is not a Unicode character,
but a Unicode code point. Not every Unicode code point represents a
Unicode character, and not every sequence of Unicode code points
represents a character or a sequence of Unicode characters.
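
To make the distinction concrete, here is a minimal sketch, assuming GHC
and only Data.Char from base. The names precomposed, decomposed, and
loneSurrogate are mine, chosen for illustration; they are not from any
library.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- "é" written as the single precomposed code point U+00E9.
precomposed :: String
precomposed = "\x00E9"

-- The same user-perceived character written as 'e' followed by
-- U+0301 COMBINING ACUTE ACCENT: two code points, one character.
decomposed :: String
decomposed = "e\x0301"

-- A lone surrogate is a legal Char value in GHC, yet it is a code
-- point that does not represent any Unicode character.
loneSurrogate :: Char
loneSurrogate = '\xD800'

main :: IO ()
main = do
  print (length precomposed)             -- number of code points
  print (length decomposed)              -- number of code points
  print (precomposed == decomposed)      -- code-point-wise comparison
  print (generalCategory loneSurrogate)  -- category of the surrogate
  print (map ord (reverse decomposed))   -- list reversal detaches the accent
```

length and (==) on [Char] count and compare code points, so the two
spellings of the same character differ in length and compare unequal,
and reverse moves the combining accent in front of its base letter.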
> A single C char cannot contain an arbitrary Unicode character,
> while a Haskell Char can, and does. Hence [Char] is (efficiency issues
> aside) perfectly adequate for dealing with texts written in arbitrary languages.