String != [Char]

Brandon Allbery allbery.b at gmail.com
Mon Mar 26 19:12:18 CEST 2012


On Mon, Mar 26, 2012 at 06:08, Christian Siefkes <christian at siefkes.net> wrote:

> On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
> > True, but should the language definition default to a string type
> > that is one of the most unsuited for text processing in the 21st
> > century, where global multilingualism abounds?  Even C has qualms
> > about that.
> ...
> > I have no trouble believing that if all the texts my students have to
> > process are US ASCII, [Char] is more than sufficient.  So, I have
> > sympathy for your position.  However, I doubt [Char] would be
> > adequate if I asked them to share texts from their diverse cultures.
>
> Uh, while a C char is (usually) just a byte (8 bits of information, like
> Word8 in Haskell), a Haskell Char is a Unicode character (any code point
> up to U+10FFFF, i.e. 21 bits of information). A single C char cannot hold
> an arbitrary Unicode character, while a Haskell Char can, and does. Hence
> [Char] is (efficiency issues aside) perfectly adequate for dealing with
> texts written in arbitrary languages.
>

...as long as you ignore combining characters and the like.  A Char is one
code point, but a single user-perceived character may be several code
points (a base letter plus combining accents, for instance).  I claim that
ignoring them in this way is just continuing the same "good enough for my
language" attitude that has plagued text handling ever since someone got
the notion that maybe text processing should consider more than just ISO
8859-1 and got roundly pooh-poohed by the community.
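A minimal Haskell sketch of the distinction at issue, using only base: a Char is one code point, while one visible character such as "é" can be spelled as one code point or two. The names precomposed, decomposed, isCombining and approxGraphemes are hypothetical, chosen for illustration, and approxGraphemes is only a crude stand-in for real grapheme segmentation (Unicode UAX #29), which base does not provide.

```haskell
-- Sketch: a Haskell Char is one Unicode code point, not one
-- user-perceived character. All names below are hypothetical.
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- "é" as a single precomposed code point: U+00E9
precomposed :: String
precomposed = "\x00E9"

-- "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT
decomposed :: String
decomposed = "e\x0301"

-- True for combining marks (general categories Mn, Mc, Me)
isCombining :: Char -> Bool
isCombining c =
  generalCategory c `elem` [NonSpacingMark, SpacingCombiningMark, EnclosingMark]

-- Crude approximation of a grapheme count: skip combining marks.
-- Not a real UAX #29 segmentation (ignores ZWJ sequences, Hangul jamo, etc.).
approxGraphemes :: String -> Int
approxGraphemes = length . filter (not . isCombining)

main :: IO ()
main = do
  -- Char spans U+0000..U+10FFFF, so it carries 21 bits; a Word8 carries 8
  print (ord (maxBound :: Char))                                   -- 1114111
  -- Same visible text, different [Char] values
  print (length precomposed, length decomposed)                    -- (1,2)
  print (precomposed == decomposed)                                -- False
  print (approxGraphemes precomposed, approxGraphemes decomposed)  -- (1,1)
```

The two spellings of "é" differ in length and compare unequal as [Char]; treating them as the same text requires Unicode normalization (NFC/NFD), which is likewise outside base.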

-- 
brandon s allbery                                      allbery.b at gmail.com
wandering unix systems administrator (available)     (412) 475-9364 vm/sms


More information about the Haskell-prime mailing list