String != [Char]

Johan Tibell johan.tibell at gmail.com
Mon Mar 26 18:58:24 CEST 2012


On Mon, Mar 26, 2012 at 9:42 AM, Christian Siefkes
<christian at siefkes.net> wrote:
> On 03/26/2012 05:50 PM, Johan Tibell wrote:
>> Normalization isn't quite enough unfortunately, as it doesn't solve e.g.
>>
>>     upcase = map toUpper
>>
>> You need all-at-once functions on strings (which we could add.) I'm
>> just pointing out that most (all?) list functions do the wrong thing
>> when used on Strings.
>
> Hm, do you have any other examples besides toUpper/toLower?

length, cons, head, tail, filter, folds, anything that works on an
element-by-element basis.
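
To make this concrete, here is a minimal sketch of how two of these functions
misbehave on a decomposed character. The names (eDecomposed, ePrecomposed,
lengths, reversedWord) are made up for illustration; only length and reverse
come from the Prelude.

```haskell
-- "é" spelled as a base letter plus a combining acute accent (U+0301).
eDecomposed :: String
eDecomposed = "e\x301"

-- The same user-perceived character as one precomposed code point (U+00E9).
ePrecomposed :: String
ePrecomposed = "\xE9"

-- length counts code points, so the two spellings disagree: (2, 1).
lengths :: (Int, Int)
lengths = (length eDecomposed, length ePrecomposed)

-- reverse works per code point, so the combining mark ends up first,
-- detached from the 'e' it belonged to.
reversedWord :: String
reversedWord = reverse ('a' : eDecomposed)
```

head, tail and filter split the pair in the same way, because each of them
sees the accent as an independent list element.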

> Also, that example is not really an argument against using list functions on
> strings (which, by any reasonable definition, seem to be "sequences of
> characters" -- whether that sequence is represented as a list, an array, or
> something else, seems more like an implementation detail to me).

I agree on the second part. As someone pointed out earlier, we should
be careful in using the word "character", as a Unicode code point
doesn't correspond well to the commonly used concept of a character.
What we have today is really:

    type String = [CodePoint]

What you would normally think of as a character might consist of
several code points.
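
As a rough sketch of what grouping code points into perceived characters could
look like, the following attaches each run of combining marks to the code
point before it. This is a crude approximation and an assumption of mine, not
the full Unicode grapheme cluster algorithm (UAX #29); clusters and
perceivedLength are hypothetical names, and only isMark comes from Data.Char.

```haskell
import Data.Char (isMark)

-- Split a string into a base code point followed by any combining marks.
-- Simplification: ignores Hangul jamo, ZWJ sequences, and the other rules
-- a real grapheme cluster segmenter has to handle.
clusters :: String -> [String]
clusters []       = []
clusters (c : cs) = (c : marks) : clusters rest
  where
    (marks, rest) = span isMark cs

-- Number of clusters, which is closer to what a reader would count.
perceivedLength :: String -> Int
perceivedLength = length . clusters
```

With this, perceivedLength "cafe\x301" is 4, whereas length reports 5 for the
same string.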

> Rather, it
> indicates the fact that Char.toUpper may have the wrong type. If its type was
> Char -> String instead of Char -> Char, it could handle things like toUpper
> 'ß' == "SS" correctly. Then stuff like
>
>        upcase = concatMap toUpper
>
> would work fine.

Yes.
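
A minimal sketch of that idea, assuming a hypothetical toUpperFull with type
Char -> String. Only the 'ß' case from this thread is special-cased here; a
real version would need the complete Unicode special-casing data, and a
per-code-point function still can't express context-dependent rules.

```haskell
import Data.Char (toUpper)

-- Hypothetical full upper-casing of a single code point.
-- Only U+00DF ('ß') is special-cased; everything else falls back to
-- Data.Char.toUpper, which maps one code point to one code point.
toUpperFull :: Char -> String
toUpperFull '\xDF' = "SS"
toUpperFull c      = [toUpper c]

-- The definition from the quoted message, using the Char -> String variant.
upcase :: String -> String
upcase = concatMap toUpperFull
```

Here upcase "stra\xDF\&e" evaluates to "STRASSE", which map toUpper cannot
produce because its result always has the same length as its input.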

> As it is, the problem seems to be with Char, not with [Char].

[Char] is a semantically OK representation of a Unicode string; using
an array, as text does, is simply an optimization. However, using the
list functions defined by the Prelude is not a good idea if you want to
process a Unicode string correctly.
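
For comparison, here is a small sketch using the all-at-once toUpper from the
text package, which works on the whole string and may return a longer result.
It assumes the text package is installed; upcaseText and example are
hypothetical names.

```haskell
import qualified Data.Text as T

-- Whole-string upper-casing; the result may be longer than the input.
upcaseText :: T.Text -> T.Text
upcaseText = T.toUpper

-- "straße" written with an escape for U+00DF.
example :: T.Text
example = upcaseText (T.pack "stra\xDF\&e")
```

The input has six code points while example has seven, since 'ß' is
converted to "SS".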

-- Johan


