Why are strings linked lists?
Glynn Clements
glynn.clements at virgin.net
Sat Nov 29 10:42:39 EST 2003
ketil+haskell at ii.uib.no wrote:
> >> What Unicode support?
>
> >> Simply claiming that values of type Char are Unicode characters
> >> doesn't make it so.
>
> > Just because some implementations lack toUpper etc. doesn't mean
> > they all do.
>
> I think the point is that for toUpper etc to be properly Unicoded,
> they can't simply look at a single character. IIRC, there are some
> characters that expand to two characters when the case is changed, and
> then there's titlecase and so on.
If that were the extent of the problems, I wouldn't be describing
Unicode support as "non-existent".
Note that ANSI C9X doesn't handle the first problem either:
    7.25.3.1.1 The towlower function

        #include <wctype.h>
        wint_t towlower(wint_t wc);

    7.25.3.1.2 The towupper function

        #include <wctype.h>
        wint_t towupper(wint_t wc);
And it only handles the second problem (title case) insofar as it
provides a generic transformation mechanism:
    7.25.3.2 Extensible wide-character case mapping functions

    [#1] The functions wctrans and towctrans provide extensible
    wide-character mapping as well as case mapping equivalent to
    that performed by the functions described in the previous
    subclause (7.25.3.1).

    7.25.3.2.1 The towctrans function

        #include <wctype.h>
        wint_t towctrans(wint_t wc, wctrans_t desc);

    7.25.3.2.2 The wctrans function

        #include <wctype.h>
        wctrans_t wctrans(const char *property);
Whilst a title-case transformer is the most obvious application of
this, nothing in the standard requires one: the only property names
that wctrans is guaranteed to accept are "tolower" and "toupper".
> toUpper etc. are AFAIK only implemented correctly for a small (but
> IMHO probably the useful) subset of characters.
Yes; so the Haskell report may as well have just defined Char as an
8-bit ISO Latin-1 character.
Actually, US-ASCII (i.e. the same behaviour as ANSI C with the C/POSIX
locale) would arguably have been a better choice. At least that won't
fail quite so badly if you apply e.g. toUpper to a string which is
actually encoded in ISO Latin-2: the case may be wrong, but at least
the letter will be correct.
--
Glynn Clements <glynn.clements at virgin.net>
More information about the Haskell mailing list