Why are strings linked lists?

Sat Nov 29 23:56:21 EST 2003

Ashley Yakeley wrote:

> > Simply claiming that values of type Char are Unicode characters
> > doesn't make it so.
> 
> Actually, that's exactly what makes it so.

Hmm. I suppose that there's some validity to that perspective. OTOH,
it's one thing to state that it's true, but that's rather hollow if
nothing actually behaves as if it is.

It's a bit like saying "values of type Int are complex numbers; oh,
BTW, the implementation is currently broken".

IOW, if it walks like a duck, ...

> > Unless I'm missing something, the only "support" that GHC provides is
> > that Char is 4 bytes.
> 
> No, on GHC a Char is a Unicode codepoint, which means it has only 
> 17*2^16 possible values. This by itself is the most important aspect of 
> Unicode support.

OK; by "Char is 4 bytes" I basically meant that it's "large enough".

> But most of the rest is missing.

AFAICT, *all*[1] of the rest is missing.

[1] With one rather useless exception: fromEnum (maxBound :: Char) == 0x10ffff.
I can't think of any other aspect of GHC's behaviour which would
indicate that Char is meant to be Unicode.
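
For concreteness, the exception above and the 17*2^16 figure quoted
earlier can be checked together. This is only a sketch: the names
codeSpaceSize, maxCharCode and charCoversCodeSpace are made up for
illustration, and the check says nothing about whether any function
treats those values as Unicode.

```haskell
import Data.Char (ord)

-- Size of the Unicode code space: 17 planes of 2^16 code points each.
codeSpaceSize :: Int
codeSpaceSize = 17 * 2 ^ (16 :: Int)

-- The largest value of type Char, as an Int.
maxCharCode :: Int
maxCharCode = ord (maxBound :: Char)

-- True when Char runs from 0 up to 0x10FFFF, i.e. covers exactly
-- the 17*2^16 code points and nothing beyond them.
charCoversCodeSpace :: Bool
charCoversCodeSpace =
  maxCharCode == 0x10FFFF && maxCharCode + 1 == codeSpaceSize
```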

> > If you use Char to store anything other than ISO
> > Latin-1 characters, none of the Haskell functions with Char in their
> > signature will be of any use.
> 
> Actually, many of those functions ought to use Word8 instead.

But then:

1. Where would you get a Char from?
2. Where would you put it?

BTW, I agree that the IO functions *should* use Word8. And I really
wouldn't be that bothered if the standard were changed to just use
"type Char = Word8". Actually, I would prefer that to the current
fiction.
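
To make the byte-oriented view concrete, here is a minimal sketch of
what treating Char as an octet amounts to: every byte maps to the
Latin-1 character with the same number, and anything above 0xFF has
nowhere to go. The helpers byteToChar and charToByte are hypothetical
names, not functions from any standard library.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: read a byte as the Latin-1 character with the
-- same number. Total, since every Word8 is a valid code point.
byteToChar :: Word8 -> Char
byteToChar = chr . fromIntegral

-- Hypothetical helper: the reverse direction. It has to be partial,
-- because a Char above 0xFF does not fit in a single byte.
charToByte :: Char -> Maybe Word8
charToByte c
  | ord c <= 0xFF = Just (fromIntegral (ord c))
  | otherwise     = Nothing
```

The asymmetry is the point: byteToChar never fails, while charToByte
returns Nothing for something like '\x20AC', so any Char that did not
come from a byte has to be handled explicitly somewhere.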

At least the problems with the Char functions are just implementation
bugs; those functions *could* be made to work correctly.
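
As a rough illustration of what "work correctly" would mean, the
sketch below lists a few properties that can only hold if the Char
functions know about characters outside ISO Latin-1. The sample
characters are arbitrary picks, and unicodeChecks and failedChecks are
made-up names. Which checks pass depends entirely on the
implementation at hand, so no expected result is given.

```haskell
import Data.Char (isAlpha, isUpper, toLower, toUpper)

-- Each entry pairs a description with a property that requires
-- Unicode-aware behaviour from a standard Char function.
unicodeChecks :: [(String, Bool)]
unicodeChecks =
  [ ("isAlpha on Greek small alpha",          isAlpha '\x03B1')
  , ("toUpper on Greek small alpha",          toUpper '\x03B1' == '\x0391')
  , ("isUpper on Cyrillic capital A",         isUpper '\x0410')
  , ("toLower on Cyrillic capital A",         toLower '\x0410' == '\x0430')
  ]

-- Descriptions of the checks that fail on this implementation.
failedChecks :: [String]
failedChecks = [name | (name, ok) <- unicodeChecks, not ok]
```

An implementation that only handles Latin-1 would report all four
names in failedChecks; one that behaves as Unicode would report none.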

The IO problems are design bugs, and can't truly be fixed without
breaking a lot of existing code. A workaround which preserves backward
compatibility could result in a rather ugly interface: either all of
the relevant functions use a default encoding (which will probably be
the wrong one as often as not), or the "right" functions have to have
their names bastardised because the "wrong" functions have already
stolen the obvious names.
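
The two shapes of workaround can be sketched with pure functions
standing in for the real IO ones. Everything here is hypothetical:
the Encoding type, encodeChar, encodeStringWith, encodeString and the
choice of Latin-1 as the default are assumptions made purely for
illustration, not a proposal for an actual library interface.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical set of encodings, kept tiny for the example.
data Encoding = Latin1 | UTF8
  deriving (Eq, Show)

-- Encode one character; Nothing if the encoding cannot represent it.
encodeChar :: Encoding -> Char -> Maybe [Word8]
encodeChar Latin1 c
  | ord c <= 0xFF = Just [fromIntegral (ord c)]
  | otherwise     = Nothing
encodeChar UTF8 c
  | n >= 0xD800 && n <= 0xDFFF = Nothing  -- surrogates are not encodable
  | n < 0x80    = Just [byte n]
  | n < 0x800   = Just [0xC0 .|. byte (n `shiftR` 6), cont n]
  | n < 0x10000 = Just [ 0xE0 .|. byte (n `shiftR` 12)
                       , cont (n `shiftR` 6), cont n ]
  | otherwise   = Just [ 0xF0 .|. byte (n `shiftR` 18)
                       , cont (n `shiftR` 12), cont (n `shiftR` 6), cont n ]
  where
    n      = ord c
    byte   = fromIntegral :: Int -> Word8
    cont x = 0x80 .|. byte (x .&. 0x3F)  -- continuation byte 10xxxxxx

-- The "right" function: the caller names the encoding, but the
-- function is stuck with the longer name.
encodeStringWith :: Encoding -> String -> Maybe [Word8]
encodeStringWith enc = fmap concat . mapM (encodeChar enc)

-- An arbitrary default, assumed here only for the example.
defaultEncoding :: Encoding
defaultEncoding = Latin1

-- The "wrong" function: it keeps the obvious name and silently
-- applies whatever the default happens to be.
encodeString :: String -> Maybe [Word8]
encodeString = encodeStringWith defaultEncoding
```

With these definitions the same one-character string "\xE9" encodes
to a single byte under Latin1 and to two bytes under UTF8, which is
why a silent default is wrong whenever the other end expects the
other encoding.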

-- 
Glynn Clements <glynn.clements at virgin.net>