Why are strings linked lists?

Ashley Yakeley ashley at semantic.org
Sat Nov 29 19:36:23 EST 2003


In article <16329.12837.324931.697803 at cerise.nosuchdomain.co.uk>,
 Glynn Clements <glynn.clements at virgin.net> wrote:

> OK; by "Char is 4 bytes" I basically meant that it's "large enough".

Char is exactly the correct size. The Eq, Ord and Enum instances all 
work correctly. The fact that you cannot represent values outside the 
range is important IMO.
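To make the point about size concrete, here is a small sketch, assuming GHC: the bounds of Char are exactly the Unicode codepoint range U+0000 to U+10FFFF, so a checked constructor only needs to test that interval. The names maxCodepoint and safeChr are hypothetical illustrations, not library functions.

```haskell
-- Sketch, assuming GHC: Char covers exactly codepoints 0 .. 0x10FFFF.
import Data.Char (chr)

-- The largest codepoint a Char can hold (0x10FFFF = 1114111).
maxCodepoint :: Int
maxCodepoint = fromEnum (maxBound :: Char)

-- Hypothetical checked constructor: build a Char only for a valid codepoint.
-- Note: GHC's Char also admits surrogates (0xD800..0xDFFF), and so does this.
safeChr :: Int -> Maybe Char
safeChr n
  | n >= 0 && n <= maxCodepoint = Just (chr n)
  | otherwise                   = Nothing
```

Because the range is closed, anything outside it is rejected rather than silently wrapped, and the derived ordering ('A' < 'B', succ 'A' == 'B') follows codepoint order.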

> 1. Where would you get a Char from?
> 2. Where would you put it?

You can convert to and from the codepoint number using toEnum and 
fromEnum. What is missing is UTF-8 and Latin-1 charset conversions, and 
character properties. You can find draft standard library code for these 
here:
<http://sourceforge.net/projects/haskell-i18n/>
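A sketch of the conversions mentioned above, assuming only GHC's base library: toEnum and fromEnum map between a Char and its codepoint number, and a UTF-8 encoder can be written by hand on top of fromEnum. The names codepoint, fromCodepoint, encodeCharUtf8 and encodeUtf8 are hypothetical illustrations, not the API of the haskell-i18n project.

```haskell
-- Hand-rolled sketch; NOT the haskell-i18n API.
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Char <-> codepoint number, via the Enum class.
codepoint :: Char -> Int
codepoint = fromEnum

fromCodepoint :: Int -> Char
fromCodepoint = toEnum   -- errors if the Int is outside the Char range

-- Encode one Char as 1 to 4 UTF-8 octets (RFC 3629 layout).
-- Surrogate codepoints are not rejected here, for brevity.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [byte n]
  | n < 0x800   = [byte (0xC0 .|. shiftR n 6), cont n]
  | n < 0x10000 = [byte (0xE0 .|. shiftR n 12), cont (shiftR n 6), cont n]
  | otherwise   = [ byte (0xF0 .|. shiftR n 18), cont (shiftR n 12)
                  , cont (shiftR n 6), cont n ]
  where
    n = fromEnum c
    byte = fromIntegral :: Int -> Word8
    -- Continuation byte: 10xxxxxx carrying the low 6 bits of x.
    cont x = byte (0x80 .|. (x .&. 0x3F))

-- Encode a whole String.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8
```

For example, encodeCharUtf8 '\x20AC' (the euro sign) yields the three octets 0xE2 0x82 0xAC, while the codepoint itself stays a plain Char until the moment it is written out.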

> BTW, I agree that the IO functions *should* use Word8.

Right.

> And I really
> wouldn't be that bothered if the standard was changed to just use
> "type Char = Word8". Actually, I would prefer that to the current
> fiction.

No! In GHC, a Char represents a Unicode codepoint: nothing more, and 
nothing less. This is something that probably ought to become part of 
some later Haskell standard. Frankly, I find the idea that the character 
'A' is somehow identical to the number 65, or octet value 65, to be 
completely bizarre, and Haskell does well to give them separate types.

The problem is that certain IO functions do implicit Latin-1 conversion.
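To illustrate why keeping Char and Word8 apart matters, here is a sketch of explicit Latin-1 conversion. It assumes the implicit conversion amounts to keeping the low 8 bits of the codepoint; that behaviour is modelled by the hypothetical truncateToOctet, and decodeLatin1/encodeLatin1 are likewise illustrative names, not library functions. Decoding is total, encoding is partial, and truncation silently turns U+0141 into the octet for 'A'.

```haskell
-- Sketch of explicit Latin-1 conversion between octets and Chars.
import Data.Word (Word8)

-- Decoding is total: every octet 0..255 is codepoint U+0000..U+00FF.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (toEnum . fromIntegral)

-- Encoding is partial: characters above U+00FF have no Latin-1 octet.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM enc
  where
    enc c
      | fromEnum c <= 0xFF = Just (fromIntegral (fromEnum c))
      | otherwise          = Nothing

-- Assumed model of an implicit, lossy conversion:
-- keep the low 8 bits of the codepoint and silently drop the rest.
truncateToOctet :: Char -> Word8
truncateToOctet = fromIntegral . fromEnum
```

With an explicit encodeLatin1, an unrepresentable character shows up as Nothing instead of a wrong byte on the wire.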

> The IO problems are design bugs, and can't truly be fixed without
> breaking a lot of existing code.

Well, that's what deprecation is for. New Word8-based functions would 
have new names. Every so often there's a burst of activity on the 
Libraries or the Internationalisation lists concerning this, but it 
never quite comes together somehow.

-- 
Ashley Yakeley, Seattle WA
