Text in Haskell: a second proposal

Ashley Yakeley ashley@semantic.org
Thu, 8 Aug 2002 23:40:42 -0700


At 2002-08-08 23:10, Ken Shan wrote:

> 1. Octets.
> 2. C "char".
> 3. Unicode code points.
> 4. Unicode code values, useful only for UTF-16, which is seldom used.
> 5. "What handles handle".
...
>I suggest that the following Haskell types be used for the five items
>above:
>
> 1. Word8
> 2. CChar
> 3. CodePoint
> 4. Word16
> 5. Char

I disagree. They should be:

1. Word8
2. CChar
3. Char
4. Word16
5. Word8

>Let me elaborate.  Files are funny because the information units they
>contain can be treated as both numbers and characters.

No, a file is always a list of octets. Nothing else (ignoring metadata, 
forks etc.). Of course, you can interpret those octets as text using 
"ASCII" or "UTF-8" or whatever; equally, you can interpret those octets 
as an image using "PNG", "JPEG" etc. But those are secondary 
transformations, separate from the business of reading from and writing 
to a file.

We should have Word8-based interfaces to file and network handles. 
Whether or not the old Char-based ones should be deprecated, or whatever, 
I don't know.
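
A minimal sketch of that separation, under some loud assumptions: it 
uses the bytestring library (which postdates this message) as the 
Word8-based interface, and readOctets, writeOctets and decodeAscii are 
hypothetical names chosen for illustration, not existing library 
functions. Reading and writing deal only in octets; interpreting them as 
text is a separate, pure step.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr)
import Data.Word (Word8)

-- Reading: a file is just a list of octets.
readOctets :: FilePath -> IO [Word8]
readOctets path = B.unpack <$> B.readFile path

-- Writing: the octets go out exactly as given.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path = B.writeFile path . B.pack

-- Interpretation is a secondary transformation (hypothetical helper):
-- decode octets as ASCII, failing on any octet above 127.
decodeAscii :: [Word8] -> Maybe String
decodeAscii = traverse toChar
  where
    toChar w
      | w < 128   = Just (chr (fromIntegral w))
      | otherwise = Nothing
```

A "PNG" or "UTF-8" decoder would sit in the same place as decodeAscii: 
a function from [Word8] to some richer type, with no knowledge of 
handles.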

As for Unicode code points, if there's to be an internationalisation 
effort for Haskell, the type of character literals, Char, should be fixed 
as the type for Unicode code points, much as it already is in GHC.
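
To illustrate, here is a small sketch assuming GHC, where Char ranges 
over U+0000 to U+10FFFF. The names codePoint and fromCodePoint are 
hypothetical helpers for this example; ord, chr and maxBound are the 
standard ones.

```haskell
import Data.Char (chr, ord)

-- A Char is identified with its Unicode code point.
codePoint :: Char -> Int
codePoint = ord

-- Hypothetical inverse: reject integers outside the range of Char
-- instead of letting chr raise an error.
fromCodePoint :: Int -> Maybe Char
fromCodePoint n
  | n >= 0 && n <= ord maxBound = Just (chr n)
  | otherwise                   = Nothing
```

Under this reading no separate CodePoint type is needed: character 
literals already have the right type.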

-- 
Ashley Yakeley, Seattle WA