Text in Haskell: a second proposal

09 Aug 2002 10:19:55 +0200

On Fri, 2002-08-09 at 08:40, Ashley Yakeley wrote:
> At 2002-08-08 23:10, Ken Shan wrote:
> 
> > 1. Octets.
> > 2. C "char".
> > 3. Unicode code points.
> > 4. Unicode code values, useful only for UTF-16, which is seldom used.
> > 5. "What handles handle".
> ...
> >I suggest that the following Haskell types be used for the five items
> >above:
> >
> > 1. Word8
> > 2. CChar
> > 3. CodePoint
> > 4. Word16
> > 5. Char
> 
> I disagree, they should be:
> 
> 1. Word8
> 2. CChar
> 3. Char
> 4. Word16
> 5. Word8

Yes.

> >Let me elaborate.  Files are funny because the information units they
> >contain can be treated as both numbers and characters.
> 
> No, a file is always a list of octets. Nothing else (ignoring metadata, 
> forks etc.). Of course, you can interpret those octets as text using 
> "ASCII" or "UTF-8" or whatever; equally, you can interpret those octets 
> as an image using "PNG", "JPEG" etc. But those are secondary 
> transformations, separate from the business of reading from and writing 
> to a file.

Ack!
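
To make the point concrete, here is a minimal sketch (mine, not part of
either proposal) of decoding as a step that is separate from I/O. The names
Octets, decodeLatin1, decodeUtf8 and multi are hypothetical; only Data.Bits,
Data.Char and Data.Word are assumed. The same octets yield different text
depending on which interpretation is chosen:

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical alias: the raw contents of a file, nothing more.
type Octets = [Word8]

-- ISO 8859-1: every octet maps to the code point of the same value.
decodeLatin1 :: Octets -> String
decodeLatin1 = map (chr . fromIntegral)

-- UTF-8: one to four octets per code point; Nothing on malformed input.
decodeUtf8 :: Octets -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral (b .&. 0x1F)) 0x80 bs
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral (b .&. 0x0F)) 0x800 bs
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral (b .&. 0x07)) 0x10000 bs
  | otherwise          = Nothing

-- Consume n continuation octets (10xxxxxx), taking six bits from each.
-- The third argument is the smallest code point allowed for this length.
multi :: Int -> Int -> Int -> Octets -> Maybe String
multi 0 acc lo rest
  | acc < lo                       = Nothing  -- overlong encoding
  | acc > 0x10FFFF                 = Nothing  -- beyond the Unicode range
  | acc >= 0xD800 && acc <= 0xDFFF = Nothing  -- surrogate, not a character
  | otherwise                      = (chr acc :) <$> decodeUtf8 rest
multi n acc lo (c : rest)
  | c .&. 0xC0 == 0x80 =
      multi (n - 1) ((acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) lo rest
multi _ _ _ _ = Nothing  -- truncated input or a bad continuation octet
```

For example, decodeLatin1 [0xC3, 0xA9] gives the two characters U+00C3
U+00A9, while decodeUtf8 [0xC3, 0xA9] gives Just "\233", the single
character U+00E9. Neither function needs to know where the octets came from.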

> We should have Word8-based interfaces to file and network handles. 
> Whether or not the old Char-based ones should be deprecated, or whatever, 
> I don't know.

I think any notion of treating the _raw_ contents of a file as Chars
must go, because it is simply incorrect. It reads like a slip someone
made when, for a moment, they confused Haskell's Char with C's char.
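
As a rough illustration of what a Word8-based interface could look like,
here is a sketch that assumes a GHC-style System.IO providing hGetBuf,
hPutBuf and withBinaryFile, plus the Foreign marshalling helpers. The names
hGetOctets, hPutOctets, readOctets and writeOctets are made up for this
example and are not an existing or proposed API:

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import System.IO (Handle, IOMode (..), hGetBuf, hPutBuf, withBinaryFile)

-- Read at most n octets from a handle; fewer are returned at end of file.
hGetOctets :: Handle -> Int -> IO [Word8]
hGetOctets h n
  | n <= 0    = return []
  | otherwise = allocaArray n $ \buf -> do
      got <- hGetBuf h buf n   -- number of octets actually read
      peekArray got buf

-- Write octets to a handle exactly as given, with no character encoding.
hPutOctets :: Handle -> [Word8] -> IO ()
hPutOctets h os = withArrayLen os $ \len buf -> hPutBuf h buf len

-- Convenience wrappers over files opened in binary mode.
readOctets :: FilePath -> Int -> IO [Word8]
readOctets path n = withBinaryFile path ReadMode (\h -> hGetOctets h n)

writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path os = withBinaryFile path WriteMode (\h -> hPutOctets h os)
```

Nothing in these signatures mentions Char, so any text encoding has to be
applied as an explicit, separate step on the resulting [Word8].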

> As for Unicode codepoints, if there's to be an internationalisation 
> effort for Haskell, the type of character literals, Char, should be fixed 
> as the type for Unicode codepoints, much as it already is in GHC.

Ack.
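
A small sketch of the distinction between items 3 and 4, assuming Char
ranges over Unicode code points as it does in GHC. The function toUtf16 is
a hypothetical helper written for this example; it turns code points (Char)
into UTF-16 code values (Word16):

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode code points as UTF-16 code values. Code points above U+FFFF
-- become a surrogate pair. A lone surrogate Char, which GHC permits,
-- is passed through unchanged rather than rejected.
toUtf16 :: String -> [Word16]
toUtf16 = concatMap enc
  where
    enc c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [hi, lo]
      where
        n  = ord c
        m  = n - 0x10000
        hi = fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
        lo = fromIntegral (0xDC00 + (m .&. 0x3FF))    -- low surrogate
```

With GHC, ord (maxBound :: Char) is 0x10FFFF, so every code point fits in a
Char, whereas toUtf16 ['\x1D11E'] needs two code values, [0xD834, 0xDD1E].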

Sven Moritz