Text in Haskell: a second proposal
Sven Moritz Hallberg
pesco@gmx.de
09 Aug 2002 10:19:55 +0200
On Fri, 2002-08-09 at 08:40, Ashley Yakeley wrote:
> At 2002-08-08 23:10, Ken Shan wrote:
>
> > 1. Octets.
> > 2. C "char".
> > 3. Unicode code points.
> > 4. Unicode code values, useful only for UTF-16, which is seldom used.
> > 5. "What handles handle".
> ...
> >I suggest that the following Haskell types be used for the five items
> >above:
> >
> > 1. Word8
> > 2. CChar
> > 3. CodePoint
> > 4. Word16
> > 5. Char
>
> I disagree, they should be:
>
> 1. Word8
> 2. CChar
> 3. Char
> 4. Word16
> 5. Word8
Yes.
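To make the second list concrete, here is a minimal sketch of it as Haskell type synonyms. The synonym names are made up for this mail and are not part of any library; only Word8, Word16, CChar and Char are real types.

```haskell
import Data.Word (Word8, Word16)
import Foreign.C.Types (CChar)

-- Hypothetical synonym names, for illustration only.
type Octet      = Word8   -- 1. octets
type CCharacter = CChar   -- 2. C "char"
type CodePoint  = Char    -- 3. Unicode code points
type Utf16Unit  = Word16  -- 4. Unicode code values (UTF-16)
type HandleUnit = Word8   -- 5. "what handles handle"
```

Under this reading items 1 and 5 are the same type, which is the whole point: a handle delivers octets, not characters.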
> >Let me elaborate. Files are funny because the information units they
> >contain can be treated as both numbers and characters.
>
> No, a file is always a list of octets. Nothing else (ignoring metadata,
> forks etc.). Of course, you can interpret those octets as text using
> "ASCII" or "UTF-8" or whatever; equally, you can interpret those octets
> as an image using "PNG", "JPEG" etc. But those are secondary
> transformations, separate from the business of reading from and writing
> to a file.
Ack!
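A rough sketch of what "secondary transformation" could look like: the octets are plain data, and turning them into text is an ordinary function chosen by the caller. Both function names below are hypothetical, and the UTF-8 decoder is deliberately simplified (it does not reject overlong forms or surrogates).

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: read octets as ISO 8859-1 (one octet, one Char).
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Hypothetical helper: read octets as UTF-8.
-- Simplified sketch: malformed lead or continuation octets give Nothing,
-- but overlong encodings and surrogates are not rejected.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b:bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
  | otherwise          = Nothing
  where
    -- Consume n continuation octets and assemble the code point.
    multi n lead =
      let (cont, rest) = splitAt n bs
      in if length cont == n && all isCont cont
           then let cp = foldl step (fromIntegral lead) cont
                in if cp > 0x10FFFF
                     then Nothing
                     else (chr cp :) <$> decodeUtf8 rest
           else Nothing
    isCont c   = c .&. 0xC0 == 0x80
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
```

The same two octets 0xC3 0xA9 come out as the single character U+00E9 under decodeUtf8 and as the two characters U+00C3 U+00A9 under decodeLatin1. The file did not change; only the interpretation did.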
> We should have Word8-based interfaces to file and network handles.
> Whether or not the old Char-based ones should be deprecated, or whatever,
> I don't know.
I think any notion of treating the _raw_ contents of a file as Chars
must go, because it is simply incorrect. It reads like a slip someone
made when, for a moment, they got Haskell's Char and C's char mixed up.
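For what a Word8-based handle interface might look like, here is a small sketch. hGetOctets, hPutOctets and roundTrip are hypothetical names, not an existing API; they are built on hGetBuf and hPutBuf from System.IO and the Foreign marshalling helpers as found in current GHC base, which is an assumption about the reader's compiler.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import System.IO (Handle, IOMode (..), hGetBuf, hPutBuf, withBinaryFile)

-- Hypothetical: read up to n octets from a handle (fewer at end of file).
hGetOctets :: Handle -> Int -> IO [Word8]
hGetOctets h n = allocaBytes n $ \buf -> do
  got <- hGetBuf h (buf :: Ptr Word8) n
  peekArray got buf

-- Hypothetical: write a list of octets to a handle.
hPutOctets :: Handle -> [Word8] -> IO ()
hPutOctets h ws = withArrayLen ws $ \len buf -> hPutBuf h buf len

-- Usage sketch: write octets to a file and read them back unchanged.
roundTrip :: FilePath -> [Word8] -> IO [Word8]
roundTrip path ws = do
  withBinaryFile path WriteMode $ \h -> hPutOctets h ws
  withBinaryFile path ReadMode $ \h -> hGetOctets h (length ws)
```

No Char appears anywhere in these signatures, so there is nothing to mix up; text decoding would be layered on top as a separate step.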
> As for Unicode codepoints, if there's to be an internationalisation
> effort for Haskell, the type of character literals, Char, should be fixed
> as the type for Unicode codepoints, much as it already is in GHC.
Ack.
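A tiny sketch of why Char, read as a code point, cannot also be "what handles handle". The function names are invented for illustration; the range of Char is as implemented in GHC.

```haskell
import Data.Char (ord)

-- Hypothetical helper: the code point a Char stands for.
codePoint :: Char -> Int
codePoint = ord

-- The largest Char; in GHC this is the last Unicode code point.
highestCodePoint :: Int
highestCodePoint = ord (maxBound :: Char)

-- Hypothetical helper: does this character fit in a single octet?
fitsInOctet :: Char -> Bool
fitsInOctet c = ord c <= 0xFF
```

For example, fitsInOctet 'A' holds, while the euro sign '\x20AC' does not fit, so some explicit encoding step is unavoidable between Char and Word8.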
Sven Moritz