Text in Haskell: a second proposal
Wolfgang Jeltsch
wolfgang@jeltsch.net
10 Aug 2002 13:34:39 +0200
On Friday, 2002-08-09, 08:40, CEST, Ashley Yakeley wrote:
> At 2002-08-08 23:10, Ken Shan wrote:
>
> > 1. Octets.
> > 2. C "char".
> > 3. Unicode code points.
> > 4. Unicode code values, useful only for UTF-16, which is seldom used.
> > 5. "What handles handle".
> ...
> >I suggest that the following Haskell types be used for the five items
> >above:
> >
> > 1. Word8
> > 2. CChar
> > 3. CodePoint
> > 4. Word16
> > 5. Char
>
> I disagree, they should be:
>
> 1. Word8
> 2. CChar
> 3. Char
> 4. Word16
> 5. Word8
>
> >Let me elaborate. Files are funny because the information units they
> >contain can be treated as both numbers and characters.
>
> No, a file is always a list of octets. Nothing else (ignoring metadata,
> forks etc.). Of course, you can interpret those octets as text using
> "ASCII" or "UTF-8" or whatever, equally, you can interpret those octets
> as an image using "PNG", "JPEG" etc. But those are secondary
> transformations, separate from the business of reading from and writing
> to a file.
>
> We should have Word8-based interfaces to file and network handles.
> Whether or not the old Char-based ones should be deprecated, or whatever,
> I don't know.
>
> As for Unicode codepoints, if there's to be an internationalisation
> effort for Haskell, the type of character literals, Char, should be fixed
> as the type for Unicode codepoints, much as it already is in GHC.
>
> --
> Ashley Yakeley, Seattle WA
Some remarks:
* A file doesn't have to be a list of octets. On the other hand,
  assuming that files consist of octets makes sense on most
  platforms. Therefore, I think using Word8 for file/stream
  elements is a good solution.
* Maybe the traditional character-based I/O operations should use
  the default locale. That way, they would be very useful for
  reading from and writing to terminals. For file access, I would
  discourage their use and instead promote the combination of
  octet-based I/O with encoding functions and decoding parsers.
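
To make the second point concrete, here is a minimal sketch of octet-based file I/O followed by a separate decoding step. It rests on several assumptions: Data.ByteString (from the bytestring package that ships with GHC) stands in for a Word8-based handle interface, the file name "example.txt" is hypothetical, and decodeUtf8 and multi are illustrative names rather than part of any proposed library. UTF-8 is only one possible encoding here.

```haskell
module Main where

import qualified Data.ByteString as BS  -- assumed stand-in for Word8-based I/O
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Decode UTF-8 octets into Unicode code points (Char).
-- Returns Nothing for malformed input: truncated sequences, bad
-- continuation octets, overlong forms, surrogates, or values
-- above U+10FFFF.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : rest)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 rest
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral (b .&. 0x1F)) 0x80 rest
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral (b .&. 0x0F)) 0x800 rest
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral (b .&. 0x07)) 0x10000 rest
  | otherwise          = Nothing

-- | Consume n continuation octets, accumulating the code point.
-- minCp is the smallest code point allowed for this sequence length,
-- which is how overlong encodings are rejected.
multi :: Int -> Int -> Int -> [Word8] -> Maybe String
multi 0 acc minCp rest
  | acc < minCp                    = Nothing  -- overlong encoding
  | acc > 0x10FFFF                 = Nothing  -- outside Unicode range
  | acc >= 0xD800 && acc <= 0xDFFF = Nothing  -- surrogate code point
  | otherwise                      = (chr acc :) <$> decodeUtf8 rest
multi n acc minCp (c : rest)
  | c .&. 0xC0 == 0x80 =
      multi (n - 1) ((acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) minCp rest
multi _ _ _ _ = Nothing  -- truncated sequence or bad continuation octet

main :: IO ()
main = do
  let path = "example.txt"  -- hypothetical file name
  -- Write the octets of "Grüße" encoded as UTF-8.
  BS.writeFile path (BS.pack [0x47, 0x72, 0xC3, 0xBC, 0xC3, 0x9F, 0x65])
  -- Reading yields octets only; no character interpretation happens here.
  octets <- BS.unpack <$> BS.readFile path
  print (length octets)
  -- Decoding is a separate, explicit step on top of the octets.
  print (fmap length (decodeUtf8 octets))
```

In this sketch the file holds seven octets, and decoding them yields five characters. The point is only that reading and interpreting are separate steps, so the same octets could just as well be passed to a different decoder.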
Apart from these two points I fully agree with Ashley.
Wolfgang