Text in Haskell: A PROPOSAL

Joe English jenglish@flightlab.com
Wed, 07 Aug 2002 16:34:42 -0700


Ashley Yakeley wrote:

> At 2002-08-07 11:05, Ken Shan wrote:
>
> >Let me clarify my understanding of this point a bit further.  On the one
> >hand, GHC uses Char to mean a 32-bit value like a Unicode code point.
>
> No, GHC uses Char to mean a Unicode codepoint. These are not 32-bit. It
> only allows the 17 pages i.e. values in the range '\x0' to '\x10FFFF'.
> This is the Right Thing as per Unicode 3.1 and later (current is 3.2.0).
>
> >On the other hand, GHC uses Char to mean what files store and sockets
> >transmit and foreign functions process under the C type "char".
>
> Right, and this is a very bad idea. The file IO functions should be using
> Word8s

It's often very useful to treat a file as a sequence
of characters; in fact I'd say that's probably more
common than treating it as a sequence of octets.
But both are clearly needed.

In my opinion, hPutChar :: Handle -> Char -> IO () should
do what its name and type indicate -- write a character
to the specified output handle.  The I/O subsystem
should take care of translation to UTF-8 (or whatever
the system encoding is).
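
As a rough illustration of the translation step, here is a minimal
sketch of how a single Char could be turned into UTF-8 octets. The
name encodeCharUtf8 is hypothetical and not part of any library; a
real I/O subsystem would also have to decide what to do with
surrogate code points, which this sketch encodes without checking.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper (not a library function): encode one Char
-- as a list of UTF-8 octets. Surrogates are not rejected here.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]                          -- 1 octet
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                   -- 2 octets
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]          -- 3 octets
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0] -- 4 octets
  where
    n = ord c
    -- Leading payload bits, shifted down into the first octet.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation octet: 10xxxxxx carrying six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)
```

With something like this underneath, hPutChar would write
encodeCharUtf8 c to the handle whenever the system encoding
is UTF-8.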

hPutWord8 :: Handle -> Word8 -> IO () should be available
_in addition to_ hPutChar, for applications that need
to treat files as a sequence of octets.
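
For comparison, a minimal sketch of the two operations side by side,
assuming a present-day GHC with the bytestring package. hPutWord8 is
still hypothetical: here it is defined on top of Data.ByteString.hPut.
hSetEncoding and utf8 come from System.IO, and demo is an illustrative
name only.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import System.IO (Handle, hPutChar, hSetEncoding, utf8)

-- Hypothetical octet-level write, as proposed above: the value goes
-- to the handle unchanged, with no character encoding applied.
hPutWord8 :: Handle -> Word8 -> IO ()
hPutWord8 h w = B.hPut h (B.singleton w)

-- Illustrative only: one character-level write followed by one
-- octet-level write on the same handle.
demo :: Handle -> IO ()
demo h = do
  hSetEncoding h utf8   -- characters are encoded as UTF-8 from here on
  hPutChar h '\x20AC'   -- one Char, several octets in the file
  hPutWord8 h 0x0A      -- exactly one octet
```

The point of keeping both is visible in the types: hPutChar takes a
Char and leaves the encoding to the handle, while hPutWord8 takes the
octet itself.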


> >These two uses are inconsistent, and must be separated.
> I agree.

Me too; but both character-based and octet-based operations
are needed.


--Joe English

  jenglish@flightlab.com