Text in Haskell: A PROPOSAL
Joe English
jenglish@flightlab.com
Wed, 07 Aug 2002 16:34:42 -0700
Ashley Yakeley wrote:
> At 2002-08-07 11:05, Ken Shan wrote:
>
> >Let me clarify my understanding of this point a bit further. On the one
> >hand, GHC uses Char to mean a 32-bit value like a Unicode code point.
>
> No, GHC uses Char to mean a Unicode codepoint. These are not 32-bit. It
> only allows the 17 planes, i.e. values in the range '\x0' to '\x10FFFF'.
> This is the Right Thing as per Unicode 3.1 and later (current is 3.2.0).
>
> >On the other hand, GHC uses Char to mean what files store and sockets
> >transmit and foreign functions process under the C type "char".
>
> Right, and this is a very bad idea. The file IO functions should be using
> Word8s

It's often very useful to treat a file as a sequence
of characters; in fact I'd say that's probably more
common than treating it as a sequence of octets.
But both are clearly needed.

In my opinion, hPutChar :: Handle -> Char -> IO () should
do what its name and type indicate: write a character
to the specified output handle. The I/O subsystem
should take care of translation to UTF-8 (or whatever
the system encoding is).
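As a rough sketch of that translation step, assuming UTF-8 as the external encoding, the function below maps one Char (a code point from '\x0' to '\x10FFFF') to the octets a character-oriented hPutChar would have to emit. The name encodeUtf8Char is hypothetical; this is an illustration, not GHC's actual I/O implementation.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one code point as its UTF-8 octets.
-- Assumes a Unicode scalar value; surrogates (U+D800..U+DFFF) are
-- not rejected here, which a real encoder would have to do.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [fromIntegral n]                       -- 1 octet
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]              -- 2 octets
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]     -- 3 octets
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]  -- 4 octets
  where
    n = ord c
    -- High bits that go into the leading octet.
    lead k = fromIntegral (n `shiftR` k)
    -- A continuation octet: 10xxxxxx carrying 6 bits of the code point.
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

main :: IO ()
main = mapM_ (print . encodeUtf8Char) "A\233\x20AC\x10FFFF"
-- prints [65], [195,169], [226,130,172], [244,143,191,191] on separate lines
```

Each Char becomes one to four octets, which is why a Char-level write and an octet-level write cannot be the same operation once characters beyond '\xFF' are allowed.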
hPutWord8 :: Handle -> Word8 -> IO () should be available
_in addition to_ hPutChar, for applications that need
to treat files as a sequence of octets.
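A minimal sketch of what an octet-level hPutWord8 might look like, assuming a present-day GHC `base` library: it hands a single byte to hPutBuf, which writes raw memory and performs no character translation. The function hPutWord8 and the scratch file octets.bin are hypothetical names used only for illustration.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import Foreign.Marshal.Utils (with)
import System.IO

-- Hypothetical octet-level output primitive: writes exactly one byte.
-- hPutBuf copies raw memory to the handle, bypassing any character
-- encoding or newline translation configured on it.
hPutWord8 :: Handle -> Word8 -> IO ()
hPutWord8 h w = with w (\p -> hPutBuf h p 1)

main :: IO ()
main = do
  let path = "octets.bin"  -- hypothetical scratch file
  -- Write the UTF-8 octets of U+20AC (EURO SIGN) one byte at a time.
  withBinaryFile path WriteMode $ \h ->
    mapM_ (hPutWord8 h) [0xE2, 0x82, 0xAC]
  -- Read the file back in binary mode: each Char holds one raw octet.
  s <- withBinaryFile path ReadMode $ \h -> do
    str <- hGetContents h
    length str `seq` return str  -- force before the handle is closed
  print (map ord s)  -- prints [226,130,172]
```

Because the handle is buffered, writing one byte per call is acceptable for a sketch; a bulk variant taking a buffer of octets would be the natural next step.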
> >These two uses are inconsistent, and must be separated.
> I agree.

Me too; but both character-based and octet-based operations
are needed.

--Joe English
jenglish@flightlab.com