Text in Haskell: A PROPOSAL
Ketil Z. Malde
ketil@ii.uib.no
08 Aug 2002 11:54:23 +0200
Ken Shan <ken@digitas.harvard.edu> writes:
> On the other hand, GHC uses Char to mean what files store and sockets
> transmit and foreign functions process under the C type "char".
Isn't "byte" or "octet" a better name for what files store and sockets
transmit?
> These two uses are inconsistent, and must be separated.
Right.
> I would be perfectly happy -- in fact, happier personally -- if Char
> were to mean "Unicode code point" and a new type CChar were created
> to mean "C char".
I think this is the more likely scenario. I'd use Word8 for the raw
data, and leave CChar for FFI purposes, in case a C "char" turns out
not to be eight bits.
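To make the three notions concrete, here is a minimal sketch using only
types that already exist in the standard libraries (Char, Data.Word's
Word8, Foreign.C.Types' CChar). The names Octet, charToOctet and
octetToCChar are made up for illustration and are not part of any
library.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import Foreign.C.Types (CChar)

-- Hypothetical alias: one octet as stored in a file or sent over a socket.
type Octet = Word8

-- A Char (Unicode code point) fits in a single octet only if it is
-- at most 0xFF; anything larger needs a real encoding such as UTF-8.
charToOctet :: Char -> Maybe Octet
charToOctet c
  | ord c <= 0xFF = Just (fromIntegral (ord c))
  | otherwise     = Nothing

-- Hand an octet to the FFI as a C "char". CChar may be signed on the
-- platform, so values above 127 can wrap to negative numbers.
octetToCChar :: Octet -> CChar
octetToCChar = fromIntegral
```

The point is only that the conversions are explicit: nothing turns a
Char into an octet or a CChar behind the programmer's back.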
> Either way, the (function types in the) libraries must be cleaned up to
> maintain the distinction between "C char" and "Unicode code point".
> Furthermore, Haskell programs must be able to access both notions.
Would it be sufficient to have "raw" socket/file functions using
[Word8], and let the "standard" functions (e.g. readFile) convert to
[Char] according to current locale settings? With, perhaps, UTF-8 as
a reasonable default?
(And, of course, encoding and decoding functions should be readily
available for manual use.)
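A rough sketch of how the raw functions and the decoding layer suggested
above could fit together. Everything here is an assumption for the sake
of illustration: readFileRaw, decodeUtf8 and readFileUtf8 are
hypothetical names, the encoding is fixed to UTF-8 rather than taken
from the locale, and malformed input is replaced by U+FFFD, which is
only one of several possible policies.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import System.IO (IOMode (ReadMode), hGetContents, openBinaryFile)

-- Hypothetical "raw" read: the file's octets, with no decoding applied.
-- In binary mode each Char from hGetContents carries one byte (0..255).
readFileRaw :: FilePath -> IO [Word8]
readFileRaw path = do
  h <- openBinaryFile path ReadMode
  s <- hGetContents h
  return (map (fromIntegral . fromEnum) s)

-- Decode UTF-8 octets to code points. Each malformed lead byte yields
-- U+FFFD and decoding resumes at the following octet.
decodeUtf8 :: [Word8] -> [Char]
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80               = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral (b .&. 0x1F)) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral (b .&. 0x0F)) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral (b .&. 0x07)) 0x10000
  | otherwise              = '\xFFFD' : decodeUtf8 bs
  where
    -- n continuation octets, payload bits of the lead octet, and the
    -- smallest code point allowed for this length (rejects overlong forms).
    multi :: Int -> Int -> Int -> [Char]
    multi n lead minCp =
      let (cont, rest) = splitAt n bs
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                     lead cont
          ok = length cont == n
            && all isCont cont
            && cp >= minCp
            && cp <= 0x10FFFF
            && not (cp >= 0xD800 && cp <= 0xDFFF)  -- no surrogates
      in if ok then chr cp : decodeUtf8 rest
               else '\xFFFD' : decodeUtf8 bs

    isCont :: Word8 -> Bool
    isCont c = c .&. 0xC0 == 0x80

-- Hypothetical "standard" read: raw octets, then decoded as UTF-8.
readFileUtf8 :: FilePath -> IO [Char]
readFileUtf8 path = fmap decodeUtf8 (readFileRaw path)
```

A locale-aware readFile would pick the decoder from the environment
instead of hard-wiring decodeUtf8, while programs that need the bytes
themselves would call the raw function directly.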
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants