Text in Haskell: A PROPOSAL

08 Aug 2002 11:54:23 +0200

Ken Shan <ken@digitas.harvard.edu> writes:

> On the other hand, GHC uses Char to mean what files store and sockets
> transmit and foreign functions process under the C type "char".  

Isn't "byte" or "octet" a better name for what files store and sockets
transmit?

> These two uses are inconsistent, and must be separated.

Right.

> I would be perfectly happy -- in fact, happier personally -- if Char
> were to mean "Unicode code point" and a new type CChar were created
> to mean "C char".

I think this is the more likely scenario.  I'd use Word8 for what files
and sockets carry, and leave CChar for FFI purposes in case a "char"
turns out to be something other than eight bits.
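
To make the split concrete, here is a rough sketch of the two types side
by side.  It assumes Word8 from Data.Word and CChar from Foreign.C.Types;
the names Octet, cCharToOctet and octetToCChar are made up for
illustration and are not part of any library.

```haskell
import Data.Word (Word8)
import Foreign.C.Types (CChar)

-- Hypothetical alias: an octet as stored in files or sent over sockets.
type Octet = Word8

-- Hypothetical helpers for crossing the FFI boundary.  fromIntegral
-- wraps modulo 2^8, so a negative CChar (where C's char is signed)
-- maps to the corresponding unsigned octet.
cCharToOctet :: CChar -> Octet
cCharToOctet = fromIntegral

octetToCChar :: Octet -> CChar
octetToCChar = fromIntegral
```

Neither type says anything about Unicode; that is left to Char.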

> Either way, the (function types in the) libraries must be cleaned up to
> maintain the distinction between "C char" and "Unicode code point".
> Furthermore, Haskell programs must be able to access both notions.

Would it be sufficient to have "raw" socket/file functions using
[Word8], and let the "standard" functions (e.g. readFile) convert to
[Char] according to current locale settings?  With, perhaps, UTF-8 as
a reasonable default?

(And of course, with en/decoding functions readily available for manual
use.)
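
As an illustration of what such functions might look like for the UTF-8
case, here is a minimal sketch over [Word8].  The names encodeUtf8 and
decodeUtf8 are hypothetical, not an existing library API, and the decoder
is deliberately naive: it replaces malformed sequences with U+FFFD but
does not reject overlong forms or surrogates.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical encoder: Unicode code points to UTF-8 octets.
encodeUtf8 :: [Char] -> [Word8]
encodeUtf8 = concatMap encodeChar

encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- High bits that go into the leading octet.
    lead s = fromIntegral (n `shiftR` s)
    -- Six payload bits wrapped in a 10xxxxxx continuation octet.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical decoder: UTF-8 octets to code points.  Malformed input
-- yields U+FFFD and decoding resumes at the next octet.
decodeUtf8 :: [Word8] -> [Char]
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80           = chr (fromIntegral b) : decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
  | otherwise          = '\xFFFD' : decodeUtf8 bs
  where
    isCont x = x .&. 0xC0 == 0x80
    multi k l =
      let (cs, rest) = splitAt k bs
          step acc x = (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F)
          n          = foldl step (fromIntegral l) cs :: Int
      in if length cs == k && all isCont cs && n <= 0x10FFFF
           then chr n : decodeUtf8 rest
           else '\xFFFD' : decodeUtf8 bs
```

A "standard" readFile could then be the raw [Word8] read followed by
decodeUtf8, or by whichever decoder the locale selects.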

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants