[Haskell-cafe] The Nature of Char and String

Ben Rudiak-Gould Benjamin.Rudiak-Gould at cl.cam.ac.uk
Sun Jan 30 14:39:59 EST 2005


John Goerzen wrote:

 >Char in Haskell represents a Unicode character.  I don't know exactly
 >what its size is, but it must be at least 16 bits and maybe more.
 >String would then share those properties.
 >
 >However, usually I'm accustomed to dealing with data in 8-bit words.
 >So I have some questions:

Char and String handling in Haskell is deeply broken. There's a 
discussion ongoing on this very list about fixing it (in the context of 
pathnames).

But for now, Haskell's Char behaves like C's char with respect to I/O. 
This is unlikely ever to change (in the existing I/O interface) because 
it would break too much code. So the answers to your questions are:

 > * If I use hPutStr on a string, is it guaranteed that the number of
 >   8-bit bytes written equals (length stringWritten)?

Yes, if the handle is opened in binary mode; otherwise no.

 >   + If yes, what happens to the upper 8 bits?  Are they simply
 >     stripped off?

Yes. Only the low 8 bits of each Char are written.
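
To make the two answers above concrete, here is a minimal sketch of writing a String through a binary-mode handle and looking at what lands on disk. The helper names (writeBinary, readBinary) and the file name in the usage note are made up for illustration, and the sketch assumes a present-day GHC base library, where a binary-mode handle maps each Char to a byte by taking its code point modulo 256.

```haskell
import System.IO

-- Write a String through a binary-mode handle, then report the
-- resulting file size in bytes.
writeBinary :: FilePath -> String -> IO Integer
writeBinary path s = do
  withBinaryFile path WriteMode (\h -> hPutStr h s)
  withBinaryFile path ReadMode hFileSize

-- Read the file back in binary mode: one Char per byte.
readBinary :: FilePath -> IO String
readBinary path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  -- Force the lazy read before withBinaryFile closes the handle.
  length s `seq` return s
```

With these helpers, writeBinary "char_demo.bin" "A\x263A" should report 2 bytes for a 2-character string, and readBinary on the same file should return "A:", because only the low byte 0x3A of U+263A survives.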

 > * If I run hGetChar, is it possible that it would consume more than
 >   one byte of input?

No in binary mode, yes in text mode.
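
One way a text-mode read can consume more than one byte per Char is newline translation. The sketch below is an illustration under assumptions, not a description of the library as it stood when this was written: it uses hSetNewlineMode and hSetEncoding from present-day System.IO to request CRLF translation explicitly, and the helper names (readWith, readBinaryMode, readTextMode) are hypothetical.

```haskell
import System.IO

-- Read a whole file strictly through a handle prepared by `setup`.
readWith :: (Handle -> IO ()) -> FilePath -> IO String
readWith setup path = withFile path ReadMode $ \h -> do
  setup h
  s <- hGetContents h
  -- Force the lazy read before withFile closes the handle.
  length s `seq` return s

-- Binary mode: no translation, each Char corresponds to one byte.
readBinaryMode :: FilePath -> IO String
readBinaryMode = readWith (\h -> hSetBinaryMode h True)

-- Text mode with CRLF translation on input: the two bytes "\r\n"
-- are delivered as the single Char '\n'.
readTextMode :: FilePath -> IO String
readTextMode = readWith $ \h -> do
  hSetEncoding h latin1
  hSetNewlineMode h universalNewlineMode
```

For a file holding the four bytes 'a', '\r', '\n', 'b', readBinaryMode should return a 4-character string, while readTextMode should return the 3-character string "a\nb", so the read that produces '\n' has consumed two bytes.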

 > * Does Haskell treat the "this is a Unicode file" marker special in
 >   any way?

No.

 > * Same questions on withCString and related String<->CString
 >   conversions.

They all behave as if reading/writing a file in binary mode.
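
A small sketch of that byte-per-Char marshalling follows. Note the assumption: in present-day base, withCString converts using the current locale's encoding, so the sketch uses withCAString instead, the variant documented to truncate each Char to 8 bits, which matches the behaviour described here. The helper name marshalledBytes is made up for illustration.

```haskell
import Data.Word (Word8)
import Foreign.C.String (withCAString)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray)

-- Marshal a String with the 8-bit variant and return the raw bytes,
-- excluding the terminating NUL. The String must not contain NULs.
marshalledBytes :: String -> IO [Word8]
marshalledBytes s = withCAString s $ \p -> do
  cs <- peekArray (length s) p
  -- CChar may be signed, so convert to Word8 to see plain byte values.
  return (map fromIntegral (cs :: [CChar]))
```

Under that assumption, marshalledBytes "A\x263A" should yield [65,58]: one byte per Char, with U+263A reduced to its low byte 0x3A.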

-- Ben


