[Haskell-cafe] The Nature of Char and String

John Goerzen jgoerzen at complete.org
Sun Jan 30 09:47:53 EST 2005


Char in Haskell represents a Unicode character.  I don't know exactly
what its size is, but it must be at least 16 bits and maybe more.
String would then share those properties.
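
One way to probe the size question is to ask the implementation for the largest Char it supports. The following is only a sketch: maxCodePoint and bitsNeeded are names made up for this example, and the result describes the range of code points, not how a Char is laid out in memory. On GHC I would expect it to report 1114111 (0x10FFFF), which needs 21 bits, but other implementations may differ.

```haskell
import Data.Char (ord)

-- Highest code point a Char can hold, as reported by the implementation.
-- (maxCodePoint is a hypothetical name used only in this sketch.)
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- Number of bits needed to represent a non-negative Int.
-- (bitsNeeded is likewise a hypothetical helper.)
bitsNeeded :: Int -> Int
bitsNeeded 0 = 0
bitsNeeded n = 1 + bitsNeeded (n `div` 2)

main :: IO ()
main = do
  putStrLn ("largest code point: " ++ show maxCodePoint)
  putStrLn ("bits needed:        " ++ show (bitsNeeded maxCodePoint))
```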

However, I'm accustomed to dealing with data in 8-bit bytes.
So I have some questions:

 * If I use hPutStr on a string, is it guaranteed that the number of
   8-bit bytes written equals (length stringWritten)?

   + If no, what is the representation written?  I'm assuming UTF-8.
     How could I find out how many bytes were actually written?

   + If yes, what happens to the upper 8 bits?  Are they simply
     stripped off?

 * If I run hGetChar, is it possible that it would consume more than
   one byte of input?  How can I determine whether or not this has
   happened?

 * Does Haskell treat the "this is a Unicode file" marker specially in
   any way?

 * The same questions apply to withCString and related
   String<->CString conversions.
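
To make the byte-count question concrete, here is a minimal sketch of what the count would be if the output really were UTF-8. That encoding is only my assumption above, and the code does not inspect what hPutStr actually writes. utf8Length and utf8ByteCount are hypothetical helpers written for this example, not library functions.

```haskell
import Data.Char (ord)

-- Bytes that UTF-8 would need for a single Char, by code point range.
-- (Hypothetical helper; surrogate code points are not treated specially.)
utf8Length :: Char -> Int
utf8Length c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Bytes that UTF-8 would need for a whole String.
utf8ByteCount :: String -> Int
utf8ByteCount = sum . map utf8Length

main :: IO ()
main = do
  -- 'n', 'a', U+00EF, 'v', 'e', ' ', U+20AC
  let s = "na\239ve \8364"
  putStrLn ("length:          " ++ show (length s))
  putStrLn ("UTF-8 byte count: " ++ show (utf8ByteCount s))
```

For this sample string, length gives 7 while the UTF-8 count is 10, so under that assumption the two numbers agree only for pure ASCII text.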

-- John

