[Haskell-cafe] The Nature of Char and String
Ben Rudiak-Gould
Benjamin.Rudiak-Gould at cl.cam.ac.uk
Sun Jan 30 14:39:59 EST 2005
John Goerzen wrote:
>Char in Haskell represents a Unicode character. I don't know exactly
>what its size is, but it must be at least 16 bits and maybe more.
>String would then share those properties.
>
>However, usually I'm accustomed to dealing with data in 8-bit words.
>So I have some questions:
Char and String handling in Haskell is deeply broken. There's an
ongoing discussion on this very list about fixing it (in the context of
pathnames).
But for now, Haskell's Char behaves like C's char with respect to I/O.
This is unlikely ever to change (in the existing I/O interface) because
it would break too much code. So the answers to your questions are:
> * If I use hPutStr on a string, is it guaranteed that the number of
> 8-bit bytes written equals (length stringWritten)?
Yes, if the handle is opened in binary mode. No if it is in text mode.
> + If yes, what happens to the upper 8 bits? Are they simply
> stripped off?
Yes.
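A minimal sketch of how one might check the two answers above, assuming a reasonably recent GHC where binary-mode handles emit the low 8 bits of each Char. The file name and the helper names (lowByte, writeBinary, readBytes) are made up for illustration; only the System.IO calls are real.

```haskell
import Data.Char (ord)
import System.IO

-- Low 8 bits of a code point: the byte binary-mode output is expected to emit.
lowByte :: Char -> Int
lowByte c = ord c `mod` 256

-- Write a String through a binary-mode handle.
writeBinary :: FilePath -> String -> IO ()
writeBinary path s = withBinaryFile path WriteMode $ \h -> hPutStr h s

-- Read the file back as byte values (binary mode yields one Char per byte).
readBytes :: FilePath -> IO [Int]
readBytes path = withBinaryFile path ReadMode $ \h -> do
  contents <- hGetContents h
  let bytes = map ord contents
  -- Force the lazy read before withBinaryFile closes the handle.
  length bytes `seq` return bytes

main :: IO ()
main = do
  let path = "char-demo.bin"     -- hypothetical scratch file
      s    = "A\955\8364"        -- 'A', U+03BB, U+20AC
  writeBinary path s
  bytes <- readBytes path
  print (length bytes == length s)   -- one byte written per Char?
  print (bytes == map lowByte s)     -- upper bits stripped?
```

If both lines print True, the byte count matches the String length and everything above bit 7 has been dropped.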
> * If I run hGetChar, is it possible that it would consume more than
> one byte of input?
No in binary mode, yes in text mode.
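One way to see the difference, sketched under the assumption that "more than one byte" refers to CR LF being read as a single newline. To make that reproducible on any platform the sketch turns the translation on explicitly with hSetNewlineMode, which is my choice and not something the original question mentions. The file name and helper names are hypothetical.

```haskell
import System.IO

-- Write the three bytes CR, LF, 'x' with no translation.
writeSample :: FilePath -> IO ()
writeSample path = withBinaryFile path WriteMode $ \h -> hPutStr h "\r\nx"

-- First Char seen in binary mode: exactly one byte is consumed.
firstCharBinary :: FilePath -> IO Char
firstCharBinary path = withBinaryFile path ReadMode hGetChar

-- First two Chars seen in text mode with CRLF translation on input.
firstTwoText :: FilePath -> IO (Char, Char)
firstTwoText path = withFile path ReadMode $ \h -> do
  hSetNewlineMode h universalNewlineMode
  c1 <- hGetChar h
  c2 <- hGetChar h
  return (c1, c2)

main :: IO ()
main = do
  let path = "crlf-demo.txt"     -- hypothetical scratch file
  writeSample path
  firstCharBinary path >>= print -- the bare carriage return
  firstTwoText path >>= print    -- a newline, then the 'x' after the CR LF pair
```

In the text-mode read the first hGetChar swallows both the CR and the LF, so the second call already returns 'x'.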
> * Does Haskell treat the "this is a Unicode file" marker special in
> any way?
No.
> * Same questions on withCString and related String<->CString
> conversions.
They all behave as if reading/writing a file in binary mode.
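A small sketch of the marshalling side. It uses withCAStringLen from Foreign.C.String, the variant documented as ignoring Unicode, because as far as I know withCString in current GHC releases goes through the locale encoding instead; treat that substitution as an assumption. marshalledBytes is a hypothetical helper.

```haskell
import Data.Word (Word8)
import Foreign.C.String (withCAStringLen)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray)

-- Marshal a String to a temporary C buffer and read the raw bytes back.
marshalledBytes :: String -> IO [Word8]
marshalledBytes s = withCAStringLen s $ \(ptr, len) -> do
  cs <- peekArray len ptr
  -- CChar may be signed, so normalise to unsigned byte values.
  return (map (fromIntegral :: CChar -> Word8) cs)

main :: IO ()
main = do
  bytes <- marshalledBytes "A\955"   -- 'A' and U+03BB
  print (length bytes)               -- one C char per Haskell Char
  print bytes
```

The buffer holds one C char per Haskell Char, with U+03BB reduced to its low byte, which matches the binary-mode file behaviour.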
-- Ben