[Haskell-cafe] UTF-8 in Haskell.
uzytkownik2 at gmail.com
Thu Dec 23 08:56:53 CET 2010
On Thu, 2010-12-23 at 14:15 +0800, Magicloud Magiclouds wrote:
> On Thu, Dec 23, 2010 at 2:01 PM, Mark Lentczner <markl at glyphic.com> wrote:
> > On Dec 22, 2010, at 9:29 PM, Magicloud Magiclouds wrote:
> >> Thus in all situations (ASCII, UTF-8, or even
> >> UTF-32), my program always sends 4 bytes through the network. Is that
> >> OK?
> > Generally, no.
> > Haskell strings are sequences of Unicode characters. Each character has an integral code point value, from 0 to 0x10ffff, but technically, the code point itself is just a number, not a pattern of bits to be exchanged. That is an encoding.
> > In any protocol you need to know the encoding before you exchange characters as bytes or words. In some protocols it is implicit, in others it is explicit in a header or metadata, and in yet others (IRC comes to mind) it is undefined (which causes problems for the user).
> > The UTF-8 encoding uses a variable number of bytes to represent each character, depending on the code point, not Word32 as you suggested.
> > Converting from Haskell's String to various encodings can be done with either the "text" package or "utf8-string" package.
> > - Mark
> I see. I just realized that, in this case (ssh), I could use CString to
> avoid all problems with encoding.
Using CString does not avoid the encoding problems; it only pushes them
onto your users. CString is just char *, and the Foreign marshalling
functions just use ASCII. As a user of computer programs who does not
speak only English, I ask for Unicode support (for example UTF-8),
unless you mean only commands, not data, in which case you should
probably check the details of the protocol.

In any case, I don't think CString is the correct approach to network
data; you should probably use ByteString in place of CString.
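
To illustrate the ByteString route, here is a minimal sketch. It assumes
the "text" and "bytestring" packages are installed and that the protocol
really does expect UTF-8, which you would have to confirm for your case.
The names encodeUtf8String and decodeUtf8String are made up for this
example; they are not part of any library.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: turn a Haskell String into the UTF-8 bytes
-- that would be written to the socket.
encodeUtf8String :: String -> B.ByteString
encodeUtf8String = TE.encodeUtf8 . T.pack

-- Hypothetical helper: decode bytes read from the socket.
-- Invalid UTF-8 is reported as Left instead of throwing an exception.
decodeUtf8String :: B.ByteString -> Either String String
decodeUtf8String bytes =
  case TE.decodeUtf8' bytes of
    Left err   -> Left (show err)
    Right text -> Right (T.unpack text)

-- Number of bytes a String occupies once encoded as UTF-8.
utf8Length :: String -> Int
utf8Length = B.length . encodeUtf8String
```

In GHCi, utf8Length "a" should give 1, utf8Length "\233" (U+00E9) 2,
utf8Length "\8364" (U+20AC) 3 and utf8Length "\66376" (U+10348) 4, so the
number of bytes per character varies and is not a fixed 4.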