Sending wide characters over the network socket
Glynn Clements
glynn.clements@virgin.net
Sat, 5 Jul 2003 21:20:43 +0100
Dimitry Golubovsky wrote:
> I have tried to send a string of Unicode characters over a socket (or to
> write it into a file handle). The result is strange: it looks like
> characters are truncated down to their least significant bytes.
Yep.
> Honestly, I expected that 20 bytes were sent (or something smaller if
> they were sent in UTF), and "Received" be identical to "Source was". The
> last string of output is just to check whether those are indeed lower
> bytes shown, not some garbage.
>
> I am using a binary distribution of GHC 6.0 on Linux - are there any
> special conditions I have to enable for the source distribution to be
> able to send/receive Unicode characters?
No, it just isn't supported. All of the Haskell I/O functions take the
bottom octet of each Char and discard the top bits.
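
As a rough illustration of that truncation, here is a pure model of what
happens to each Char on output. The names truncateChar and truncateString
are made up for this sketch; the real I/O code does not expose anything
like them.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Hypothetical helper: keep only the bottom octet of a Char's code point,
-- which is what the I/O layer effectively does when writing it out.
truncateChar :: Char -> Char
truncateChar c = chr (ord c .&. 0xFF)

-- Apply the same truncation to a whole String.
truncateString :: String -> String
truncateString = map truncateChar
```

For example, U+0414 has code point 0x0414, so only 0x14 survives, and the
receiver has no way to recover the discarded high bits.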
> To be more general: how would I send arbitrary binary data (stream of
> octets) over a socket or a file handle? Should I always assume that only
> lower bytes would be sent, and this will be forever in ghc?
Yes. Well, maybe not forever, but for the foreseeable future.
> Or is it a bug?
No. It's just a fundamental design flaw in Haskell. Presumably someone
thought that wide-character support was just a question of defining
Char, and forgot about a minor issue called "I/O".
> The problem is, Handle/Socket functions require a String to be the type
> of data to exchange; not a, say [Int8]. Therefore, I need to be able to
> coerce my binary data buffer to a String.
Correct. IOW, lots of messing around with ord and chr and either
mod/div or the Bits library.
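
A minimal sketch of that kind of messing around, using Data.Bits. All four
function names are hypothetical, and the two-octet encoding assumes every
code point is below 0x10000; it is one possible scheme, not a standard
one, and both ends of the socket have to agree on it.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helpers, not part of any library.

-- Arbitrary octets to a String whose Chars all lie in 0..255, so that
-- nothing is lost when the I/O functions keep only the bottom octet.
octetsToString :: [Word8] -> String
octetsToString = map (chr . fromIntegral)

-- The reverse direction, for data read back from a Handle or Socket.
stringToOctets :: String -> [Word8]
stringToOctets = map (fromIntegral . ord)

-- Encode each Char as two octets, high octet first.
-- Assumption: code points are below 0x10000; higher ones are mangled.
encode16 :: String -> String
encode16 = concatMap enc
  where
    enc c = [chr ((n `shiftR` 8) .&. 0xFF), chr (n .&. 0xFF)]
      where
        n = ord c

-- Inverse of encode16; a trailing odd octet is silently dropped.
decode16 :: String -> String
decode16 (hi : lo : rest) =
  chr ((ord hi `shiftL` 8) .|. ord lo) : decode16 rest
decode16 _ = []
```

With this, a 10-character string goes out as 20 octets: the sender writes
encode16 of the string and the receiver applies decode16 to what it reads.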
--
Glynn Clements <glynn.clements@virgin.net>