Sending wide characters over the network socket

Glynn Clements glynn.clements@virgin.net
Sat, 5 Jul 2003 21:20:43 +0100


Dimitry Golubovsky wrote:

> I have tried to send a string of Unicode characters over a socket (or to 
> write it into a file handle). The result is strange: it looks like 
> characters are truncated down to their least significant bytes.

Yep.

> Honestly, I expected that 20 bytes were sent (or something smaller if 
> they were sent in UTF), and "Received" be identical to "Source was". The 
> last string of output is just to check whether those are indeed lower 
> bytes shown, not some garbage.
> 
> I am using a binary distribution of GHC 6.0 on Linux - are there any 
> special conditions I have to enable for the source distribution to be 
> able to send/receive Unicode characters?

No, it just isn't supported. All of the Haskell I/O functions take the
bottom octet of each Char and discard the top bits.
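
A minimal sketch of that behaviour, as a model rather than the actual
library code. The names truncateToOctet and asReceived are made up for
illustration:

```haskell
import Data.Char (chr, ord)

-- Model of the behaviour described above: keep only the low 8 bits
-- of each Char's code point, as if each Char were written as one octet.
truncateToOctet :: Char -> Char
truncateToOctet c = chr (ord c `mod` 256)

-- What the receiving end would see for a given source string
-- under this model.
asReceived :: String -> String
asReceived = map truncateToOctet
```

Under this model a Char such as '\x0416' arrives as '\x16', while
plain ASCII text passes through unchanged.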

> To be more general: how would I send arbitrary binary data (stream of 
> octets) over a socket or a file handle? Should I always assume that only 
> lower bytes would be sent, and this will be forever in ghc?

Yes. Well, maybe not forever, but for the foreseeable future.

> Or is it a bug?

No. It's just a fundamental design flaw in Haskell. Presumably someone
thought that wide-character support was just a question of defining
Char, and forgot about a minor issue called "I/O".

> The problem is, Handle/Socket functions require a String to be the type 
> of data to exchange; not a, say [Int8]. Therefore, I need to be able to 
> coerce my binary data buffer to a String.

Correct. IOW, lots of messing around with ord and chr and either
mod/div or the Bits library.
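
Something along these lines, as a rough sketch only. All of the names
are hypothetical, and the four-octets-per-Char, most-significant-first
layout is just an assumed framing for the example, not any standard
wire format. It uses div/mod; shiftR and (.&.) from the Bits library
would do the same job:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Carry raw octets in a String: each Char holds one value in 0..255,
-- so nothing is lost when the I/O layer keeps only the low octet.
octetsToString :: [Word8] -> String
octetsToString = map (chr . fromIntegral)

-- Inverse of octetsToString, for data read back from a Handle or socket.
-- fromIntegral wraps modulo 256, so any wide Char is truncated here too.
stringToOctets :: String -> [Word8]
stringToOctets = map (fromIntegral . ord)

-- Split one Char into four octets, most significant first
-- (an assumed layout for this sketch).
charToOctets :: Char -> [Word8]
charToOctets c = map fromIntegral
  [ n `div` 16777216
  , (n `div` 65536) `mod` 256
  , (n `div` 256) `mod` 256
  , n `mod` 256 ]
  where n = ord c

-- Rebuild one Char from its octets. chr raises an error if the
-- octets do not form a valid code point.
octetsToChar :: [Word8] -> Char
octetsToChar = chr . foldl (\acc o -> acc * 256 + fromIntegral o) 0

-- Encode a wide String as a String containing only Chars below 256.
encodeWide :: String -> String
encodeWide = octetsToString . concatMap charToOctets

-- Decode the output of encodeWide; trailing octets that do not fill
-- a group of four are dropped.
decodeWide :: String -> String
decodeWide = go . stringToOctets
  where
    go (a:b:c:d:rest) = octetsToChar [a, b, c, d] : go rest
    go _              = []
```

The sender would pass encodeWide s to the usual String-based output
function and the receiver would apply decodeWide to what it reads.
Arbitrary binary data goes through octetsToString and stringToOctets
directly.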

-- 
Glynn Clements <glynn.clements@virgin.net>