Sending wide characters over the network socket

Dimitry Golubovsky dimitry@golubovsky.org
Sat, 05 Jul 2003 15:32:17 -0400


Hi,


I have tried to send a string of Unicode characters over a socket (or to 
write it into a file handle). The result is strange: it looks like 
characters are truncated down to their least significant bytes.

Here is my program (I am new to Haskell, so the code may not look very
good, but it illustrates what I am trying to achieve):

===========================================

module Main where

import IO
import Char
import Control.Concurrent
import Network.Socket

--      A string of wide characters

wide = [chr 0x1234, chr 0x5678, chr 0x4321, chr 0x8765, chr 0x102345]

--      Function to "narrow" characters to their least significant byte

narrow s = map (\c -> chr $ (ord c) `mod` 256) s

--      Render a string as the code points (ord values) of its characters

strtodec str = concatMap (\c -> " " ++ show (ord c) ++ " ") str

--      Server: receive a string and print it decoded,
--      narrow it and print it again.
--      It does not actually loop, though it could

server=do
         sock <- socket AF_INET Stream 6
         ia <- inet_addr "127.0.0.1"
         bindSocket sock $ SockAddrInet 2323 ia
         listen sock 15
         srvloop sock where
                 srvloop sock = do
                         (acsock,from) <- accept sock
                         instr <- recv acsock 128
                         putStrLn $ "Received: "
                                 ++(strtodec instr)
                         putStrLn $ "Lower Bytes: "
                                 ++(strtodec $ narrow instr)
                         sClose acsock
                         sClose sock

--      Client: send a string of wide characters

client=do
         threadDelay 100
         sock <- socket AF_INET Stream 6
         ia <- inet_addr "127.0.0.1"
         connect sock $ SockAddrInet 2323 ia
         cnt <- send sock wide
         putStrLn $ "Sent "++(show cnt)++" bytes"
         putStrLn $ "Source was: "++(strtodec wide)
         sClose sock
         threadDelay 100

main=do
         forkIO (server)
         client


===========================================

And here is its output:

===========================================

Sent 5 bytes
Source was:  4660  22136  17185  34661  1057605
Received:  52  120  33  101  69
Lower Bytes:  52  120  33  101  69

===========================================

Honestly, I expected 20 bytes to be sent (or fewer if they were sent
in UTF), and "Received" to be identical to "Source was". The last line
of output is just there to check that the received values really are
the low bytes, not some garbage.
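
To make those two expectations concrete, here is a minimal sketch that
computes both encodings of the same string by hand. The helpers
utf8Char, utf8 and fixed4 are hypothetical names introduced only for
this illustration, not library functions, and the big-endian byte
order in fixed4 is an arbitrary assumption. Each octet is kept as a
Char below 256, so a String built this way would survive the
truncation seen above, assuming it really does keep only the low byte.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical helper: UTF-8 encode one character. Every octet is
-- returned as a Char in the range 0..255. Surrogates are not rejected.
utf8Char :: Char -> String
utf8Char c
  | n < 0x80    = [chr n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n          = ord c
    -- leading octet: length tag plus the top bits of the code point
    lead tag s = chr (tag .|. (n `shiftR` s))
    -- continuation octet: 10xxxxxx carrying six payload bits
    cont s     = chr (0x80 .|. ((n `shiftR` s) .&. 0x3F))

utf8 :: String -> String
utf8 = concatMap utf8Char

-- Hypothetical helper: fixed width, four octets per character,
-- most significant octet first (an assumed byte order).
fixed4 :: String -> String
fixed4 = concatMap (\c -> [chr ((ord c `shiftR` s) .&. 0xFF) | s <- [24, 16, 8, 0]])

-- Same test string as in the program above
wide :: String
wide = [chr 0x1234, chr 0x5678, chr 0x4321, chr 0x8765, chr 0x102345]

main :: IO ()
main = do
  print (length (fixed4 wide))          -- 20
  print (length (utf8 wide))            -- 16
  print (map ord (utf8 [chr 0x1234]))   -- [225,136,180]
```

So the fixed-width form gives the 20 bytes I expected, and the UTF-8
form gives 16 (four 3-octet characters plus one 4-octet character).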

I am using a binary distribution of GHC 6.0 on Linux. Are there any
special options I would have to enable when building from the source
distribution to be able to send/receive Unicode characters?

To be more general: how would I send arbitrary binary data (a stream
of octets) over a socket or a file handle? Should I assume that only
the low bytes will ever be sent, and that GHC will always behave this
way? Or is it a bug?

The problem is that the Handle/Socket functions require the data being
exchanged to be a String, not, say, an [Int8]. Therefore, I need to be
able to coerce my binary data buffer to a String.
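
A minimal sketch of the kind of coercion I have in mind, assuming that
every Char below 256 goes over the wire as exactly one octet (the
output above suggests this, but it is an assumption, not something I
found documented). The names octetsToString and stringToOctets are
hypothetical, and I use Word8 rather than Int8 here only because
octets are more naturally unsigned.

```haskell
module Main where

import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: represent each octet as a Char in 0..255
octetsToString :: [Word8] -> String
octetsToString = map (chr . fromIntegral)

-- Hypothetical helper: recover the octets. fromIntegral wraps modulo
-- 256, so a Char above 255 would silently lose its high bits here,
-- just as in the socket output above.
stringToOctets :: String -> [Word8]
stringToOctets = map (fromIntegral . ord)

main :: IO ()
main = do
  let payload = [0, 1, 127, 128, 255] :: [Word8]
      wire    = octetsToString payload
  print (map ord wire)                    -- [0,1,127,128,255]
  print (stringToOctets wire == payload)  -- True
```

Under that assumption the round trip is lossless, but it only works as
long as nothing between the two ends reinterprets the String.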

PS Of course, I could write my own socket functions, but I am looking 
for a more or less "pure" Haskell solution.

PPS I tried this only with GHC.

-- 
Dmitry M. Golubovsky
       South Lyon, MI