Sending wide characters over the network socket
Dimitry Golubovsky
dimitry@golubovsky.org
Sat, 05 Jul 2003 15:32:17 -0400
Hi,
I have tried to send a string of Unicode characters over a socket (or to
write it into a file handle). The result is strange: it looks like
characters are truncated down to their least significant bytes.
Here is my program (I am new to Haskell, so the code may not look very
good, but it illustrates what I am trying to achieve):
===========================================
module Main where
import IO
import Char
import Control.Concurrent
import Network.Socket
-- A string of wide characters
wide = [chr 0x1234, chr 0x5678, chr 0x4321, chr 0x8765, chr 0x102345]
-- Function to "narrow" characters to their least significant byte
narrow s = map (\c -> chr $ (ord c) `mod` 256) s
-- Render a string as the decimal code points of its characters
strtodec str = foldr1 (++) ( map (\c -> " "++show(ord c)++" ") str)
-- Server: receive a string and print it decoded,
-- narrow it and print it again.
-- It does not actually loop, though it could
server=do
  sock <- socket AF_INET Stream 6
  ia <- inet_addr "127.0.0.1"
  bindSocket sock $ SockAddrInet 2323 ia
  listen sock 15
  srvloop sock where
    srvloop sock = do
      (acsock,from) <- accept sock
      instr <- recv acsock 128
      putStrLn $ "Received: "
        ++(strtodec instr)
      putStrLn $ "Lower Bytes: "
        ++(strtodec $ narrow instr)
      sClose acsock
      sClose sock
-- Client: send a string of wide characters
client=do
  threadDelay 100
  sock <- socket AF_INET Stream 6
  ia <- inet_addr "127.0.0.1"
  connect sock $ SockAddrInet 2323 ia
  cnt <- send sock wide
  putStrLn $ "Sent "++(show cnt)++" bytes"
  putStrLn $ "Source was: "++(strtodec wide)
  sClose sock
  threadDelay 100
main=do
  forkIO (server)
  client
===========================================
And here is its output
===========================================
Sent 5 bytes
Source was: 4660 22136 17185 34661 1057605
Received: 52 120 33 101 69
Lower Bytes: 52 120 33 101 69
===========================================
Honestly, I expected 20 bytes to be sent (or fewer if the characters
were sent in a UTF encoding), and "Received" to be identical to "Source
was". The last line of output is just a check that the values shown are
indeed the low bytes and not some garbage.
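
To make the expected byte counts concrete, here is a rough sketch of what
a UTF-8 encoding of the same test string would look like on the wire. The
helpers encodeChar and encodeString are hypothetical names written for
this illustration (they are not functions from GHC 6.0 or Network.Socket),
and the sketch assumes the hierarchical modules Data.Bits, Data.Char and
Data.Word are available. It does not reject surrogate code points.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one code point as UTF-8 octets.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading octet: length tag plus the top bits of the code point.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation octet: 10xxxxxx carrying six bits of the code point.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Hypothetical helper: encode a whole string.
encodeString :: String -> [Word8]
encodeString = concatMap encodeChar

-- The same test string as in the program above.
wide :: String
wide = [chr 0x1234, chr 0x5678, chr 0x4321, chr 0x8765, chr 0x102345]

main :: IO ()
main = do
  let octets = encodeString wide
  putStrLn $ "UTF-8 length: " ++ show (length octets)
  print octets
```

For this string the sketch produces 16 octets (three each for the first
four characters, four for the last), compared with 20 for a fixed four
bytes per character and the 5 that were actually sent.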
I am using a binary distribution of GHC 6.0 on Linux. Are there any
special options I would have to enable in the source distribution to be
able to send/receive Unicode characters?
More generally: how would I send arbitrary binary data (a stream of
octets) over a socket or a file handle? Should I always assume that only
the low bytes will be sent, and that GHC will always behave this way? Or
is it a bug?
The problem is that the Handle/Socket functions require the data being
exchanged to be a String, not, say, a [Int8]. Therefore, I need to be
able to coerce my binary data buffer to a String.
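
One possible shape for such a coercion, as a minimal sketch only: keep
every Char in the range 0..255, so that each one stands for exactly one
octet. The names octetsToString and stringToOctets are made up for this
illustration, and I use Word8 rather than Int8 so the values are unsigned.
Whether the result survives a real Handle or Socket unchanged is an
assumption based on the low-byte behaviour seen in the output above; the
sketch only shows the pure conversion.

```haskell
module Main where

import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: one octet becomes one Char in the range 0..255.
octetsToString :: [Word8] -> String
octetsToString = map (chr . fromIntegral)

-- Hypothetical helper: recover the octets. fromIntegral keeps only the
-- low 8 bits, so a Char above 255 is narrowed just like in `narrow`.
stringToOctets :: String -> [Word8]
stringToOctets = map (fromIntegral . ord)

main :: IO ()
main = do
  let payload = [0x00, 0x7F, 0x80, 0xFF, 0x12, 0x34] :: [Word8]
      wire    = octetsToString payload
  -- Every Char on the "wire" fits in a single byte.
  print (map ord wire)
  -- The conversion round-trips for any list of octets.
  print (stringToOctets wire == payload)
```

With this approach a wide character would have to be split into several
octets (for example by a UTF encoding) before being packed into the
String, rather than being passed to send directly.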
PS Of course, I could write my own socket functions, but I am looking
for a more or less "pure" Haskell solution.
PPS I tried this only with GHC.
--
Dmitry M. Golubovsky
South Lyon, MI