[Haskell-cafe] getting crazy with character encoding
Andrea Rossato
mailing_list at istitutocolli.org
Wed Sep 12 10:18:43 EDT 2007
Hi,
Suppose that, on a Linux system with a UTF-8 locale, you create a file
whose name contains non-ASCII characters. For instance:
touch abèèè
Now, I would expect the output of a shell command such as
"ls ab*"
to be a string/list of 5 chars. Instead I find it to be a list of 8
chars... ;-)
That is to say, each non-ASCII character comes back as 2 characters, as
if the UTF-8 bytes were being decoded as ISO-8859-1 - the string is
effectively treated as an ISO-8859-1 string. Yet when I print it back,
it is displayed correctly.
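To illustrate, here is a small sketch of what I think is happening (the
byte values assume 'è', U+00E8, is encoded in UTF-8 as 0xC3 0xA8):

  import Data.Char (chr)

  -- the UTF-8 encoding of "abèèè": 'a' = 0x61, 'b' = 0x62, 'è' = 0xC3 0xA8
  utf8Bytes :: [Int]
  utf8Bytes = [0x61, 0x62, 0xC3, 0xA8, 0xC3, 0xA8, 0xC3, 0xA8]

  -- reading each byte as an ISO-8859-1 code point gives 8 Chars, not 5
  asLatin1 :: String
  asLatin1 = map chr utf8Bytes      -- "ab\195\168\195\168\195\168"

  main :: IO ()
  main = do
    print (length asLatin1)         -- 8
    putStrLn asLatin1               -- with GHC's byte-per-Char output the
                                    -- original bytes go back out unchanged,
                                    -- so a UTF-8 terminal shows "abèèè" again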
I don't understand what's wrong and, what is worse, I don't understand
what I should be studying to understand what I'm doing wrong.
After reading about character encodings and the way the Linux kernel
handles file names, I would expect a file name created under a UTF-8
locale to be read by a locale-aware application as a UTF-8 string, with
each character being a Unicode code point that can be represented by a
Haskell Char. What's wrong with that assumption?
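To make that expectation concrete, this is the result I was hoping for
(just a sketch, not part of the test program below):

  import Data.Char (ord)

  -- a Haskell Char is a full Unicode code point, so 'è' fits in one Char
  expected :: String
  expected = "ab\232\232\232"   -- what I expect "ls ab*" to give me

  main :: IO ()
  main = do
    print (length expected)     -- 5
    print (map ord expected)    -- [97,98,232,232,232]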
Thanks for your kind attention.
Andrea
Here is the code to test my problem. Before creating the file, remember
to set the LANG environment variable. Something like:
export LANG="en_US.utf8"
should be fine. (Check your available locales with "locale -a".)
import System.Process
import System.IO
import Control.Monad

main :: IO ()
main = do
  l <- fmap lines $ runProcessWithInput "/bin/bash" [] "ls ab*"
  putStrLn (show l)                    -- show the raw String(s)
  mapM_ putStrLn l                     -- printed back, they display correctly
  mapM_ (putStrLn . show . length) l   -- but the length is 8, not 5

-- run a command, feed it the given input on stdin and return its stdout
runProcessWithInput :: FilePath -> [String] -> String -> IO String
runProcessWithInput cmd args input = do
  (pin, pout, perr, ph) <- runInteractiveProcess cmd args Nothing Nothing
  hPutStr pin input
  hClose pin
  output <- hGetContents pout
  when (output == output) $ return ()  -- force the lazy output before closing
  hClose pout
  hClose perr
  waitForProcess ph
  return output
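For what it's worth, decoding the byte-per-char string afterwards does
give me the expected 5 characters. A sketch using decodeString from the
utf8-string package (assuming that package is installed; it is not used
in the program above):

  import Codec.Binary.UTF8.String (decodeString)   -- utf8-string package

  main :: IO ()
  main = do
    -- a String as returned by hGetContents above, one Char per byte
    let raw = "ab\195\168\195\168\195\168"
    print (length raw)                  -- 8
    print (length (decodeString raw))   -- 5: each 0xC3 0xA8 pair becomes 'è'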