[Haskell-cafe] getting crazy with character encoding
mailing_list at istitutocolli.org
Wed Sep 12 10:18:43 EDT 2007
Suppose that, on a Linux system, in a UTF-8 locale, you create a file
whose name contains non-ASCII characters - for instance a name such as
"abèìò".
Now, I would expect the output of a shell command such as "ls ab*"
to be a string/list of 5 chars. Instead I find it to be a list of 8.
That is to say, each non-ASCII character is read as 2 characters, as
if the string were an ISO-8859-1 string - the UTF-8 bytes are actually
being treated as individual ISO-8859-1 characters. But when I print it,
it displays correctly again, presumably because the terminal reassembles
those bytes back into UTF-8.
I don't understand what's wrong and, what's worse, I don't understand
what I should be studying to understand what I'm doing wrong.
After reading about character encoding and the way the Linux kernel
manages file names, I would expect a file name created in a UTF-8
locale to be read by a locale-aware application as a UTF-8 string,
with each character a Unicode code point that can be represented by a
Haskell Char. What's wrong with that?
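To make the 5-versus-8 arithmetic concrete, here is a small sketch (the name "abèìò" is a hypothetical one, chosen to match the counts above) showing how each accented letter becomes two bytes in UTF-8:

```haskell
import Data.Char (ord)
import Data.Bits (shiftR, (.&.))

-- UTF-8 encoding for code points below U+0800 (enough for Latin accented letters).
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80  = [n]                                           -- plain ASCII: 1 byte
  | n < 0x800 = [0xC0 + (n `shiftR` 6), 0x80 + (n .&. 0x3F)]  -- 2-byte sequence
  | otherwise = error "sketch only covers code points below U+0800"
  where n = ord c

main :: IO ()
main = do
  let name = "ab\232\236\242"                -- "abèìò": 5 Haskell Chars / code points
  print (length name)                        -- 5
  print (length (concatMap utf8Bytes name))  -- 8: these are the bytes the kernel stores
```

So a byte-oriented read of that file name hands back 8 Chars, one per byte, which is exactly the mismatch described above.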
Thanks for your kind attention.
Here is the code to test my problem. Before creating the file, remember
to set the LANG environment variable. Something like:

    export LANG=en_US.UTF-8

should be fine. (Check your available locales with "locale -a")
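With a hypothetical name "abèìò" (again matching the 5-character / 8-character counts above), the byte-versus-character mismatch is already visible from the shell:

```shell
# Measure the same name in bytes and in characters.
# en_US.UTF-8 is an assumption; substitute any UTF-8 locale from "locale -a".
name=abèìò
printf '%s' "$name" | wc -c                      # 8: bytes, as stored by the kernel
printf '%s' "$name" | LC_ALL=en_US.UTF-8 wc -m   # 5: characters in a UTF-8 locale
```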
import System.IO
import System.Process (runInteractiveProcess)
import Control.Monad (when)

main :: IO ()
main = do
  l <- fmap lines $ runProcessWithInput "/bin/bash" ["-c", "ls ab*"] ""
  putStrLn (show l)
  mapM_ putStrLn l
  mapM_ (putStrLn . show . length) l

runProcessWithInput :: FilePath -> [String] -> String -> IO String
runProcessWithInput cmd args input = do
  (pin, pout, _perr, _ph) <- runInteractiveProcess cmd args Nothing Nothing
  hPutStr pin input
  hClose pin
  output <- hGetContents pout
  when (output == output) $ return ()   -- force the lazy output to be read fully
  return output
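Conversely, the 8 byte-sized Chars that come back can be folded into the 5 real characters by decoding them as UTF-8. A minimal decoding sketch, handling only 1- and 2-byte sequences (the byte string below corresponds to the hypothetical name "abèìò"):

```haskell
import Data.Char (ord, chr)
import Data.Bits (shiftL, (.&.))

-- Decode a byte-per-Char string (as returned by byte-oriented IO)
-- into proper code points; covers 1- and 2-byte UTF-8 sequences only.
decodeUtf8 :: String -> String
decodeUtf8 [] = []
decodeUtf8 (c:cs)
  | b < 0x80               = c : decodeUtf8 cs      -- ASCII byte, pass through
  | b >= 0xC0 && b < 0xE0  =                        -- lead byte of a 2-byte sequence
      case cs of
        (c2:rest) -> chr (((b .&. 0x1F) `shiftL` 6) + (ord c2 .&. 0x3F))
                       : decodeUtf8 rest
        []        -> [c]                            -- truncated sequence, pass through
  | otherwise              = c : decodeUtf8 cs      -- 3/4-byte sequences not handled here
  where b = ord c

main :: IO ()
main = do
  let raw = "ab\195\168\195\172\195\178"   -- the 8 "characters" ls gives back
  print (length raw)                        -- 8
  print (length (decodeUtf8 raw))           -- 5
```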