[Haskell-cafe] How to get program command line arguments in Unicode-aware way (Unix/Linux)?

Austin Seipp mad.one at gmail.com
Wed Mar 11 23:04:28 EDT 2009


Excerpts from Dimitry Golubovsky's message of Wed Mar 11 21:42:14 -0500 2009:
> Hi,
> 
> I am trying to process command line arguments that may contain Unicode
> (cyrillic in this example) characters.
> 
> The standard GHC's getArgs seems to pass whatever was obtained from
> the underlying C library
> without any regard to encoding, e. g the following program (testarg.hs):
> 
> module Main where
> 
> import System.Environment
> 
> main = do
>   x <- getArgs
>   mapM (putStrLn . show) x
> 
> being invoked (ghc 6.10.1)
> 
> runghc testarg -T 'при<в>ет'
> 
> prints the following:
> 
> "-T"
> "\208\191\209\128\208\184<\208\178>\208\181\209\130"
> 
> (not correct, all bytes were passed without proper encoding)
> 
> Is there any way to get program arguments in GHC Unicode-aware? Or at
> least assuming that they are always in UTF-8?
> Something like System.IO.UTF8, but for command line arguments?
> 
> Thanks.
> 
> PS: BTW  runhugs testarg -T 'при<в>ет' prints:
> 
> "-T"
> "\1087\1088\1080<\1074>\1077\1090"
> 
> which is correct.
> 

Hello,

Would this approach work using utf8-string?

    import Codec.Binary.UTF8.String
    import System.Environment
    import Control.Monad
    
    main = do
        x <- liftM (map decodeString) getArgs
        mapM_ (putStrLn . encodeString) x

Austin


More information about the Haskell-Cafe mailing list