[Haskell-cafe] How to get program command line arguments in Unicode-aware way (Unix/Linux)?

Bulat Ziganshin bulat.ziganshin at gmail.com
Thu Mar 12 05:57:47 EDT 2009


Hello Dimitry,

Thursday, March 12, 2009, 5:42:14 AM, you wrote:

depends on your OS. for windows i use this code:

myGetArgs = do
   alloca $ \p_argc -> do
   p_argv_w <- commandLineToArgvW getCommandLineW p_argc
   argc     <- peek p_argc
   argv_w   <- peekArray (i argc) p_argv_w
   mapM peekTString argv_w >>== tail

foreign import stdcall unsafe "windows.h GetCommandLineW"
  getCommandLineW :: LPTSTR

foreign import stdcall unsafe "windows.h CommandLineToArgvW"
  commandLineToArgvW :: LPCWSTR -> Ptr CInt -> IO (Ptr LPWSTR)


note that it doesn't skip over +RTS sections. btw, i plan to make
unicode-aware version of System.Directory module to solve all these problems

> I am trying to process command line arguments that may contain Unicode
> (cyrillic in this example) characters.

> The standard GHC's getArgs seems to pass whatever was obtained from
> the underlying C library
> without any regard to encoding, e. g the following program (testarg.hs):

> module Main where

> import System.Environment

> main = do
>   x <- getArgs
>   mapM (putStrLn . show) x

> being invoked (ghc 6.10.1)

> runghc testarg -T 'при<в>ет'

> prints the following:

> "-T"
> "\208\191\209\128\208\184<\208\178>\208\181\209\130"

> (not correct, all bytes were passed without proper encoding)

> Is there any way to get program arguments in GHC Unicode-aware? Or at
> least assuming that they are always in UTF-8?
> Something like System.IO.UTF8, but for command line arguments?

> Thanks.

> PS: BTW  runhugs testarg -T 'при<в>ет' prints:

> "-T"
> "\1087\1088\1080<\1074>\1077\1090"

> which is correct.




-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com



More information about the Haskell-Cafe mailing list