[Haskell-cafe] Unicode strings and runCommand / runProcess

Khudyakov Alexey alexey.skladnoy at gmail.com
Sat Apr 24 07:05:52 EDT 2010

On 24 April 2010 06:14:55 you wrote:
> Khudyakov Alexey wrote:
> >> Actually, the behavior of openFile when given a String with characters >
> >> 0xFF is also completely undocumented.  I am not sure what it does with
> >> that.  It should probably be the same as runCommand, whatever it is.
> >
> > Under unices file names are just array of bytes. There is no notion of
> > encoding at all. It's just matter of interpretation of that array.
> Quite right.  One must be able to pass binary strings, which contain
> anything except \0 and '/' to openFile.  The same goes for runCommand.
> I am uncomfortable, for this reason, with saying that runCommand ought
> to re-encode in the system locale while openFile doesn't.  It is
> preferable to drop characters than to drop the ability to pass arbitrary
> binary data.
But truncation makes it impossible to pass non-ASCII strings portably. They
have to be encoded, and there is no easy way to do so.

Actually, the problem is the use of strings. A String is a sequence of
_characters_, while a program talks to the outside world using sequences of
bytes. I think the right (but impossible) way to solve this problem is to use
separate data types for file paths and command line arguments.

Something along these lines:
> data FilePath = ...
> stringToFilePath :: String -> Maybe FilePath
> filePathToString :: FilePath -> Maybe String

Both functions are non-total, hence the Maybes. But this would break a LOT of
code and violate the language definition (the Report defines FilePath as a
synonym for String).
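
To make the idea concrete, here is a minimal sketch of such an opaque type.
Everything in it is an assumption for illustration: the type is called
BytePath (a hypothetical name, chosen to avoid clashing with the Prelude's
FilePath), and the conversions only accept 7-bit ASCII, since those bytes mean
the same thing in ASCII-compatible locale encodings. A real design would have
to pick an actual encoding policy.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical opaque path type: just the raw bytes handed to the OS.
newtype BytePath = BytePath [Word8]
  deriving (Eq, Show)

-- Assumption for this sketch: only 7-bit ASCII is converted.
-- NUL is rejected because it cannot appear in a Unix file name.
stringToBytePath :: String -> Maybe BytePath
stringToBytePath s
  | all ok s  = Just (BytePath (map (fromIntegral . ord) s))
  | otherwise = Nothing
  where
    ok c = c /= '\0' && ord c < 0x80

-- Fails when the path holds bytes that have no agreed character meaning.
bytePathToString :: BytePath -> Maybe String
bytePathToString (BytePath bs)
  | all (< 0x80) bs = Just (map (chr . fromIntegral) bs)
  | otherwise       = Nothing
```

Neither conversion can silently lose information: a caller either gets an
exact round trip or a Nothing that has to be handled.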

I think there are two alternatives. One is to encode/decode strings using the
current locale and to provide [Word8]-based variants. The main problem is that
seemingly innocent actions like listing a directory's contents could crash the
program (with an exception) when a name cannot be decoded.

The other option is to provide functions to encode/decode strings. This is
ugly, mixes strings which hold characters with strings which hold bytes, and
is completely un-Haskellish, but it seems there is no good solution.
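
A rough sketch of what such an explicit encoding function could look like,
assuming UTF-8 is the encoding the outside world expects (that choice and the
name encodeUtf8 are assumptions made only for this illustration). Note that
the result is again a String, but one in which every Char stands for a byte,
which is exactly the mixing of meanings mentioned above.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Encode each character as UTF-8 and return the bytes as Chars
-- in the range '\0'..'\255'. Only the encoding direction is shown.
encodeUtf8 :: String -> String
encodeUtf8 = concatMap (map chr . encodeChar . ord)
  where
    encodeChar n
      | n < 0x80    = [n]
      | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
      | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
      | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                      , cont (n `shiftR` 6), cont n ]
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)
```

Since no character of the result is above 0xFF, truncation to 8 bits would
leave such a string unchanged.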

Also, truncation could have security implications. It makes it almost
impossible to escape dangerous characters robustly. Consider the following
code. This is more a matter of speculation than a real threat, but
nevertheless:

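> import System.Process (runCommand)
>
> evil, maskedEvil :: String
> evil = "I am an evil script; date; echo I\\'m doing whatever I want"
> -- Shift every character above 0xFF; truncation to 8 bits restores evil
> maskedEvil = map (toEnum . (+256) . fromEnum) evil
>
> -- Should escape all dangerous chars (placeholder implementation)
> escape :: String -> String
> escape = id
>
> oops :: IO ()
> oops = do
>   -- escape sees no dangerous characters in maskedEvil
>   _ <- runCommand ("echo " ++ escape maskedEvil)
>   return ()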
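
The effect can be checked without running any command. The sketch below
assumes that truncation means keeping only the low 8 bits of each character;
truncateTo8Bits and isShellMeta are hypothetical helpers written for this
illustration and are not part of any library.

```haskell
import Data.Char (chr, ord)

-- Assumption: the runtime keeps only the low 8 bits of each character.
truncateTo8Bits :: String -> String
truncateTo8Bits = map (chr . (`mod` 256) . ord)

evil, maskedEvil :: String
evil = "I am an evil script; date; echo I\\'m doing whatever I want"
maskedEvil = map (toEnum . (+256) . fromEnum) evil

-- A hypothetical check that only knows ASCII shell metacharacters.
isShellMeta :: Char -> Bool
isShellMeta c = c `elem` ";|&$`\\'\"<>()"
```

Here `any isShellMeta maskedEvil` is False, so an escaping function built on
such a check has nothing to escape, while `truncateTo8Bits maskedEvil == evil`
is True, so the shell would still receive the original text with its `;`
separators intact.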
