[Haskell-cafe] Unicode strings and runCommand / runProcess
Khudyakov Alexey
alexey.skladnoy at gmail.com
Sat Apr 24 07:05:52 EDT 2010
On 24 April 2010 06:14:55 you wrote:
> Khudyakov Alexey wrote:
> >> Actually, the behavior of openFile when given a String with characters >
> >> 0xFF is also completely undocumented. I am not sure what it does with
> >> that. It should probably be the same as runCommand, whatever it is.
> >
> > Under unices file names are just array of bytes. There is no notion of
> > encoding at all. It's just matter of interpretation of that array.
>
> Quite right. One must be able to pass binary strings, which contain
> anything except \0 and '/' to openFile. The same goes for runCommand.
> I am uncomfortable, for this reason, with saying that runCommand ought
> to re-encode in the system locale while openFile doesn't. It is
> preferable to drop characters than to drop the ability to pass arbitrary
> binary data.
>
But truncation makes it impossible to pass non-ASCII strings portably. They
should be encoded, but there is no easy way to do so.
Actually the problem is the use of strings. A String is a sequence of
_characters_, while a program talks to the outside world using sequences of
bytes. I think the right (but impossible) way to solve this problem is to use
separate data types for file paths and command line arguments.
Something along these lines:
> data FilePath = ...
>
> stringToFilePath :: String -> Maybe FilePath
> filePathToString :: FilePath -> Maybe String
Both functions are non-total, hence the Maybes. But it would break a LOT of
code and violate the language definition.
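To make the two partial conversions concrete, here is a minimal sketch. It is
only an illustration under loud assumptions: the names RawPath,
stringToRawPath and rawPathToString are hypothetical (chosen so they do not
clash with the Prelude's FilePath), and plain ASCII stands in for whatever
encoding the real locale would dictate.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical type: a path is just a sequence of bytes.
newtype RawPath = RawPath [Word8]
  deriving (Eq, Show)

-- Assumption: ASCII stands in for the locale encoding.
-- Fails on NUL and on any character that ASCII cannot represent.
stringToRawPath :: String -> Maybe RawPath
stringToRawPath = fmap RawPath . mapM encodeChar
  where
    encodeChar c
      | n > 0 && n < 0x80 = Just (fromIntegral n)
      | otherwise         = Nothing
      where n = ord c

-- Fails on any byte that is not valid ASCII.
rawPathToString :: RawPath -> Maybe String
rawPathToString (RawPath bytes) = mapM decodeByte bytes
  where
    decodeByte b
      | b < 0x80  = Just (chr (fromIntegral b))
      | otherwise = Nothing
```

With these definitions a path made of bytes that do not decode still exists as
a RawPath value and can be passed back to the system unchanged; only the
conversion to String reports failure.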
I think there are two alternatives. One is to encode/decode strings using the
current locale and provide [Word8]-based variants. The main problem is that
seemingly innocent actions like getting directory contents could crash the
program (with an exception) if a file name cannot be decoded.
Another option is to provide functions to encode/decode strings. This is ugly,
mixes strings which hold characters with strings which hold bytes, and is
completely unhaskellish, but it seems there is no good solution.
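A rough sketch of what such an encoding function might look like, assuming
UTF-8 as the target encoding (the thread does not fix one). The names
encodeCharUtf8, encodeUtf8 and bytesAsString are hypothetical. The last one
shows the ugly part: the result is a String whose "characters" are really
bytes.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one character as UTF-8 bytes.
-- Note: surrogate code points are not rejected in this sketch.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading payload bits, shifted down into the first byte.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx with six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8

-- A String that holds bytes rather than characters:
-- every element is in the range 0..255.
bytesAsString :: String -> String
bytesAsString = map (chr . fromIntegral) . encodeUtf8
```

Nothing in the types distinguishes the output of bytesAsString from an
ordinary String, which is exactly the mixing complained about above.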
Also, truncation could have security implications. It makes it almost
impossible to escape dangerous characters robustly. Consider the following
code. This is more a matter of speculation than a real threat, but nevertheless:
> import System.Process (runCommand)
>
> evil, maskedEvil :: String
> evil = "I am an evil script; date; echo I\\'m doing whatever I want"
> -- Shift every character out of the 8-bit range
> maskedEvil = map (toEnum . (+256) . fromEnum) evil
>
> -- Should escape all dangerous chars
> escape :: String -> String
> escape = id
>
> oops :: IO ()
> oops = do
>   runCommand ("echo " ++ maskedEvil ++ "")
>   return ()
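The effect can be checked in pure code without running anything. The sketch
below is an illustration only: truncateToByte models the truncation discussed
in this thread as "keep the low 8 bits of each code point", which is an
assumption about what the runtime does, and naiveEscape is a hypothetical
escaper that backslash-escapes a few shell metacharacters.

```haskell
import Data.Char (chr, ord)

-- Assumption: truncation keeps only the low 8 bits of each code point.
truncateToByte :: Char -> Char
truncateToByte c = chr (ord c `mod` 256)

-- Hypothetical escaper: puts a backslash before a few shell metacharacters.
naiveEscape :: String -> String
naiveEscape = concatMap esc
  where
    esc c
      | c `elem` ";'\\" = ['\\', c]
      | otherwise       = [c]

evil, maskedEvil :: String
evil = "I am an evil script; date; echo I\\'m doing whatever I want"
maskedEvil = map (chr . (+ 256) . ord) evil

-- What the shell would see if the escaped string were truncated afterwards.
afterTruncation :: String
afterTruncation = map truncateToByte (naiveEscape maskedEvil)
```

Under that assumption afterTruncation == evil holds: the escaper finds
nothing to escape in maskedEvil, and the truncation then restores every
metacharacter.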