[xmonad] spawn functions are not unicode safe

Khudyakov Alexey alexey.skladnoy at gmail.com
Thu Jan 22 18:46:39 EST 2009


> Well put.  But it would be best to keep Xmonad encoding agnostic.
> Filenames are byte sequences, so spawn should take [Word8] instead of
> [Char].  Otherwise the best it can do is convert [Char] to [Word8] by
> truncating each Char.  Nothing guarantees that every executable has
> a UTF-8 name, after all, even if the default system locale is UTF-8.
>
Keeping xmonad encoding-agnostic is a good thing; encoding is surely outside
the scope of a window manager. But making spawn accept [Word8] instead of
String does not solve the problem. It just reformulates it and makes it more
obvious: converting [Char] to [Word8] is string encoding ;-).
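
To make that concrete, here is a minimal sketch comparing two ways of turning
[Char] into [Word8]. The names truncateChars and utf8Encode are made up for
this illustration and are not part of xmonad; the UTF-8 encoder is
hand-written and does not validate its input (e.g. surrogates).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- "Truncating each Char": keep only the low 8 bits of every code point.
-- This is already an encoding choice, and a lossy one above U+00FF.
truncateChars :: String -> [Word8]
truncateChars = map (fromIntegral . ord)

-- The same String encoded as UTF-8 (illustrative, hand-written encoder).
utf8Encode :: String -> [Word8]
utf8Encode = concatMap (map fromIntegral . encodeChar . ord)
  where
    encodeChar :: Int -> [Int]
    encodeChar c
      | c < 0x80    = [c]
      | c < 0x800   = [ 0xC0 .|. (c `shiftR` 6)
                      , 0x80 .|. (c .&. 0x3F) ]
      | c < 0x10000 = [ 0xE0 .|. (c `shiftR` 12)
                      , 0x80 .|. ((c `shiftR` 6) .&. 0x3F)
                      , 0x80 .|. (c .&. 0x3F) ]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18)
                      , 0x80 .|. ((c `shiftR` 12) .&. 0x3F)
                      , 0x80 .|. ((c `shiftR` 6) .&. 0x3F)
                      , 0x80 .|. (c .&. 0x3F) ]
```

For the single character U+044F, truncateChars gives the one byte 0x4F (an
ASCII 'O'), while utf8Encode gives the two bytes 0xD1 0x8F. Both functions
have the type String -> [Word8]; which one is "right" depends on what the
receiving side expects.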

> It's better to investigate where the argument of spawn comes from, and
> handle the problem there.  Xlib knows about character encodings, maybe
> we could use its facilities and thus avoid adding further dependencies.

Arguments to spawn can come from anywhere: user input, hardcoded strings,
etc. These sources are far too numerous for each to handle encoding by itself.
As for Xlib, AFAIK everything is OK with it. The real problem is the standard
library: putStrLn, executeFile, etc.

The only solution that keeps xmonad encoding-agnostic and still allows strings
to be encoded is to push the choice of encoding to the user. At least I could
not imagine anything else.

I've come up with two possible solutions. 

1. Add an encoding field to XConfig.
> data XConfig l = XConfig { 
>     -- skipped 
>     encoding :: String -> String  -- User supplied encoding
>     }
Downside: requires passing the config to every function like spawn => changes
the API => breaks an awful lot of code.

2. Something along these lines:
> import Data.IORef (IORef, newIORef, readIORef)
> import System.IO.Unsafe (unsafePerformIO)
>
> -- Global IORef which stores the encoding function. Its value should be
> -- set from the user's config and never modified afterwards. NOINLINE
> -- ensures that only one IORef is ever created.
> {-# NOINLINE coding #-}
> coding :: IORef (String -> String)
> coding = unsafePerformIO (newIORef id)
>
> -- Function to get the encoder. Should be the only way to obtain it.
> getEncoder :: IO (String -> String)
> getEncoder = readIORef coding
>
> -- putStrLn which works with Unicode strings (sometimes)
> encodedPutStrLn :: String -> IO ()
> encodedPutStrLn str = do encode <- getEncoder
>                          putStrLn . encode $ str
Upside: breaks nothing. Only a few functions change, plus a way to set the
encoding is provided.
Downside: it's a kludge.
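
A self-contained sketch of how the user's config might set the encoder once at
startup under option 2. The names setEncoder and asciiOnly are hypothetical
and exist nowhere in xmonad; the global cell relies on the usual
unsafePerformIO + NOINLINE idiom so that every reader sees the same IORef.

```haskell
import Data.Char (ord)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- One process-wide cell holding the encoder; defaults to 'id'.
-- NOINLINE keeps GHC from creating more than one IORef.
{-# NOINLINE coding #-}
coding :: IORef (String -> String)
coding = unsafePerformIO (newIORef id)

-- Hypothetical setter, meant to be called once from the user's config.
setEncoder :: (String -> String) -> IO ()
setEncoder = writeIORef coding

-- The only way library code is supposed to obtain the encoder.
getEncoder :: IO (String -> String)
getEncoder = readIORef coding

-- Example of a user-supplied encoder (an assumption for illustration):
-- replace everything outside ASCII with '?'.
asciiOnly :: String -> String
asciiOnly = map (\c -> if ord c < 128 then c else '?')
```

A config would then call something like `setEncoder asciiOnly` before handing
control to xmonad, and functions such as spawn would fetch the encoder with
getEncoder right before passing the string to the standard library.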


Comments, suggestions, and ideas are welcome.

P.S. I found this in the documentation for XMonad.Util.XSelection, so spawn is
not the only thing affected:

    * Unicode handling is busted. But it's still better than calling
      'chr' to translate to ASCII, at least.
      As near as I can tell, the mangling happens when the String is
      outputted somewhere, such as via promptSelection's passing through
      the shell, or GHCi printing to the terminal. utf8-string has IO functions
      which can fix this, though I do not know how to use them here. It's
      a complex issue; see
      http://www.haskell.org/pipermail/xmonad/2007-September/001967.html
      http://www.haskell.org/pipermail/xmonad/2007-September/001966.html.

