[Xmonad] XSelection.hs - Unicode problems

Gwern Branwen gwern0 at gmail.com
Sat Aug 25 05:43:53 EDT 2007


So I was working on splitting out Andrea's hxsel into something useful for one's Config.hs, and I'm basically done, and it seems to work well.

However, while it works fine for your basic bread and butter ASCII characters, I noticed that it does terrible things to more exotic phrases involving Unicode, such as "Henri Poincaré". I borrowed some code from utf-string, and that improved it a little bit - "Henri Poincaré" now becomes "Henri Poincarý" which is still better than "Henri Poincar\245" or whatever.

As far as I can tell, whenever a String involving UTF-8 stuff leaves the Haskell environment, it gets messed up. This is a little hard to test since things get messed up even when I test them in GHCi. :) But I'm sure it has to be something to do with Haskell, since I know I can copy and paste such strings with no problems using the mouse, and I know that my shell isn't the problem and nor are the Surfraw programs I use to pass them to Firefox (and Firefox obviously has no problems handling those characters).

utf-string does have a special IO module which can print out and receive UTF-8 strings, but it's limited to things like 'putStr' and 'getContents'. There doesn't seem to be anything that would be useful for fixing 'spawn', which seems to be the specific function that needs fixing, since it isn't using any of them and executeFile doesn't seem to use any of them either.
 spawn x = io $ do
     pid <- forkProcess $ do
         forkProcess (createSession >> executeFile "/bin/sh" False ["-c", x] Nothing)
         exitWith ExitSuccess
     getProcessStatus True False pid
     return ()

Any ideas on how to fix this?

-----------------------------------------------------------------------------
-- |
-- Module      :  XMonadContrib.XSelection
-- Copyright   :  (C) 2007 Andrea Rossato
-- License     :  BSD3
--
-- Maintainer  :  andrea.rossato at unibz.it
-- Stability   :  unstable
-- Portability :  unportable
--
-- A module for accessing and manipulating the X Window mouse selection (used in copy and pasting).
--
-----------------------------------------------------------------------------
{- $usage
    Add 'import XMonadContrib.XSelection' to the top of Config.hs
    Then make use of getSelection or promptSelection as needed.

    TODO:
             * Fix Unicode handling. Currently it's still better than calling 'chr' to translate to ASCII, though.
               As near as I can tell, the mangling happens when the String is outputted somewhere, such as via
               promptSelection's passing through the shell, or GHCi printing to the terminal. utf-string has IO functions
               which can fix this, though I do not know have to use them here.
             * Add a 'putSelection' to allow modification of the selection.
             * Possibly add some more elaborate functionality: Emacs' registers are nice.
-}

module XMonadContrib.XSelection (getSelection, promptSelection) where
-- getSelection's imports:
import Graphics.X11.Xlib (allocaXEvent, createSimpleWindow, defaultScreen, destroyWindow, internAtom, nextEvent, openDisplay, rootWindow, selectionNotify, sTRING)
import Graphics.X11.Xlib.Extras (currentTime, ev_event_type, getEvent, getWindowProperty8, xConvertSelection)
import Data.Maybe (fromMaybe)
import Data.Char (chr)
-- promptSelection's imports:
import XMonad (io, spawn, X ())
-- decode's imports
import Foreign -- (Word8(), (.&.), shiftL, (.|.))

-- | Returns a String corresponding to the current mouse selection in X; if there is none, an empty string is returned.. Note that this is
-- only reliable for ASCII text and currently mangles/escapes more complex UTF-8 characters.
getSelection :: IO String
getSelection = do
  dpy <- openDisplay ""
  let dflt = defaultScreen dpy
  rootw  <- rootWindow dpy dflt
  win <- createSimpleWindow dpy rootw 0 0 200 100 0 0 0
  p <- internAtom dpy "PRIMARY" True
  clp <- internAtom dpy "BLITZ_SEL_STRING" False
  xConvertSelection dpy p sTRING clp win currentTime
  allocaXEvent $ \e -> do
    nextEvent dpy e
    ev <- getEvent e
    if ev_event_type ev == selectionNotify
       then do res <- getWindowProperty8 dpy clp win
               return $ decode  . fromMaybe [] $ res
       else destroyWindow dpy win >> return ""

-- | A wrapper around getSelection. Makes it convenient to run a program with the current selection as an argument. This is convenient
-- for handling URLs, in particular. For example, in your Config.hs you could bind a key to 'promptSelection "firefox"'; this would allow you to
-- highlight a URL string and then immediately open it up in Firefox.
promptSelection :: String -> X ()
promptSelection app = spawn . ((app ++ " ") ++) =<< io getSelection

{- UTF-8 decoding for internal use in getSelection. This code is totally stolen from Eric Mertens's utf-string library
   <http://code.haskell.org/utf8-string/> (version 0.1), which fortunately is BSD-3 licensed, so I can just copy it into this BSD-3 licensed module.
   I guess it'd be better to just import Codec.Binary.UTF8.String (decode), but then users of this would need to install it, and Xmonad has enough
   dependencies as it is. -}
decode :: [Word8] -> String
decode [    ] = ""
decode (c:cs)
  | c < 0x80  = chr (fromEnum c) : decode cs
  | c < 0xc0  = replacement_character : decode cs
  | c < 0xe0  = multi_byte 1 0x1f 0x80
  | c < 0xf0  = multi_byte 2 0xf  0x800
  | c < 0xf8  = multi_byte 3 0x7  0x10000
  | c < 0xfc  = multi_byte 4 0x3  0x200000
  | c < 0xfe  = multi_byte 5 0x1  0x4000000
  | otherwise = replacement_character : decode cs
  where
    replacement_character :: Char
    replacement_character = '\xfffd'

    multi_byte :: Int -> Word8 -> Int -> [Char]
    multi_byte i mask overlong = aux i cs (fromEnum (c .&. mask))
      where
        aux 0 rs acc
          | overlong <= acc && acc <= 0x10ffff &&
            (acc < 0xd800 || 0xdfff < acc)     &&
            (acc < 0xfffe || 0xffff < acc)      = chr acc : decode rs
          | otherwise = replacement_character : decode rs

        aux n (r:rs) acc
          | r .&. 0xc0 == 0x80 = aux (n-1) rs
                               $ shiftL acc 6 .|. fromEnum (r .&. 0x3f)

        aux _ rs     _ = replacement_character : decode rs

--
gwern
DDR&E E911 BCCI State EO PRIME iButton WID OAU Flintlock
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/xmonad/attachments/20070825/616a0192/attachment.bin


More information about the Xmonad mailing list