[Haskell-cafe] FFI binding -- different behaviour under compilation
and interpretation.
Jason Dusek
jason.dusek at gmail.com
Fri Nov 13 18:00:36 EST 2009
I'm binding to `wcwidth` to determine the column widths of
various Unicode characters. I noticed a lot of -- in fact all
-- Chinese characters were being given widths of `-1` when of
course they should have width `2`. This only showed up when I
compiled my program though -- within GHCi, it never happened.
Below my signature is a parred down example that demoes the
bug. It tries to get the width of only one Chinese character.
You can see it like this:
:; ghc --make DemoFailure.hs -o demo && demo
[1 of 1] Compiling Main ( DemoFailure.hs, DemoFailure.o )
Linking demo ...
0x00005cff -1 峿
:; chmod ug+x DemoFailure.hs && DemoFailure.hs
0x00005cff 2 峿
Switching between safe/unsafe does not make any difference. This
was run on a Macintosh.
--
Jason Dusek
#!/usr/bin/env runhaskell
{- DemoFailure.hs -}
{-# LANGUAGE ForeignFunctionInterface
#-}
import Foreign.C
import Data.Char
import Text.Printf
import qualified System.IO.UTF8 as UTF8
main = do
(sequence_ . fmap (UTF8.putStrLn . uncurry fmt)) widths
where
widths = [ (c, wcwidth c) | c <- ['\x5cff'] ]
--widths = [ (c, wcwidth c) | c <- [minBound..maxBound] ]
fmt c cols = printf "0x%08x %2d %s" (fromEnum c) cols rep
where
rep | ' ' == c = "\\SP"
| isSpace c = '\\' : show (fromEnum c)
| isPrint c = [c]
| otherwise = (reverse . drop 1 . reverse . drop 1 . show) c
wcwidth :: Char -> Int
wcwidth = fromEnum . native . toEnum . fromEnum
foreign import ccall unsafe "wchar.h wcwidth" native :: CWchar -> CInt
More information about the Haskell-Cafe
mailing list