[Haskell] Re: ANNOUNCE: Haskeline 0.6.2 - unicode width calculation not working

Tue Sep 15 16:21:29 EDT 2009

Ahn, Ki Yung wrote:
> Judah Jacobson wrote:
>>
>> With this release, thanks to much appreciated feedback and suggestions
>> from the community, Haskeline's features become more competitive with
>> its C alternatives.  Improvements include:
>>
>>  * A multitude of new emacs and vi bindings:
>>      http://trac.haskell.org/haskeline/wiki/KeyBindings
>>  * A new preference 'historyDuplicates' to remove repeated history
>>    entries
>>  * Recognize PageUp and PageDown keys
>>  * Compatibility with ghc-6.12
>>  * Correct width calculations for Unicode combining characters
> 
> Oh, this Unicode width calculation sounds great!  This was the only
> reason I needed readline and stayed away from haskeline.  If this works
> properly, I no longer need to depend on readline for line input.  I think
> this is very good news for any multibyte charset users.  I'll try to
> replace readline with haskeline in my memscript utility and come back to
> report the results.  Thanks for your work on this!

Sorry for the bad news, but haskeline does not calculate the correct width
for Korean, and I don't believe it will for Chinese or Japanese either.

My Linux locale setting is LANG=ko_KR.UTF-8, and I tested with the
following example program from the Hackage Haddock documentation:

 > import System.Console.Haskeline
 >
 > main :: IO ()
 > main = runInputT defaultSettings loop
 >    where
 >       loop :: InputT IO ()
 >       loop = do
 >           minput <- getInputLine "% "
 >           case minput of
 >               Nothing -> return ()
 >               Just "quit" -> return ()
 >               Just input -> do outputStrLn $ "Input was: " ++ input
 >                                loop

When I typed Korean characters and tried to erase them with backspace
or del, only half of each character was erased on the screen, although
the composed multibyte character does seem to be deleted from the buffer.
That is, when I type the three Korean characters of my name, which are
six ASCII character widths wide in total, and press backspace three
times, the buffer is empty, but one and a half characters remain on my
screen.
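
To make the column arithmetic concrete, here is a minimal sketch. It is
not Haskeline's actual code: the names charWidth, displayWidth and
eraseLast are hypothetical, and the ranges below are only a small,
simplified subset of the wide characters a real implementation would
have to handle.

```haskell
import Data.Char (ord)

-- Approximate number of terminal columns one character occupies.
-- Assumption: only a few common double-width ranges are listed here.
charWidth :: Char -> Int
charWidth c
  | within 0x3040 0x30FF = 2  -- Hiragana and Katakana
  | within 0x4E00 0x9FFF = 2  -- CJK Unified Ideographs
  | within 0xAC00 0xD7A3 = 2  -- Hangul syllables
  | within 0xFF01 0xFF60 = 2  -- fullwidth forms
  | otherwise            = 1
  where
    n = ord c
    within lo hi = lo <= n && n <= hi

-- Total number of columns a string occupies on the screen.
displayWidth :: String -> Int
displayWidth = sum . map charWidth

-- Control sequence that visually erases the last character of a line:
-- move back over its columns, overwrite them with spaces, move back again.
eraseLast :: String -> String
eraseLast "" = ""
eraseLast s  = replicate w '\b' ++ replicate w ' ' ++ replicate w '\b'
  where w = charWidth (last s)
```

Under these assumptions, three Hangul syllables have a displayWidth of 6.
Erasing only one column per backspace would clear 3 of those 6 columns,
which leaves the one and a half characters described above.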

The same problem existed in previous versions as well, and it is why
multibyte charset users cannot adopt haskeline for their projects and
have a less satisfying experience when using ghci to test text
input/output actions.

In summary, I think haskeline calculates the correct width but applies
it only to the buffer contents, not to what is printed on the screen.
This gives users a very awkward experience when they try to move the
cursor back to fix the line input.

> Ahn, Ki Yung