[Haskell-cafe] Fwd: How to input Unicode string in Haskell program?

Semyon Kholodnov joker.vd at gmail.com
Thu Feb 21 14:01:31 CET 2013


---------- Forwarded message ----------
From: Semyon Kholodnov <joker.vd at gmail.com>
Date: Thu, 21 Feb 2013 16:26:58 +0400
Subject: Re: [Haskell-cafe] How to input Unicode string in Haskell program?
To: Alexander V Vershilov <alexander.vershilov at gmail.com>

I know that this problem doesn't exist on Linux, but I work on
Windows. I use WinGHCi primarily, because it has an RTF component
that can display Unicode. It turns out, however, that WinGHCi merely
sends commands to ghci.exe and receives results from it, and it does
so in a weird way: it sets ghci's console code pages to the current
system code page (ACP) and reads results from ghci as ACP, but sends
commands to it as UTF-8. ghci then interprets those UTF-8 bytes as ACP.
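
To illustrate the effect, here is a minimal sketch of what happens when
UTF-8 bytes are read back as a single-byte code page. It assumes the
text package is available; misreadUtf8 is a name made up for this
example, and Latin-1 is used only as a stand-in for the real ACP, which
will differ from machine to machine.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Encode text as UTF-8, then (wrongly) decode the resulting bytes
-- as a single-byte code page. Latin-1 is an assumption standing in
-- for whatever the system ACP actually is.
misreadUtf8 :: T.Text -> T.Text
misreadUtf8 = TE.decodeLatin1 . TE.encodeUtf8

main :: IO ()
main = do
    let original = T.pack "r\233sum\233"     -- "résumé", written with escapes
    print (T.length original)                -- 6 characters
    print (T.length (misreadUtf8 original))  -- 8: each "é" became two characters
```

Every non-ASCII character turns into two or more unrelated characters,
which matches the kind of garbage I see in the WinGHCi window.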

Now, however, I have a fix for WinGHCi: in StartGHCI.c one should replace

    SetConsoleOutputCP(GetACP());
    SetConsoleCP(GetACP());

with

    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);

and in Utf8.c, in the body of UnicodeToLocalCodePage(),
"WideCharToMultiByte( CP_ACP" has to be replaced with
"WideCharToMultiByte( CP_UTF8", and in the body of
LocalCodePageToUnicode(), "MultiByteToWideChar( CP_ACP" has to be
replaced with "MultiByteToWideChar( CP_UTF8". After recompiling,
everything works great:

Prelude> x <- getLine
résumé 履歴書 резюме
Prelude> putStrLn x
résumé 履歴書 резюме
Prelude>

Is there any way to ask for this fix to be included in WinGHCi and
the Haskell Platform?
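
As an aside on the locale issue raised in the quoted reply below: for
programs that keep using String and getLine, it may be enough to force
the handle encoding from inside the program with hSetEncoding from
System.IO. This is only a sketch, and it assumes the bytes arriving on
the input handle really are UTF-8 (which is not the case with an
unpatched WinGHCi or a console set to a legacy code page).
echoLineUtf8 is a helper name made up for this example.

```haskell
import System.IO

-- | Read one line from the input handle and write it to the output
-- handle, treating both as UTF-8 regardless of the current locale.
echoLineUtf8 :: Handle -> Handle -> IO ()
echoLineUtf8 hin hout = do
    hSetEncoding hin utf8
    hSetEncoding hout utf8
    line <- hGetLine hin
    hPutStrLn hout line

main :: IO ()
main = echoLineUtf8 stdin stdout
```

With this, running the program under LANG="C" should no longer fail
with "invalid byte sequence", as long as the terminal itself sends and
expects UTF-8.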

2013/2/21, Alexander V Vershilov <alexander.vershilov at gmail.com>:
> The problem is that Prelude.getLine uses the current locale to decode
> characters. For example, if you have a UTF-8 locale, then everything
> works out of the box:
>
>> $ runhaskell 1.hs
>> résumé 履歴書 резюме
>> résumé 履歴書 резюме
>
> But if you change the locale, you'll get an error:
>
>> $ LANG="C" runhaskell 1.hs
>> résumé 履歴書 резюме
>> 1.hs: <stdin>: hGetLine: invalid argument (invalid byte sequence)
>
> To force Haskell to use UTF-8, you can read the input as a byte
> sequence and decode it as UTF-8 characters, for example by
>
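> import qualified Data.ByteString.Char8 as S
> import qualified Data.Text.Encoding as T
>
> main = do
>     -- Read raw bytes from stdin and decode them as UTF-8, bypassing the locale.
>     x <- fmap T.decodeUtf8 S.getLine
>     -- Write the text back as UTF-8 bytes, again bypassing the locale.
>     S.putStrLn (T.encodeUtf8 x)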
>
> Now the code will work even with a different locale: it always reads
> UTF-8 from the shell, regardless of the user's locale settings.
>
> --
> Alexander
>
>
> On 21 February 2013 13:58, Semyon Kholodnov <joker.vd at gmail.com> wrote:
>
>> Imagine we have this simple program:
>>
>> module Main(main) where
>>
>> main = do
>>     x <- getLine
>>     putStrLn x
>>
>> Now I want to run it somehow, enter "résumé 履歴書 резюме" and see this
>> string printed back as "résumé 履歴書 резюме". The first problem is
>> that my computer runs Windows, which means that I can't use ghci's
>> ":main" or the result of "ghc main.hs" to enter such an outrageous
>> string: the Windows console is locked to one specific local code page,
>> and no code page contains Latin-1, Cyrillic and Kanji characters at
>> the same time.
>>
>> But there is also WinGHCi. So I do ":main" and copy-paste this string
>> into the window (it works, because Windows has had Unicode for 20
>> years now), but the output is all messed up. In a rather curious way,
>> actually: the input string is converted to a UTF-8 byte string, and its
>> bytes are treated as characters from my local code page.
>>
>> So it appears that I have no way to enter Unicode strings into my
>> Haskell programs by hand; I have to read them from files. That's sad,
>> and I refuse to think I am the first one with such a problem, so I
>> assume there is a solution or workaround. Would someone please tell
>> me what it is? Apart from "Just stick to 127 letters of ASCII", of
>> course.


