[Haskell-cafe] How to input Unicode string in Haskell program?

Erik Hesselink hesselink at gmail.com
Thu Feb 21 13:44:14 CET 2013


You can also set the encoding used by a handle (e.g.
System.IO.stdin) from code using `System.IO.hSetEncoding` [0].
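Here is a minimal sketch of that suggestion applied to the echo program from the original question. It assumes the terminal really sends and displays UTF-8. `hSetEncoding` only changes how GHC decodes and encodes bytes on the handle. It does not change the code page of a Windows console.

```haskell
import System.IO (hSetEncoding, stdin, stdout, utf8)

main :: IO ()
main = do
  -- Override the locale-derived encodings so that decoding input and
  -- encoding output both use UTF-8, whatever LANG says
  hSetEncoding stdin utf8
  hSetEncoding stdout utf8
  x <- getLine
  putStrLn x
```

Because the encoding is set per handle, other handles (files, stderr) keep their locale-derived default unless you change them too.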

Erik

[0] http://hackage.haskell.org/packages/archive/base/latest/doc/html/System-IO.html#v:hSetEncoding

On Thu, Feb 21, 2013 at 12:07 PM, Alexander V Vershilov
<alexander.vershilov at gmail.com> wrote:
> The problem is that Prelude.getLine uses the current locale to decode characters.
> For example, if you have a UTF-8 locale, then everything works out of the box:
>
>> $ runhaskell 1.hs
>> résumé 履歴書 резюме
>> résumé 履歴書 резюме
>
> But if you change the locale, you'll get an error:
>
>> $ LANG=C runhaskell 1.hs
>> résumé 履歴書 резюме
>> 1.hs: <stdin>: hGetLine: invalid argument (invalid byte sequence)
>
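A small diagnostic can show which encoding GHC picked up from the locale. This is a sketch, and `describeEncoding` is a hypothetical helper introduced here. The exact names printed (for example under `LANG=C`) depend on the platform and GHC version.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (TextEncoding, hGetEncoding, stdin)

-- Hypothetical helper: render a handle's encoding for display
describeEncoding :: Maybe TextEncoding -> String
describeEncoding Nothing    = "binary (no text decoding)"
describeEncoding (Just enc) = show enc

main :: IO ()
main = do
  -- Default encoding for new handles, initially derived from the locale
  enc <- getLocaleEncoding
  putStrLn ("locale encoding: " ++ show enc)
  -- Encoding currently attached to stdin, which getLine uses to decode
  mEnc <- hGetEncoding stdin
  putStrLn ("stdin encoding:  " ++ describeEncoding mEnc)
```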
> To force Haskell to use UTF-8, you can read the string as a byte sequence and
> decode it from UTF-8 into characters, for example:
>
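> import qualified Data.ByteString.Char8 as S
> import qualified Data.Text.Encoding as T
>
> main :: IO ()
> main = do
>     -- read raw bytes, ignoring the locale, then decode them as UTF-8
>     x <- fmap T.decodeUtf8 S.getLine
>     -- encode back to UTF-8 bytes so the output ignores the locale too
>     S.putStrLn (T.encodeUtf8 x)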
>
> Now the code will work even under a different locale: it reads UTF-8 from
> the shell regardless of the user's locale settings there.
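`T.decodeUtf8` throws an exception when the bytes are not valid UTF-8. A more forgiving variant uses `decodeUtf8'`, which returns an `Either` instead. The sketch below assumes the `text` and `bytestring` packages, and `decodeLine` is a hypothetical helper name.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: decode raw bytes as UTF-8 without throwing;
-- invalid input becomes a Left carrying the error message
decodeLine :: B.ByteString -> Either String T.Text
decodeLine bytes = case TE.decodeUtf8' bytes of
  Left err   -> Left (show err)
  Right text -> Right text

main :: IO ()
main = do
  -- Read one line as raw bytes; the locale is never consulted
  bytes <- B.getLine
  case decodeLine bytes of
    Left err   -> hPutStrLn stderr ("not valid UTF-8: " ++ err)
    -- Re-encode to UTF-8 bytes so the output also bypasses the locale
    Right text -> B.putStrLn (TE.encodeUtf8 text)
```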
>
> --
> Alexander
>
>
> On 21 February 2013 13:58, Semyon Kholodnov <joker.vd at gmail.com> wrote:
>>
>> Imagine we have this simple program:
>>
>> module Main(main) where
>>
>> main = do
>>     x <- getLine
>>     putStrLn x
>>
>> Now I want to run it somehow, enter "résumé 履歴書 резюме" and see this
>> string printed back as "résumé 履歴書 резюме". The first problem is
>> that my computer runs Windows, which means that I can't use ghci's
>> ":main" or the result of "ghc main.hs" to enter such an outrageous string:
>> the Windows console is locked to one specific local code page, and no
>> code page contains Latin-1, Cyrillic and Kanji symbols at the same
>> time.
>>
>> But there is also WinGHCi. So I do ":main" and copy-paste this string
>> into the window (it works, because Windows has had Unicode for 20 years
>> now), but the output is all messed up. In a rather curious way,
>> actually: the input string is converted to a UTF-8 byte string, and its
>> bytes are treated as characters from my local code page.
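The effect described above can be reproduced in a few lines. This is a hypothetical sketch. It uses Latin-1 (`decodeLatin1` from the `text` package) as a stand-in for a single-byte Windows code page. That is an assumption, not what WinGHCi actually calls.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical reproduction of the mix-up: encode the text as UTF-8,
-- then read every byte back as one Latin-1 character (a stand-in for
-- a single-byte local code page)
mojibake :: T.Text -> T.Text
mojibake = TE.decodeLatin1 . TE.encodeUtf8

main :: IO ()
main = do
  let original = T.pack "résumé"
      garbled  = mojibake original
  -- Each non-ASCII character turns into several "characters": prints (6,8)
  print (T.length original, T.length garbled)
  -- Show code points so the output does not depend on the console encoding
  print (map fromEnum (T.unpack garbled))
```

The code points printed for "é" are 195 and 169, the two bytes of its UTF-8 encoding.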
>>
>> So it appears that I have no way to enter Unicode strings into my
>> Haskell programs by hand; I have to read them from files. That's sad,
>> and I refuse to believe I am the first one with this problem, so I
>> assume there is a solution or workaround. Would someone please tell
>> me what it is? Other than "Just stick to 127 letters of ASCII", of
>> course.
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe