[Haskell-cafe] How to input Unicode string in Haskell program?

Jon Fairbairn jon.fairbairn at cl.cam.ac.uk
Fri Feb 22 11:38:19 CET 2013


Alexander V Vershilov <alexander.vershilov at gmail.com> writes:

> The problem is that Prelude.getLine uses the current locale to decode
> characters: for example, if you have a UTF-8 locale, then everything works
> out of the box:
>
>> $ runhaskell 1.hs
>> résumé 履歴書 резюме
>> résumé 履歴書 резюме
>
> But if you change the locale you'll get an error:
>
>> $ LANG="C" runhaskell 1.hs
>> résumé 履歴書 резюме
>> 1.hs: <stdin>: hGetLine: invalid argument (invalid byte sequence)

That seems to be correct behaviour: the only way to know the
meaning of the bytes a user inputs is from the encoding the
user says they are in.
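
To see which encoding the runtime has actually picked up, something
like the following might help. It is only a sketch, assuming GHC and
the System.IO module from base; `describe` is a hypothetical helper
of my own, not a library function.

```haskell
import System.IO

-- Hypothetical helper: render a handle's encoding for display.
-- hGetEncoding returns Nothing when the handle is in binary mode.
describe :: Maybe TextEncoding -> String
describe Nothing    = "binary (no text encoding)"
describe (Just enc) = show enc

main :: IO ()
main = do
  -- localeEncoding is what the runtime derived from the user's locale.
  hPutStrLn stderr ("locale encoding: " ++ show localeEncoding)
  stdinEnc <- hGetEncoding stdin
  hPutStrLn stderr ("stdin encoding:  " ++ describe stdinEnc)
  -- getLine decodes with stdin's encoding and raises an error if the
  -- input bytes are not valid in that encoding.
  line <- getLine
  putStrLn line
```

Running it once under a UTF-8 locale and once with LANG="C" should
show the two settings that produce the differing results quoted above.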

But in general this issue is an instance of inheriting sins from
the OS: the meaning of the bit pattern in a file should be part
of the file, but we are stuck with OSs that use a global
variable (which should be anathema to Haskell). So if user A has
locale set one way and inputs a file and sends the filename to
user B on the same system, user B might well see something
completely different to A when looking at the file.

> To force Haskell to use UTF-8 you can load the string as a byte sequence
> and decode it as UTF-8 characters

but of course, the programmer can only hope that UTF-8 will work
here. If the user is typing in KOI8-R, reading it as UTF-8 is
going to be wrong.
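
If one does go the byte-sequence route, a sketch along these lines at
least reports a failed decode instead of silently guessing. It assumes
the bytestring and text packages are available (they ship with GHC but
are not part of base), and `decodeLine` is a name I made up for
illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8')
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical helper: interpret raw bytes as UTF-8, returning an
-- error message rather than throwing when they are not valid UTF-8.
decodeLine :: B.ByteString -> Either String T.Text
decodeLine bytes = case decodeUtf8' bytes of
  Left err -> Left ("not valid UTF-8: " ++ show err)
  Right t  -> Right t

main :: IO ()
main = do
  -- Write output as UTF-8 whatever the locale says (an assumption
  -- about what the terminal can display).
  hSetEncoding stdout utf8
  -- Read one line of raw bytes; no locale decoding happens here.
  raw <- BC.getLine
  either putStrLn TIO.putStrLn (decodeLine raw)
```

Note that this only detects input that is not well-formed UTF-8. Most
KOI8-R text will be rejected, but a byte sequence that happens to be
valid in both encodings would be accepted with the wrong meaning.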
-- 
Jón Fairbairn                                 Jon.Fairbairn at cl.cam.ac.uk