[Haskell-cafe] How to input Unicode string in Haskell program?
Jon Fairbairn
jon.fairbairn at cl.cam.ac.uk
Fri Feb 22 11:38:19 CET 2013
Alexander V Vershilov <alexander.vershilov at gmail.com> writes:
> The problem is that Prelude.getLine uses the current locale to decode
> characters: for example, if you have a UTF-8 locale, then everything works
> out of the box:
>
>> $ runhaskell 1.hs
>> résumé 履歴書 резюме
>> résumé 履歴書 резюме
>
> But if you change the locale you'll get an error:
>
>> $ LANG="C" runhaskell 1.hs
>> résumé 履歴書 резюме
>> 1.hs: <stdin>: hGetLine: invalid argument (invalid byte sequence)
That seems to be correct behaviour: the only way to know the
meaning of the bits input by a user is what encoding the user
says they are in.
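
To see which encoding the runtime has picked up from the environment,
something like the following sketch may help. It assumes GHC's base
library; `describe` is a hypothetical helper name, not something from
the original program.

```haskell
import System.IO (TextEncoding, hGetEncoding, localeEncoding, stdin)

-- Hypothetical helper: render a handle's encoding for display.
-- hGetEncoding returns Nothing for a handle in binary mode.
describe :: Maybe TextEncoding -> String
describe Nothing    = "none (binary mode)"
describe (Just enc) = show enc

main :: IO ()
main = do
  -- The encoding derived from the user's locale (LANG, LC_ALL, ...).
  putStrLn ("locale encoding: " ++ show localeEncoding)
  -- The encoding stdin will use when getLine decodes bytes into Chars.
  enc <- hGetEncoding stdin
  putStrLn ("stdin encoding:  " ++ describe enc)
```

Running it once normally and once with LANG="C" should show the two
settings under which getLine behaved differently above.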
But in general this issue is an instance of inheriting sins from
the OS: the meaning of the bit pattern in a file should be part
of the file, but we are stuck with OSs that use a global
variable (which should be anathema to Haskell). So if user A has
locale set one way and inputs a file and sends the filename to
user B on the same system, user B might well see something
completely different to A when looking at the file.
> To force Haskell to use UTF-8 you can load the string as a byte sequence
> and decode it as UTF-8 characters
but of course, the programmer can only hope that UTF-8 will work
here. If the user is typing in KOI8-R, reading it as UTF-8 is
going to be wrong.
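
A minimal sketch of the bytes-then-decode approach, assuming the
bytestring and text packages are available (they are not part of base);
`decodeLine` is a hypothetical name. It reports a decoding failure
instead of throwing, but it cannot tell whether bytes that happen to be
valid UTF-8 were really meant as UTF-8.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import System.IO (hPutStrLn, hSetEncoding, stderr, stdout, utf8)

-- Hypothetical helper: decode raw bytes as UTF-8, returning the
-- error message on failure rather than raising an exception.
decodeLine :: B.ByteString -> Either String T.Text
decodeLine = either (Left . show) Right . TE.decodeUtf8'

main :: IO ()
main = do
  -- Write UTF-8 regardless of locale, so the echo does not fail
  -- in the same way the input did.
  hSetEncoding stdout utf8
  raw <- BC.getLine            -- raw bytes, no locale decoding
  case decodeLine raw of
    Right t  -> TIO.putStrLn t
    Left err -> hPutStrLn stderr ("input is not valid UTF-8: " ++ err)
```

For instance, the KOI8-R bytes 0xD2 0xC5 (the first two letters of
"резюме") are not a valid UTF-8 sequence, so decodeLine rejects them;
other legacy-encoded input may decode without error and simply be wrong.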
--
Jón Fairbairn Jon.Fairbairn at cl.cam.ac.uk