UTF-8 decoding error
Simon Marlow
simonmarhaskell at gmail.com
Fri Feb 10 06:12:16 EST 2006
Christian Maeder wrote:
> Simon Marlow wrote:
>
>> Christian Maeder wrote:
>>
>>> I'm tempted to replace "ä" by "\228" in literals. What does haddock
>>> do with utf-8 in comments? Will DrIFT -- using read- and writeFile --
>>> still work correctly?
>
>
> The problem I fear is that writeFile does not produce a utf-8 encoded file:
>
> writeFile "t.hs" "main = putStrLn \"äöüßÄÖÜ\""
>
> Using "\228\246\252\223\196\214\220" instead of "äöüßÄÖÜ" only avoids
> conversion to utf-8 of the initial file l1.hs (attached), but the
> generated file t.hs is a latin-1 file in both cases.
>
> Cheers Christian
>
> *Main> :l l1.hs
> Compiling Main ( l1.hs, interpreted )
> Ok, modules loaded: Main.
> *Main> main
> *Main> :l t.hs
> Compiling Main ( t.hs, interpreted )
> Ok, modules loaded: Main.
> *Main> main
> äöüßÄÖÜ
I'm not sure I see the problem: the I/O library doesn't do any Unicode
encoding/decoding. On output it always just takes the low 8 bits of each
character, hence truncating Unicode to Latin-1. If you restrict
yourself to Latin-1 characters in string literals, then I/O will work as
expected (i.e. Latin-1 only).
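
To illustrate the truncation described above, here is a minimal pure sketch. It only models the "low 8 bits" behaviour; it is not the I/O library's actual code, and the names truncateToByte and asWritten are made up for this example.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- | Hypothetical model of the behaviour described above:
-- keep only the low 8 bits of a character's code point.
truncateToByte :: Char -> Char
truncateToByte c = chr (ord c .&. 0xFF)

-- | What a string would look like after passing through an
-- I/O layer that truncates every character this way.
asWritten :: String -> String
asWritten = map truncateToByte
```

Under this model, Latin-1 characters such as '\228' pass through unchanged, while anything above '\255' is silently mapped to an unrelated Latin-1 character. For example, the euro sign '\8364' (0x20AC) comes out as '\172' (0xAC).
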
If you need to do I/O in a different encoding, I'm afraid you'll have to
code it up yourself right now, or use some other library (there are
packed string libraries around that can do I/O in UTF-8, for example,
and Bulat's new I/O library does char encodings).
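
For the "code it up yourself" route, a hand-rolled UTF-8 encoder could look roughly like the sketch below. The names encodeChar and encodeUtf8 are made up for this example, and surrogate code points are not rejected, so treat it as a starting point and not a complete implementation.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a single code point as 1 to 4 UTF-8 bytes.
-- Assumption: no validation is done (surrogates are encoded as-is).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c

    -- Leading-byte payload: the bits above position s.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)

    -- Continuation byte: 10xxxxxx carrying the 6 bits at position s.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Encode a whole string as a list of UTF-8 bytes.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

With this, encodeUtf8 "\228" gives [195,164], the two-byte UTF-8 form of "ä". Since the I/O library only looks at the low 8 bits, each resulting byte can be turned back into a Char below 256 and written out in the usual way.
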
Cheers,
Simon