UTF-8 decoding error

Simon Marlow simonmarhaskell at gmail.com
Fri Feb 10 06:12:16 EST 2006


Christian Maeder wrote:
> Simon Marlow wrote:
> 
>> Christian Maeder wrote:
>>
>>> I'm tempted to replace "ä" by "\228" in literals. What does haddock 
>>> do with utf-8 in comments? Will DrIFT -- using read- and writeFile -- 
>>> still work correctly?
> 
> 
> The problem I fear is that writeFile does not produce a utf-8 encoded file:
> 
> writeFile "t.hs" "main = putStrLn \"äöüßÄÖÜ\""
> 
> Using "\228\246\252\223\196\214\220" instead of "äöüßÄÖÜ" only avoids 
> conversion to utf-8 of the initial file l1.hs (attached), but the 
> generated file t.hs is a latin-1 file in both cases.
> 
> Cheers Christian
> 
> *Main> :l l1.hs
> Compiling Main             ( l1.hs, interpreted )
> Ok, modules loaded: Main.
> *Main> main
> *Main> :l t.hs
> Compiling Main             ( t.hs, interpreted )
> Ok, modules loaded: Main.
> *Main> main
> äöüßÄÖÜ

I'm not sure I see the problem: the I/O library doesn't do Unicode 
encoding/decoding; it always just takes the low 8 bits of each 
character, hence truncating Unicode to Latin-1.  If you restrict 
yourself to Latin-1 characters in string literals, then I/O will work as 
expected (i.e. Latin-1 only).
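To make the truncation concrete, here is a small sketch that models it as a pure function. `truncateToLatin1` is a hypothetical helper written for illustration, not part of any library, and it assumes exactly the behaviour described above (one byte per Char, low 8 bits kept). In later GHCs (6.12 onwards) Handles use the locale's encoding by default, and `hSetEncoding h char8` from System.IO selects this same byte-per-Char behaviour.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Model of the I/O behaviour described above: each Char is written
-- as one byte holding the low 8 bits of its code point, so anything
-- above U+00FF is silently truncated.
truncateToLatin1 :: String -> String
truncateToLatin1 = map (chr . (.&. 0xFF) . ord)

main :: IO ()
main = do
  -- Latin-1 characters (code points below 256) pass through unchanged.
  print (truncateToLatin1 "\228\246\252\223")
  -- U+20AC (euro sign) loses its high bits and becomes U+00AC (not sign).
  print (truncateToLatin1 "\8364")
```

This is why restricting string literals to Latin-1, whether written as "ä" or as "\228", gives predictable output: both denote the same Char, and it fits in one byte.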

If you need to do I/O in a different encoding, I'm afraid you'll have to 
code it up yourself right now, or use some other library (there are 
packed string libraries around that can do I/O in UTF-8, for example, 
and Bulat's new I/O library does char encodings).
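For the "code it up yourself" route, a minimal UTF-8 encoder needs nothing beyond base. The names `encodeUtf8` and `writeFileUtf8` below are hypothetical helpers for illustration (unrelated to the `text` package's function of the same name), and the encoder is a sketch that does not reject surrogate code points. The encoded bytes are written through a binary-mode Handle, which writes each Char below 256 as exactly one byte.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import qualified System.IO as IO

-- Encode a String as UTF-8 bytes (sketch: surrogate code points
-- U+D800..U+DFFF are not rejected).
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (encodeChar . ord)
  where
    encodeChar :: Int -> [Word8]
    encodeChar c
      | c < 0x80    = [byte c]                                -- 1 byte
      | c < 0x800   = [0xC0 .|. byte (c `shiftR` 6), cont c]  -- 2 bytes
      | c < 0x10000 = [ 0xE0 .|. byte (c `shiftR` 12)         -- 3 bytes
                      , cont (c `shiftR` 6), cont c ]
      | otherwise   = [ 0xF0 .|. byte (c `shiftR` 18)         -- 4 bytes
                      , cont (c `shiftR` 12), cont (c `shiftR` 6), cont c ]

    -- Narrow an Int that is already known to fit into one byte.
    byte :: Int -> Word8
    byte = fromIntegral

    -- Continuation byte 10xxxxxx carrying the low 6 bits of x.
    cont :: Int -> Word8
    cont x = 0x80 .|. byte (x .&. 0x3F)

-- Write a String to a file as UTF-8. Each encoded byte is turned into
-- a Char below 256, which a binary-mode Handle writes as one byte.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path s =
  IO.withBinaryFile path IO.WriteMode $ \h ->
    IO.hPutStr h (map (toEnum . fromIntegral) (encodeUtf8 s))

main :: IO ()
main = do
  print (encodeUtf8 "\228")  -- 'ä' (U+00E4): prints [195,164]
  writeFileUtf8 "t.hs" "main = putStrLn \"\228\246\252\223\196\214\220\""
```

With a GHC that has encoding-aware Handles (6.12 or later), calling `hSetEncoding h utf8` on the Handle achieves the same result without a hand-written encoder.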

Cheers,
	Simon


More information about the Glasgow-haskell-users mailing list