[Haskell-cafe] getting crazy with character encoding
Don Stewart
dons at galois.com
Wed Sep 12 15:51:05 EDT 2007
mailing_list:
> On Wed, Sep 12, 2007 at 11:16:25AM -0400, Seth Gordon wrote:
> > It appears that in spite of the locale definition, hGetContents is treating
> > each byte as a separate character without translating the multi-byte
> > sequences *from* UTF-8, and then putStrLn sends each of those bytes to
> > standard output without translating the non-ASCII characters *to* UTF-8. So
> > the second line of your program's output is correct...but only by accident.
>
> That's it indeed. As I said in the message I've just sent, I've read
> that the String/CString conversion is automatically done in
> ISO-8859-1, so "èèè", which is 6 bytes in UTF-8, is translated
> into 6 ISO-8859-1 characters.
>
> What puzzles me is the behavior of putStrLn.
>
> Thanks for your time.
Have you tried the utf8-string conversion library?
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string-0.1
-- Don
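
For readers who want to see the effect described in the quoted messages, here is a
minimal sketch. It assumes a reasonably recent GHC whose base library provides
GHC.Foreign and the latin1/utf8 encodings in System.IO (these postdate this
thread), and it does not use the utf8-string package. The names sample,
misdecode and roundTrip are made up for illustration. It encodes "èèè" to
UTF-8 bytes, then reads those bytes back once as ISO-8859-1 and once as UTF-8.

```haskell
import qualified GHC.Foreign as GHC
import System.IO (latin1, utf8)

-- "èèè" written with numeric escapes (U+00E8 = 232), so the example
-- does not depend on the encoding of this source file.
sample :: String
sample = "\232\232\232"

-- Encode the String as UTF-8 bytes, then decode those bytes as
-- ISO-8859-1, i.e. one Char per byte. This mimics the mismatch
-- discussed in the thread.
misdecode :: String -> IO String
misdecode s = GHC.withCStringLen utf8 s (GHC.peekCStringLen latin1)

-- Encode as UTF-8 and decode as UTF-8: the original text comes back.
roundTrip :: String -> IO String
roundTrip s = GHC.withCStringLen utf8 s (GHC.peekCStringLen utf8)

main :: IO ()
main = do
  bad  <- misdecode sample
  good <- roundTrip sample
  -- Character counts: original, mis-decoded, round-tripped.
  print (length sample, length bad, length good)  -- (3,6,3)
  -- Code points of the mis-decoded string: the raw UTF-8 bytes.
  print (map fromEnum bad)  -- [195,168,195,168,195,168]
  print (good == sample)    -- True
```

Writing the mis-decoded string back out one byte per character would reproduce
the original UTF-8 bytes, which is why output can look right "by accident" even
though the String in memory holds six characters instead of three.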