[Haskell-beginners] hGetContents, unicode and linux

Michael Snoyman michael at snoyman.com
Sun Nov 28 01:35:58 EST 2010


On Sun, Nov 28, 2010 at 8:26 AM, Erik de Castro Lopo
<mle+hs at mega-nerd.com> wrote:
> Hi all,
>
> I've got a trivial test program:
>
>    main :: IO ()
>    main
>     = do   text <- readFile "unicode.txt"
>            putStr text
>
> which I compile with ghc-6.12.1 (from Debian) and when it runs I get:
>
>    hGetContents: invalid argument (Invalid or incomplete multibyte or wide character)
>
> I've done some googling which seems to suggest that I need to set
> the LANG environment variable, but I already have that set to
> en_AU.UTF-8.
>
> Clues?
>
> Cheers,
> Erik

Perhaps a silly question, but are you certain that the input file is
valid UTF-8? You could also try the readFile from utf8-string[1],
which I believe tolerates malformed UTF-8 sequences instead of raising
an error. A theoretically better approach is to read the contents as a
lazy ByteString and then use the decode functions from the text
package, but that's a little more work.
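
For the bytestring-plus-text route, something along these lines might
work. It is only a rough sketch, assuming the bytestring and text
packages are installed; the names decodeStrictly, decodeLenient and
readFileUtf8 are made up for illustration and do not come from either
library.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import Data.Text.Encoding.Error (UnicodeException, lenientDecode)

-- Hypothetical helper: strict check. Returns Left if the bytes are
-- not valid UTF-8, which answers "is the file really UTF-8?".
decodeStrictly :: BL.ByteString -> Either UnicodeException TL.Text
decodeStrictly = TLE.decodeUtf8'

-- Hypothetical helper: lenient decode. Each invalid sequence is
-- replaced with U+FFFD instead of raising an exception.
decodeLenient :: BL.ByteString -> TL.Text
decodeLenient = TLE.decodeUtf8With lenientDecode

-- Hypothetical helper: read the raw bytes (no locale-based decoding
-- happens on a ByteString read) and then decode them leniently.
readFileUtf8 :: FilePath -> IO TL.Text
readFileUtf8 path = fmap decodeLenient (BL.readFile path)
```

With that in place, the test program would become roughly
readFileUtf8 "unicode.txt" >>= Data.Text.Lazy.IO.putStr, and running
decodeStrictly over the same bytes should tell you whether the file
itself is the problem.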

[1] http://hackage.haskell.org/packages/archive/utf8-string/0.3.6/doc/html/System-IO-UTF8.html#v:readFile

