[Haskell-beginners] hGetContents, unicode and linux
Michael Snoyman
michael at snoyman.com
Sun Nov 28 01:35:58 EST 2010
On Sun, Nov 28, 2010 at 8:26 AM, Erik de Castro Lopo
<mle+hs at mega-nerd.com> wrote:
> Hi all,
>
> I've got a trivial test program:
>
>     main :: IO ()
>     main
>       = do text <- readFile "unicode.txt"
>            putStr text
>
> which I compile with ghc-6.12.1 (from Debian) and when it runs I get:
>
> hGetContents: invalid argument (Invalid or incomplete multibyte or wide character)
>
> I've done some googling which seems to suggest that I need to set
> the LANG environment variable, but I already have that set to
> en_AU.UTF-8.
>
> Clues?
>
> Cheers,
> Erik
Perhaps a silly question, but are you certain that the input file is
valid UTF-8? You could also try using the readFile from
utf8-string[1], which I believe tolerates invalid UTF-8 sequences
rather than throwing an error. A theoretically better approach is to
read the contents as a lazy bytestring and then use the decode
functions from the text package, but that's a little bit more work.
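
For what it's worth, here is a rough sketch of that second approach,
assuming the bytestring and text packages are installed. The helper
names isValidUtf8 and decodeLenient are just made up for illustration,
and the choice of lenientDecode (which substitutes U+FFFD for bad
input) is my assumption about what you'd want; use strictDecode if you
would rather get an exception.

```haskell
import Control.Monad (unless)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import qualified Data.Text.Lazy.IO as TLIO
import Data.Text.Encoding.Error (lenientDecode)
import System.IO (hPutStrLn, hSetEncoding, stderr, stdout, utf8)

-- Hypothetical helper: True if the bytes are well-formed UTF-8.
isValidUtf8 :: BL.ByteString -> Bool
isValidUtf8 = either (const False) (const True) . TLE.decodeUtf8'

-- Hypothetical helper: decode UTF-8, substituting U+FFFD for invalid
-- input instead of throwing an exception.
decodeLenient :: BL.ByteString -> TL.Text
decodeLenient = TLE.decodeUtf8With lenientDecode

main :: IO ()
main = do
  -- Write UTF-8 regardless of what the locale says.
  hSetEncoding stdout utf8
  -- Read raw bytes, so no locale-dependent decoding happens on input.
  bytes <- BL.readFile "unicode.txt"
  unless (isValidUtf8 bytes) $
    hPutStrLn stderr "warning: unicode.txt is not valid UTF-8"
  TLIO.putStr (decodeLenient bytes)
```

If the warning shows up, the file itself is the problem rather than
your LANG setting.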
[1] http://hackage.haskell.org/packages/archive/utf8-string/0.3.6/doc/html/System-IO-UTF8.html#v:readFile