[Haskell-beginners] hGetContents exception

Felipe Almeida Lessa felipe.lessa at gmail.com
Wed Sep 14 15:17:17 CEST 2011


On Wed, Sep 14, 2011 at 9:50 AM, Luca Ciciriello
<luca_ciciriello at hotmail.com> wrote:
> Hi All.
> I'm using the function hGetContents to read some text files. If one or more of these files has an invalid UTF encoding, I get the error:
>
> hGetContents: invalid argument (Illegal byte sequence)
>
> My workaround is to open the wrongly encoded file in emacs and create a copy of it (cut and paste into a new buffer). After this operation the new file has a correct UTF encoding and hGetContents doesn't complain any more.
>
> Is there a better way to read such wrongly encoded files (without complaints) that doesn't need an external step (emacs)?

Yes, use the text package [1].  More specifically, read your file
into a ByteString bs and then call "decodeUtf8With lenientDecode bs"
[2,3], which replaces each invalid byte with U+FFFD instead of
raising an exception.  I strongly advise against using "ignore": it
silently drops the invalid bytes, which may pose a security threat to
your application.
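
A minimal sketch of that approach, assuming the bytestring and text
packages are installed and the whole file fits in memory (it is read
strictly).  The names decodeLenient and readFileLenient are made up
for illustration; they are not part of either package.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical helper: decode bytes as UTF-8, replacing each invalid
-- byte with U+FFFD instead of throwing an exception.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode

-- Hypothetical helper: read the raw bytes of a file (no decoding is
-- done by B.readFile) and then decode them leniently.
readFileLenient :: FilePath -> IO T.Text
readFileLenient path = fmap decodeLenient (B.readFile path)
```

In GHCi, something like "readFileLenient path >>= Data.Text.IO.putStr"
should then print the file, with a replacement character wherever the
original bytes were not valid UTF-8.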

Cheers!

[1] http://hackage.haskell.org/package/text
[2] http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text-Encoding.html#v:decodeUtf8With
[3] http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text-Encoding-Error.html#v:lenientDecode

-- 
Felipe.


