[Haskell-beginners] hGetContents, unicode and linux
michael at snoyman.com
Sun Nov 28 02:27:19 EST 2010
On Sun, Nov 28, 2010 at 9:19 AM, Michael Snoyman <michael at snoyman.com> wrote:
> On Sun, Nov 28, 2010 at 8:53 AM, Yitzchak Gale <gale at sefer.org> wrote:
>> Michael Snoyman wrote:
>>> Perhaps a silly question, but are you certain that the input file is
>>> valid UTF-8?
>> That is a very good point.
>>> You could also try using the readFile from utf8-string...
>>> [or] read the contents as a lazy
>>> bytestring and then use the decode functions...
>> Those approaches are now both deprecated. Either do
>> what you are doing, which gives you conceptually simple
>> strings as lists of Char. Or, for better efficiency, use
>> the text package:
>>> import qualified Data.Text.Lazy.IO as T
>>> main :: IO ()
>>> main = do text <- T.readFile "unicode.txt"
>>>           T.putStr text
>> In any case, you still need to have the correct encoding
>> set on the handles as before. (And the input needs to
>> be valid for your selected encoding.)
> Which is why I would actually recommend sticking with the
> bytestring/text combination when you know what the file encoding will
> be and it is not system-dependent. It's the approach that I use with
> Hamlet et al for precisely that reason.
Sorry for replying to myself, but I didn't clarify that very well.
You're right that setting the encoding on the handle can work well enough
for this, but it does *not* address invalid byte sequences (AFAIK),
which can be dealt with using the bytestring/text decoding functions.
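
A minimal sketch of that bytestring/text route, assuming the file is meant
to be UTF-8 and reusing the "unicode.txt" name from the example quoted
above. The helper names decodeStrict and decodeLenient are made up for
illustration; decodeUtf8', decodeUtf8With and lenientDecode come from the
text package, and this is only one way to handle bad input, not the only one:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import System.IO (hPutStrLn, hSetEncoding, stderr, stdout, utf8)

-- Hypothetical helper: decode as UTF-8 and report invalid input as a
-- Left value instead of throwing an exception.
decodeStrict :: B.ByteString -> Either String T.Text
decodeStrict bytes = either (Left . show) Right (decodeUtf8' bytes)

-- Hypothetical helper: decode as UTF-8, substituting U+FFFD for each
-- invalid byte rather than failing.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode

main :: IO ()
main = do
  -- Output still goes through a handle, so its encoding is set explicitly.
  hSetEncoding stdout utf8
  -- Read raw bytes; no handle encoding is involved on the input side.
  bytes <- B.readFile "unicode.txt"
  case decodeStrict bytes of
    Right text -> TIO.putStr text
    Left err -> do
      hPutStrLn stderr ("invalid UTF-8 in input: " ++ err)
      TIO.putStr (decodeLenient bytes)
```

Reading the file as a ByteString keeps the input independent of the system
locale, and the decoding step is where invalid byte sequences are either
reported or replaced.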