[Haskell-beginners] hGetContents, unicode and linux
Michael Snoyman
michael at snoyman.com
Sun Nov 28 02:19:58 EST 2010
On Sun, Nov 28, 2010 at 8:53 AM, Yitzchak Gale <gale at sefer.org> wrote:
> Michael Snoyman wrote:
>> Perhaps a silly question, but are you certain that the input file is
>> valid UTF-8?
>
> That is a very good point.
>
>> You could also try using the readFile from utf8-string...
>> [or] read the contents as a lazy
>> bytestring and then use the decode functions...
>
> Those approaches are now both deprecated. Either do
> what you are doing, which gives you conceptually simple
> strings as lists of Char. Or, for better efficiency, use
> the text package:
>
>> import qualified Data.Text.Lazy.IO as T
>>
>> main :: IO ()
>> main = do
>>   text <- T.readFile "unicode.txt"
>>   T.putStr text
>
> In any case, you still need to have the correct encoding
> set on the handles as before. (And the input needs to
> be valid for your selected encoding.)
Which is why I would actually recommend sticking with the
bytestring/text combination when you know what the file encoding will
be and it is not system-dependent. It's the approach that I use with
Hamlet et al for precisely that reason.
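
A minimal sketch of that combination, assuming the file is known to be
UTF-8 and reusing the "unicode.txt" name from the example above. The
helper name decodeUtf8Bytes is made up for illustration, and
decodeUtf8' needs a reasonably recent version of the text package.
Because the bytes are read and written raw, the locale and the handle
encodings never come into play:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Hypothetical helper: decode raw bytes as UTF-8, regardless of locale.
-- Returns Left instead of throwing when the input is not valid UTF-8.
decodeUtf8Bytes :: B.ByteString -> Either UnicodeException T.Text
decodeUtf8Bytes = TE.decodeUtf8'

main :: IO ()
main = do
  -- B.readFile performs no decoding, so no handle encoding is involved.
  bytes <- B.readFile "unicode.txt"
  case decodeUtf8Bytes bytes of
    Left err   -> putStrLn ("input is not valid UTF-8: " ++ show err)
    -- Re-encode explicitly and write raw bytes, again bypassing the locale.
    Right text -> B.putStr (TE.encodeUtf8 text)
```

The Either result also answers the earlier question of whether the
input is valid UTF-8 at all, since a bad byte sequence shows up as a
Left rather than as garbled output.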
Michael