[Haskell-cafe] Reading files efficiently
Donald Bruce Stewart
dons at cse.unsw.edu.au
Sun Mar 19 09:45:40 EST 2006
1:
> I've got another n00b question, thanks for all the help you have been
> giving me!
>
> I want to read a text file. As an example, let's use
> /usr/share/dict/words and try to print out the last line of the file.
> First of all I came up with this program:
>
> import System.IO
> main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines
>
> This program gives the following error, presumably because there is an
> ISO-8859-1 character in the dictionary:
> "Program error: <handle>: IO.getContents: protocol error (invalid
> character encoding)"
>
> How can I tell the Haskell system that it is to read ISO-8859-1 text
> rather than UTF-8?
>
> I now used iconv to convert the file to UTF-8 and tried again. This
> time it worked, but it seems horribly inefficient -- Hugs took 2.8
> seconds to read a 96,000 line file. By contrast the equivalent Python
> program:
>
> print open("words", "r").readlines()[-1]
>
> took 0.05 seconds. I assume I must be doing something wrong here, and
> somehow causing Haskell to use a particularly inefficient algorithm.
> Can anyone give me any clues what I should be doing instead?
a) Compile your code with GHC instead of interpreting it. GHC is blazing fast.
$ ghc -O A.hs
$ time ./a.out
Zyzzogeton
./a.out 0.23s user 0.01s system 91% cpu 0.257 total
b) If not satisfied with the result, use packed strings (as Python does).
http://www.cse.unsw.edu.au/~dons/fps.html
import qualified Data.FastPackedString as P
import System.IO (stdout)
main = P.readFile "/usr/share/dict/words" >>= P.hPut stdout . last . P.lines
$ ghc -O2 -package fps B.hs
$ time ./a.out
Zyzzogeton./a.out 0.04s user 0.02s system 86% cpu 0.063 total
0.06s is ok with me :)
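
For anyone reading this later: the fps library was subsequently folded into the bytestring package, so roughly the same program can be sketched against Data.ByteString.Char8. This is only a sketch under that assumption, not the code that was timed above; the helper name lastLine is made up for illustration. Since it reads raw bytes and never decodes them, it should also sidestep the "invalid character encoding" error on an ISO-8859-1 file.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper (not part of any library): the last line of the
-- input, or the empty string if the input contains no lines at all.
lastLine :: B.ByteString -> B.ByteString
lastLine contents =
  case B.lines contents of
    [] -> B.empty
    ls -> last ls

main :: IO ()
main =
  -- B.readFile reads the whole file strictly as bytes, with no decoding step.
  -- B.putStrLn adds the trailing newline that plain hPut leaves out.
  B.readFile "/usr/share/dict/words" >>= B.putStrLn . lastLine
```

The case on the empty list is there only so that an empty file prints a blank line instead of crashing in last.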
-- Don