[Haskell-cafe] Iteratee performance
Vasyl Pasternak
vasyl.pasternak at gmail.com
Wed Mar 17 04:15:08 EDT 2010
Hi Cafe,
Yesterday I played with iteratee package, and wanted to check its
performance. I tried to count lines in a file, as Oleg in his famous
lazy_vs_correct[1] article. The results somewhat disappointed me.
The statistics and code follows, but shortly: lazy bytestring is the
fastest, iteratee with bytestrings 10 times slower than lazy
bytestring. When comparing lazy string and iteratee with [Char], than
their results were close, but lazy string reading uses less memory and
was a bit faster (20%).
I performed test on 250Mb file with 5 millions lines.
Now I am figuring out, is these tests are correct and this is ordinary
behavior, so iteratee not so fast as I thought, or there is some
mistake in my code.
[1] http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt
--------------------- TIMING RESULTS ----------------------
$ time wc -l 5000000.txt
5000000 5000000.txt
real 0m0.314s
user 0m0.184s
sys 0m0.124s
$ time ./bytestring_test 5000000.txt
5000000
real 0m0.817s
user 0m0.616s
sys 0m0.200s
$ time ./bytestring_iteratee_test 5000000.txt
real 0m7.801s
user 0m7.552s
sys 0m0.252s
$ time ./string_test 5000000.txt
5000000
real 0m47.427s
user 0m46.675s
sys 0m0.648s
$ time ./string_iteratee_test 5000000.txt
5000000
real 0m59.225s
user 0m57.680s
sys 0m0.840s
-------------------------- RTS INFO --------------------------------
./bytestring_test 5000000.txt +RTS -sbs.out
807,225,096 bytes allocated in the heap
122,240 bytes copied during GC
59,496 bytes maximum residency (1 sample(s))
22,424 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 1540 collections, 0 parallel, 0.03s, 0.02s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.59s ( 0.79s elapsed)
GC time 0.03s ( 0.02s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.62s ( 0.82s elapsed)
%GC time 4.5% (2.8% elapsed)
Alloc rate 1,372,743,081 bytes per MUT second
Productivity 95.5% of total user, 72.1% of total elapsed
-----
./bytestring_iteratee_test 5000000.txt +RTS -siter.out
11,024,100,312 bytes allocated in the heap
893,436,512 bytes copied during GC
95,456 bytes maximum residency (1 sample(s))
23,216 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 21030 collections, 0 parallel, 2.51s, 2.45s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.02s elapsed)
MUT time 6.37s ( 6.66s elapsed)
GC time 2.52s ( 2.45s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 8.88s ( 9.12s elapsed)
%GC time 28.3% (26.9% elapsed)
Alloc rate 1,731,061,437 bytes per MUT second
Productivity 71.7% of total user, 69.8% of total elapsed
-----
./string_test 5000000.txt +RTS -sstr.out
38,561,155,264 bytes allocated in the heap
9,862,623,816 bytes copied during GC
223,080 bytes maximum residency (5026 sample(s))
47,264 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 68525 collections, 0 parallel, 22.50s, 22.51s elapsed
Generation 1: 5026 collections, 0 parallel, 1.38s, 1.36s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 22.80s ( 23.55s elapsed)
GC time 23.87s ( 23.87s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 46.67s ( 47.42s elapsed)
%GC time 51.1% (50.3% elapsed)
Alloc rate 1,691,170,222 bytes per MUT second
Productivity 48.9% of total user, 48.1% of total elapsed
-----
./string_iteratee_test 5000000.txt +RTS -sstriter.out
40,164,683,672 bytes allocated in the heap
7,108,638,256 bytes copied during GC
212,624 bytes maximum residency (821 sample(s))
50,264 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 75791 collections, 0 parallel, 33.14s, 33.75s elapsed
Generation 1: 821 collections, 0 parallel, 0.56s, 0.63s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 23.99s ( 24.84s elapsed)
GC time 33.69s ( 34.38s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 57.68s ( 59.22s elapsed)
%GC time 58.4% (58.1% elapsed)
Alloc rate 1,674,540,397 bytes per MUT second
Productivity 41.6% of total user, 40.5% of total elapsed
------------------ SOURCECODE -----------------------------
$ cat bytestring_test.hs
import System.Environment
import Control.Monad
import qualified Data.ByteString.Lazy.Char8 as B
count s = liftM (length . B.lines) (B.readFile s)
main = do
[f] <- getArgs
print =<< count f
---------------
$ cat bytestring_iteratee_test.hs
import qualified Data.Iteratee.IO as I
import qualified Data.Iteratee as I
import qualified Data.Iteratee.Char as I
import qualified Data.Iteratee.WrappedByteString as I
import System.Environment
import Control.Monad
count s = I.fileDriverRandom (cnt) s
cnt :: (Monad m, Functor m) => I.IterateeG I.WrappedByteString Char m Int
cnt = I.joinI $ I.enumLines I.length
main = do
[f] <- getArgs
print =<< count f
------------------
$ cat string_test.hs
import System.Environment
import Control.Monad
count s = liftM (length . lines) (readFile s)
main = do
[f] <- getArgs
print =<< count f
----------------------
$ cat string_iteratee_test.hs
import qualified Data.Iteratee.IO as I
import qualified Data.Iteratee as I
import qualified Data.Iteratee.Char as I
import qualified Data.Iteratee.WrappedByteString as I
import System.Environment
import Control.Monad
count s = I.fileDriverRandom (cnt) s
cnt :: (Monad m, Functor m) => I.IterateeG [] Char m Int
cnt = I.joinI $ I.enumLines I.length
main = do
[f] <- getArgs
print =<< count f
Best regards,
Vasyl
More information about the Haskell-Cafe
mailing list