[Haskell-cafe] Iteratee performance
jwlato at gmail.com
Fri Mar 19 09:22:46 EDT 2010
I think this is a bit easier to write with iteratee-HEAD. There are
some significant changes from the 0.3 version, and not all the old
functions are implemented yet; however, the "cnt" iteratee can be written as:
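import qualified Data.ByteString.Char8 as S
import qualified Data.Iteratee as I

cnt :: Monad m => I.Iteratee S.ByteString m Int
cnt = I.liftI (step 0)
-- Skip empty chunks without touching the accumulator.
step acc (I.Chunk bs) | S.null bs = I.icont (step acc) Nothing
-- Add this chunk's newlines, forcing the count before continuing.
step acc (I.Chunk bs) = let acc' = acc + S.count '\n' bs
                        in acc' `seq` I.icont (step acc') Nothing
-- Any other stream state (e.g. EOF): finish with the running count.
step acc str = I.idone acc str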
One significant change is that the kind of the first parameter to the
Iteratee type has changed, so now it should be a fully-applied type
instead of a type function. This means that ByteString can be used
directly. "idone", "icont", and "liftI" simplify creation of
iteratees, and it's usually not necessary to pattern match on the EOF
constructor any more. The first "step" definition above can be left
out for a small performance penalty.
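To make the control flow of "step" easier to follow without installing iteratee-HEAD, here is a minimal, self-contained model of it. The Stream and Iter types, cntModel, and runChunks are hypothetical, simplified stand-ins I made up for illustration; the library's real types are monadic and carry error information. Only the bytestring package is assumed.

```haskell
import qualified Data.ByteString.Char8 as S

-- Hypothetical, simplified stand-ins for the library's stream and
-- iteratee types (not the real iteratee API).
data Stream = Chunk S.ByteString | EOF

data Iter a
  = Done a Stream            -- finished, with the unconsumed stream
  | Cont (Stream -> Iter a)  -- wants more input

-- Same shape as "step" above: count newlines, forcing the accumulator.
step :: Int -> Stream -> Iter Int
step acc (Chunk bs)
  | S.null bs = Cont (step acc)  -- optional fast path for empty chunks
  | otherwise =
      let acc' = acc + S.count '\n' bs
      in acc' `seq` Cont (step acc')
step acc str = Done acc str      -- anything else (here: EOF) finishes

cntModel :: Iter Int
cntModel = Cont (step 0)

-- Feed every chunk, then EOF; Nothing means the iteratee never finished.
runChunks :: [S.ByteString] -> Iter a -> Maybe a
runChunks _      (Done a _) = Just a
runChunks (c:cs) (Cont k)   = runChunks cs (k (Chunk c))
runChunks []     (Cont k)   = case k EOF of
  Done a _ -> Just a
  Cont _   -> Nothing
```

In this model the S.null guard only saves work: an empty chunk contributes zero newlines either way, so dropping the guard gives the same count.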
I wouldn't recommend basing any production code on iteratee-HEAD, as
the interface isn't quite finalized yet.
With this version of "cnt" and iteratee-HEAD, the iteratee version
runs in about 2.2 seconds for the same tests as I used below.
Changing the enumFd buffer size to 32K gives the following results:
a 460MB input file (generated by cp Tiff.hs long.txt; cat long.txt >> long.txt):
MusDept-MacBook-1:Examples johnlato$ time wc -l long.txt
MusDept-MacBook-1:Examples johnlato$ time ./test_bs < long.txt
MusDept-MacBook-1:Examples johnlato$ time ./test_iter long.txt
All time values are averages of 3 runs. The first run for the
bytestring version was a bit long; otherwise run times were very
consistent, within 0.004s for each executable.
I don't see any reason the buffer size needs to be fixed at compile
time. I'll make this change in the next major release.
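As a rough illustration of what a run-time chunk size looks like, here is a sketch that uses only bytestring and System.IO, not the iteratee enumerators. The names countLinesH and countLinesFile are hypothetical and are not part of the iteratee package; the point is only that the buffer size is an ordinary argument.

```haskell
import qualified Data.ByteString.Char8 as S
import System.IO (Handle, IOMode (ReadMode), withFile)

-- Count newlines on a handle, reading at most bufSize bytes per chunk.
-- bufSize is an ordinary run-time argument and should be positive.
countLinesH :: Int -> Handle -> IO Int
countLinesH bufSize h = go 0
  where
    go acc = do
      bs <- S.hGet h bufSize
      if S.null bs
        then return acc  -- an empty read means end of input
        else let acc' = acc + S.count '\n' bs
             in acc' `seq` go acc'

-- Convenience wrapper (hypothetical name): open the file and count.
countLinesFile :: Int -> FilePath -> IO Int
countLinesFile bufSize path = withFile path ReadMode (countLinesH bufSize)
```

For example, countLinesFile 32768 "long.txt" would read in 32K chunks, and a different size can be tried without recompiling.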
> From: Vasyl Pasternak <vasyl.pasternak at gmail.com>
> Subject: Re: [Haskell-cafe] Iteratee performance
> To: Gregory Collins <greg at gregorycollins.net>
> Thank you, your code helps; my version now runs at the speed of the
> lazy bytestring test but uses less memory.
> I've only added more strictness to the recursion in your code; my
> version is below.
> BTW, I think it would be more useful to let the user set the chunk size
> for reading, so I'd like to see this possibility in the iteratee package.