[Haskell] Speed of ByteString.Lazy
chad.scherrer at gmail.com
Thu Jun 29 14:18:44 EDT 2006
I have a bunch of data files where each line represents a data point. It's
nice to be able to quickly tell how many data points I have. I had been
using wc, like this:
% cat *.txt | /usr/bin/time wc
2350570 4701140 49149973
5.81user 0.03system 0:06.08elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (152major+18minor)pagefaults 0swaps
I only really care about the line count and the time it takes. For larger
data sets, I was getting tired of waiting for wc, and I wondered whether
ByteString.Lazy could help me do better. So I wrote a 2-liner:
import qualified Data.ByteString.Lazy.Char8 as L
main = L.getContents >>= print . L.count '\n'
... and compiled this as "lc". It doesn't get much simpler than that. How
does it perform?
% cat *.txt | /usr/bin/time lc
0.09user 0.13system 0:00.24elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (199major+211minor)pagefaults 0swaps
Wow. About 64 times as fast in user time for this run (5.81s versus 0.09s),
and about 25 times as fast in elapsed time, with almost no effort on my part.
Granted, wc is doing more work, but the word and character counts aren't
interesting to me in this case, anyway. I can't imagine
(implementation time)*(execution time) being much shorter. Thanks, Don!
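For anyone wondering what the two-liner amounts to, here is a rough sketch.
A lazy ByteString is a lazy list of strict chunks, so the count can be
written as a strict fold over those chunks. The name countLines is made up
for illustration, and this is not necessarily how L.count is implemented
inside the library.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.List (foldl')

-- Hypothetical helper: count newlines one strict chunk at a time.
-- foldl' keeps the running total evaluated, so only the chunk currently
-- being scanned has to be in memory.
countLines :: L.ByteString -> Int
countLines = foldl' (\acc chunk -> acc + S.count '\n' chunk) 0 . L.toChunks

main :: IO ()
main = L.getContents >>= print . countLines
```

It is used the same way as lc above (cat *.txt | ./lc). Like wc -l, it
counts newline characters, so a final line with no trailing newline is
not counted.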
"Time flies like an arrow; fruit flies like a banana" -- Groucho Marx