[Haskell] Speed of ByteString.Lazy

Chad Scherrer chad.scherrer at gmail.com
Thu Jun 29 14:18:44 EDT 2006


I have a bunch of data files where each line represents a data point. It's
nice to be able to quickly tell how many data points I have. I had been
using wc, like this:

% cat *.txt | /usr/bin/time wc
2350570 4701140 49149973
5.81user 0.03system 0:06.08elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (152major+18minor)pagefaults 0swaps

I only really care about the line count and the time it takes. For larger
data sets, I was getting tired of waiting for wc, and I wondered whether
ByteString.Lazy could help me do better. So I wrote a 2-liner:

import qualified Data.ByteString.Lazy.Char8 as L
main = L.getContents >>= print . L.count '\n'

... and compiled this as "lc". It doesn't get much simpler than that. How
does it perform?

% cat *.txt | /usr/bin/time lc
2350570
0.09user 0.13system 0:00.24elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (199major+211minor)pagefaults 0swaps

Wow. About 64 times as fast in user time for this run (5.81s vs. 0.09s),
and roughly 25 times as fast in elapsed time (6.08s vs. 0.24s), with almost
no effort on my part. Granted, wc is doing more work, but the word and
character counts aren't interesting to me in this case, anyway. I can't
imagine (implementation time)*(execution time) being much smaller. Thanks,
Don!
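
For anyone curious where the speed might come from, here is a rough sketch
of the same count written out by hand, assuming the lazy ByteString is
consumed one strict chunk at a time. The name countLines is made up for
illustration and is not part of the library; this is meant to show the
idea, not the library's actual source.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)
import Data.List (foldl')

-- Hypothetical helper: count newlines by summing the per-chunk counts.
-- Each chunk is a strict ByteString, so S.count scans a flat buffer.
countLines :: L.ByteString -> Int64
countLines = foldl' step 0 . L.toChunks
  where
    -- S.count returns Int; widen it to match the Int64 running total.
    step acc chunk = acc + fromIntegral (S.count '\n' chunk)

main :: IO ()
main = L.getContents >>= print . countLines
```

Like the 2-liner, this counts newline characters, so a final line with no
trailing newline is not counted. Because the input is read lazily and the
running total is forced at each step, only a chunk or so of the input
should need to be in memory at a time.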

-- 

Chad Scherrer

"Time flies like an arrow; fruit flies like a banana" -- Groucho Marx