[Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

Johan Tibell johan.tibell at gmail.com
Tue Feb 26 00:46:59 CET 2013


On Mon, Feb 25, 2013 at 2:32 PM, Don Stewart <dons00 at gmail.com> wrote:

> Cassava is quite new, but has the same goals as lazy-csv.
>
> It's about a year old now -
> http://blog.johantibell.com/2012/08/a-new-fast-and-easy-to-use-csv-library.html
>
> I know Johan has been working on the benchmarks of late - it would be very
> good to know how the two compare in features.
>
I whipped together a quick benchmark:
https://github.com/tibbe/cassava/blob/master/benchmarks/Benchmarks.hs

To run it, check out the cassava repo from GitHub and run:

    cabal configure --enable-benchmarks && cabal build && cabal bench

Here are the results (all the normal caveats for benchmarking apply):

benchmarking positional/decode/presidents/without conversion
mean: 62.85965 us, lb 62.56705 us, ub 63.26101 us, ci 0.950
std dev: 1.751446 us, lb 1.371323 us, ub 2.295576 us, ci 0.950

benchmarking positional/decode/streaming/presidents/without conversion
mean: 93.81925 us, lb 91.14701 us, ub 98.19217 us, ci 0.950
std dev: 17.20842 us, lb 11.58690 us, ub 23.41786 us, ci 0.950

benchmarking comparison/lazy-csv
mean: 133.2609 us, lb 132.4415 us, ub 135.3085 us, ci 0.950
std dev: 6.193178 us, lb 3.123661 us, ub 12.83148 us, ci 0.950

The first two sets of numbers are for cassava (all-at-once mode and
streaming mode, respectively). The last set is for lazy-csv.
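
To put the means side by side, here is a small sketch that expresses each
one as a multiple of the fastest. The labels and the helper name
relativeToFastest are made up for this illustration; the numbers are copied
from the output above, and the same benchmarking caveats apply to the
ratios.

```haskell
-- Mean run times in microseconds, copied from the criterion output above.
-- The labels are informal names chosen for this sketch.
means :: [(String, Double)]
means =
  [ ("cassava, all-at-once", 62.85965)
  , ("cassava, streaming",   93.81925)
  , ("lazy-csv",             133.2609)
  ]

-- Express every mean as a multiple of the smallest mean in the list.
-- Assumes a non-empty list (minimum fails on an empty one).
relativeToFastest :: [(String, Double)] -> [(String, Double)]
relativeToFastest xs = [ (name, t / fastest) | (name, t) <- xs ]
  where
    fastest = minimum (map snd xs)

main :: IO ()
main = mapM_ report (relativeToFastest means)
  where
    report (name, r) = putStrLn (name ++ ": " ++ show r ++ "x")
```

On these numbers, streaming cassava comes out at roughly 1.5x and lazy-csv
at roughly 2.1x the all-at-once cassava mean.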

The feature sets of the two libraries are quite different, although both do
basic CSV parsing (with some extensions):

 * lazy-csv parses CSV data to something akin to [[ByteString]], but with a
   heavy focus on error recovery and precise error messages.
 * cassava parses CSV data to [a], where a is a user-defined type that
   represents a CSV record. There are options to recover from *type
   conversion* errors, but not from malformed CSV. cassava has several
   parsing modes: incremental for parsing interleaved with I/O, streaming
   for lazy parsing (with or without I/O), and all-at-once parsing for when
   you want to hold all the data in memory.
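
The difference between the two result shapes can be sketched with nothing
but base. This is a toy illustration, not the API or implementation of
either library: the President type, the function names, and the sample
input are all made up, and the splitter ignores quoting and escaping
entirely. It only shows what "rows of raw fields" versus "a list of typed
records with per-record conversion errors" looks like.

```haskell
import Text.Read (readMaybe)

-- Hypothetical record type for illustration; not taken from either library.
data President = President
  { presName :: String
  , presYear :: Int
  } deriving (Show, Eq)

-- Naive field splitter: splits on commas only, with no quoting or escaping.
-- Real CSV parsers handle far more than this.
splitFields :: String -> [String]
splitFields s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitFields rest

-- Untyped view: every record is just a list of raw fields.
parseRaw :: String -> [[String]]
parseRaw = map splitFields . lines

-- Typed view: convert one raw record, reporting a conversion failure
-- for that record instead of aborting the whole parse.
toPresident :: [String] -> Either String President
toPresident [name, year] = case readMaybe year of
  Just y  -> Right (President name y)
  Nothing -> Left ("not a number: " ++ show year)
toPresident fields = Left ("expected 2 fields, got " ++ show (length fields))

parseTyped :: String -> [Either String President]
parseTyped = map toPresident . parseRaw

main :: IO ()
main = mapM_ print (parseTyped "Washington,1789\nAdams,seventeen\nJefferson,1801\n")
```

Here the second record fails type conversion but the records around it
still come through, which is the kind of recovery described for cassava
above. Recovering from malformed CSV itself (for example an unterminated
quote) is a separate problem that this sketch does not attempt.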

-- Johan