[Haskell-cafe] data analysis question

Michael Snoyman michael at snoyman.com
Thu Nov 13 07:29:02 UTC 2014


Somewhat off topic, but: I suggested csv-conduit because I have some
experience with it. When we were doing some analytics work at FP Complete, a
few of us evaluated both csv-conduit and cassava and didn't come away with a
clear sense of which was the better library. We went with csv-conduit[1],
but I'd be really interested in hearing a comparison of the two libraries
from someone who knows both of them well.

[1] Don't ask what tipped us in that direction; I honestly don't remember
what it was.

On Thu Nov 13 2014 at 9:24:47 AM Christopher Allen <cma at bitemyapp.com>
wrote:

> The memory profiling was only to test how stream-y the streaming was; I
> didn't think perf would be that different between them. The way I had to
> transform my fold for Pipes was a titch awkward, but otherwise I'm happy
> with it.
>
> If people are that interested in the perf side of things, I can set up a
> criterion harness and publish those numbers as well.
>
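> A minimal criterion harness along those lines might look like the sketch
> below; the countRows* actions are placeholders standing in for whatever
> the real per-library benchmarks end up being.
>
>     import Criterion.Main
>
>     -- Placeholder actions standing in for the real per-library
>     -- benchmarks, e.g. counting the rows each library parses from the
>     -- same test CSV.
>     countRowsCassava, countRowsPipes :: IO Int
>     countRowsCassava = return 0
>     countRowsPipes   = return 0
>
>     main :: IO ()
>     main = defaultMain
>       [ bgroup "csv"
>           [ bench "cassava"   $ nfIO countRowsCassava
>           , bench "pipes-csv" $ nfIO countRowsPipes
>           ]
>       ]
>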
> Mostly I was impressed with:
>
> 1. How easy it was to start using the streaming module in cassava, because
> it's just a Foldable instance (see the first sketch below).
>
> 2. How the Pipes version used <600kb of memory (see the second sketch
> below).
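>
> On point 1, a minimal sketch of that Foldable-based streaming (the file
> name and record type here are illustrative, not the actual benchmark
> code) is:
>
>     import qualified Data.ByteString as B
>     import qualified Data.ByteString.Lazy as BL
>     import Data.Csv (HasHeader (HasHeader))
>     import Data.Csv.Streaming (Records, decode)
>     import qualified Data.Foldable as F
>     import qualified Data.Vector as V
>
>     main :: IO ()
>     main = do
>       bytes <- BL.readFile "data.csv"
>       let rows = decode HasHeader bytes :: Records (V.Vector B.ByteString)
>       -- Records is Foldable, so an ordinary fold consumes the rows
>       -- incrementally; here we just count them.
>       print (F.foldl' (\n _ -> n + 1 :: Int) 0 rows)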
>
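> And on point 2, the shape of the pipes version, streaming one row at a
> time in constant memory, is roughly the sketch below. It assumes
> pipes-csv's Pipes.Csv.decode and pipes-bytestring's fromHandle (signatures
> as I recall them); the file name and the final count are illustrative.
>
>     import qualified Data.ByteString as B
>     import Data.Csv (HasHeader (HasHeader))
>     import qualified Data.Vector as V
>     import Pipes
>     import qualified Pipes.ByteString as PB
>     import qualified Pipes.Csv as PCsv
>     import qualified Pipes.Prelude as P
>     import System.IO (IOMode (ReadMode), withFile)
>
>     main :: IO ()
>     main = withFile "data.csv" ReadMode $ \h -> do
>         let rows :: Producer (Either String (V.Vector B.ByteString)) IO ()
>             rows = PCsv.decode HasHeader (PB.fromHandle h)
>         -- Each row arrives as one element of the stream; here we simply
>         -- count the rows that parsed successfully.
>         n <- P.length (rows >-> P.filter isRight)
>         print n
>       where
>         isRight (Right _) = True
>         isRight (Left  _) = False
>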
> Your pull request for csv-conduit looks really clean and nice. I've merged
> it; thanks for sending it my way!
>
> --- Chris Allen
>
>
> On Thu, Nov 13, 2014 at 12:26 AM, Christopher Reichert
> <creichert07 at gmail.com> wrote:
>
>>
>> On Wed, Nov 12 2014, Christopher Allen <cma at bitemyapp.com> wrote:
>> > [Snip]
>> > csv-conduit isn't in the test results because I couldn't figure out
>> > how to use it. pipes-csv is proper streaming, but uses cassava's
>> > parsing machinery and data types. Possibly this is a problem if you
>> > have really wide rows but I've never seen anything that would be
>> > problematic in that realm even when I did a lot of HDFS/Hadoop
>> > ecosystem stuff. AFAICT with pipes-csv you're streaming rows, but not
>> > columns. With csv-conduit you might be able to incrementally process
>> > the columns too based on my guess from glancing at the rather scary
>> > code.
>> >
>>
>> Any problems in particular? I've had pretty good luck with csv-conduit.
>> However, I have noticed that it's rather picky about type signatures, and
>> integrating custom data types isn't straightforward at first.
>>
>> csv-conduit also seems to have drawn inspiration from cassava:
>>
>> http://hackage.haskell.org/package/csv-conduit-0.6.3/docs/Data-CSV-Conduit-Conversion.html
>>
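>> For illustration, the cassava-style way of wiring in a custom data type
>> is a named-record instance like the one below (the Person type and its
>> fields are made up); csv-conduit's Conversion module appears to expose
>> the same kind of classes.
>>
>>     {-# LANGUAGE OverloadedStrings #-}
>>
>>     import Data.Csv (FromNamedRecord (..), (.:))
>>
>>     -- A made-up record type, just for illustration.
>>     data Person = Person
>>       { personName :: !String
>>       , personAge  :: !Int
>>       } deriving Show
>>
>>     -- The field names refer to the CSV header columns.
>>     instance FromNamedRecord Person where
>>       parseNamedRecord r = Person <$> r .: "name" <*> r .: "age"
>>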
>> > [Snip]
>> > To that end, take a look at my rather messy workspace here:
>> > https://github.com/bitemyapp/csvtest
>>
>> I've made a PR for the conduit version:
>> https://github.com/bitemyapp/csvtest/pull/1
>>
>>
>> It could certainly be made more performant, but it seems to hold up well
>> in comparison. I would be interested in reading the How I Start article
>> and hearing more about your conclusions. Is this focused primarily on the
>> memory profile, or on speed as well?
>>
>>
>> Regards,
>> -Christopher