[Haskell-cafe] data analysis question

Christopher Reichert creichert07 at gmail.com
Thu Nov 13 06:26:46 UTC 2014


On Wed, Nov 12 2014, Christopher Allen <cma at bitemyapp.com> wrote:
> [Snip]
> csv-conduit isn't in the test results because I couldn't figure out how to
> use it. pipes-csv is proper streaming, but uses cassava's parsing machinery
> and data types. Possibly this is a problem if you have really wide rows but
> I've never seen anything that would be problematic in that realm even when
> I did a lot of HDFS/Hadoop ecosystem stuff. AFAICT with pipes-csv you're
> streaming rows, but not columns. With csv-conduit you might be able to
> incrementally process the columns too based on my guess from glancing at
> the rather scary code.
>
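To make the row-versus-column distinction above concrete, here is a minimal sketch using only base. It is not the pipes-csv or csv-conduit API. `Event`, `events`, `fieldsFrom` and `sumColumn` are hypothetical names, and the tokenizer assumes plain comma-separated, newline-terminated input with no quoting. A row-streaming interface hands the consumer one complete record per step. Here the consumer instead sees a flat stream of field events and is told where each row ends, which is one way column-level incremental processing could look.

```haskell
-- Hypothetical event type: one field at a time, plus row boundaries.
data Event = Field String | EndOfRow
  deriving (Eq, Show)

-- Tokenize input into field events without grouping fields into rows.
-- Assumes no quoting or escaped delimiters.
events :: String -> [Event]
events [] = []
events s  = fieldsFrom s

-- Read fields starting at the beginning of a field.
fieldsFrom :: String -> [Event]
fieldsFrom s = case break (\c -> c == ',' || c == '\n') s of
  (f, ',' : rest)  -> Field f : fieldsFrom rest
  (f, '\n' : rest) -> Field f : EndOfRow : events rest
  (f, _)           -> [Field f, EndOfRow]  -- input ended without a newline

-- Sum column k (0-based), consuming one event at a time. Only the current
-- column index and a strict running total are carried between events.
-- Assumes every value in column k parses as a Double (no header row).
sumColumn :: Int -> [Event] -> Double
sumColumn k = go 0 0
  where
    go :: Int -> Double -> [Event] -> Double
    go _ acc [] = acc
    go _ acc (EndOfRow : es) = go 0 acc es
    go i acc (Field f : es)
      | i == k    = let acc' = acc + read f in acc' `seq` go (i + 1) acc' es
      | otherwise = go (i + 1) acc es
```

For example, `sumColumn 1 (events "a,1\nb,2.5\n")` evaluates to `3.5`. A row-oriented API could be rebuilt on top of such a stream by grouping fields up to each `EndOfRow`, but the reverse is not possible without re-splitting rows.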

Any problems in particular? I've had pretty good luck with
csv-conduit. However, I have noticed that it's rather picky about type
signatures, and integrating custom data types isn't straightforward at
first.
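One plausible source of that pickiness is that cassava-style decoders are polymorphic in their result type, so the compiler needs an annotation or surrounding context to know which conversion instance to use. The sketch below illustrates this with a hypothetical `FromRow` class, `Trade` record and `decodeTrade` function, written against base only. It is loosely modelled on cassava-style conversion classes and is not the csv-conduit API.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Text.Read (readMaybe)

-- Hypothetical conversion class: decode one already-split row.
-- The result type `a` is chosen by the caller, not by the input.
class FromRow a where
  fromRow :: [String] -> Either String a

-- A custom record type to decode rows into.
data Trade = Trade { symbol :: String, qty :: Int, price :: Double }
  deriving (Eq, Show)

instance FromRow Trade where
  fromRow [s, q, p] =
    case (readMaybe q, readMaybe p) of
      (Just q', Just p') -> Right (Trade s q' p')
      _                  -> Left ("bad numeric field in " ++ show [s, q, p])
  fromRow fs = Left ("expected 3 fields, got " ++ show (length fs))

-- Any number of instances can exist for the same input type, so the
-- compiler cannot pick one from the row alone.
instance FromRow (String, String) where
  fromRow [a, b] = Right (a, b)
  fromRow fs     = Left ("expected 2 fields, got " ++ show (length fs))

-- Pinning the result type with a signature resolves the ambiguity.
decodeTrade :: [String] -> Either String Trade
decodeTrade = fromRow
```

Writing `fromRow ["AAPL", "10", "1.5"]` with no annotation and no context that fixes the result type leaves the instance undetermined; going through `decodeTrade`, or adding `:: Either String Trade`, makes it unambiguous.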

csv-conduit also seems to have drawn inspiration from cassava:
http://hackage.haskell.org/package/csv-conduit-0.6.3/docs/Data-CSV-Conduit-Conversion.html
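cassava's conversion classes also support header-based (named) records, and the linked csv-conduit Conversion module appears to follow the same pattern. As a hedged illustration of why that helps with custom types, the sketch below decodes by column name rather than by position, so the column order in the file does not matter. `NamedRow`, `toNamed`, `lookupField`, `Person` and `parsePerson` are hypothetical names using base only; this is not either library's API.

```haskell
import Text.Read (readMaybe)

-- Hypothetical named row: header name paired with field value.
type NamedRow = [(String, String)]

-- Pair each header with the field in the same position.
toNamed :: [String] -> [String] -> NamedRow
toNamed = zip

-- Look a field up by column name, reporting which column was missing.
lookupField :: String -> NamedRow -> Either String String
lookupField col row =
  maybe (Left ("missing column " ++ show col)) Right (lookup col row)

data Person = Person { name :: String, age :: Int }
  deriving (Eq, Show)

-- Decode by name, so the order of columns in the file is irrelevant.
parsePerson :: NamedRow -> Either String Person
parsePerson row = do
  n <- lookupField "name" row
  a <- lookupField "age" row
  case readMaybe a of
    Just v  -> Right (Person n v)
    Nothing -> Left ("bad age " ++ show a)
```

For example, `parsePerson (toNamed ["age", "name"] ["42", "Ann"])` decodes to `Person "Ann" 42` even though `age` is the first column, while a row with no `age` header yields a `Left` naming the missing column.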

> [Snip]
> To that end, take a look at my rather messy workspace here:
> https://github.com/bitemyapp/csvtest

I've made a PR for the conduit version:
https://github.com/bitemyapp/csvtest/pull/1


It could certainly be made more performant, but it seems to hold up well
in comparison. I would be interested in reading the How I Start article
and hearing more about your conclusions. Is this focused primarily on
the memory profile, or also on speed?


Regards,
-Christopher


> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

