[Haskell-cafe] data analysis question

Peter Simons simons at cryp.to
Wed Nov 12 10:21:20 UTC 2014


Hi Tobias,

 > A friend [is] currently looking into how best to work with assorted
 > usage data: currently 250 million entries in a 12 GB CSV comprising
 > information such as which channel was tuned in for how long, with
 > which user agent, and what not.

As much as I love Haskell, the tool of choice for data analysis is GNU R,
not so much because of the language as because of the vast array of
high-quality libraries covering statistics, machine learning,
visualization, and so on. You'll find it at <http://www.r-project.org/>.

If you wanted to analyze 12 GB of data in Haskell, you'd have to jump
through all kinds of hoops just to load that CSV file into memory (or,
more realistically, to stream it). It's possible, no doubt, but pulling
it off efficiently requires a lot of Haskell expertise that statistics
people don't necessarily have (and arguably shouldn't have to).

The package Rlang-QQ integrates R into Haskell, which might be a nice way to
deal with this task, but I have no personal experience with that library, so
I'm not sure whether this adds much value.
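In case your friend wants to try it anyway, here is an untested sketch
of what I imagine the usage looks like. The 'r' quasiquoter and the
"hs_" prefix for referring to Haskell values from inside the R code are
assumptions based on my reading of the package description, so take
this with a grain of salt.

  {-# LANGUAGE QuasiQuotes #-}
  import RlangQQ

  main :: IO ()
  main = do
    -- A small made-up sample; in practice this would come from the CSV.
    let durations = [42.0, 7.5, 300.2] :: [Double]
    -- The R chunk should see the Haskell list 'durations' as
    -- 'hs_durations' (again: untested assumption on my part).
    _ <- [r| print(summary(hs_durations)) |]
    return ()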

Just my 2 cents,
Peter


