[Haskell-cafe] data analysis question

Markus Läll markus.l2ll at gmail.com
Wed Nov 12 22:17:19 UTC 2014


Hi Tobias,

What he could do is encode the column values as Words of appropriate
width to reduce the size, so that the data fits in RAM: e.g. listening times
as seconds, browsers as categorical variables (in statistics terms), etc. If
some of the columns are arbitrary-length strings, then it seems possible to
get the 12GB down by more than half.
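
To make the encoding idea concrete, here is a rough sketch using only base.
The record layout, the field widths and the names Entry, encodeCategory and
decodeCategory are my own assumptions for illustration; I don't know the
actual CSV schema. For real data a Map would be a better lookup table than
a list.

```haskell
import Data.List (elemIndex)
import Data.Word (Word8, Word16, Word32)

-- | One usage record with fixed-width fields instead of CSV text.
-- The field names and widths are assumptions, not the real schema.
data Entry = Entry
  { channel   :: !Word16  -- index into a table of channel names
  , seconds   :: !Word32  -- listening time in seconds
  , userAgent :: !Word8   -- index into a table of user-agent strings
  } deriving (Eq, Show)

-- | Encode a categorical value as its position in a table of known values.
-- Gives Nothing if the value is unknown or its index does not fit a Word8.
encodeCategory :: [String] -> String -> Maybe Word8
encodeCategory table value = do
  i <- elemIndex value table
  if i <= fromIntegral (maxBound :: Word8)
    then Just (fromIntegral i)
    else Nothing

-- | Look a code up in the same table to recover the original string.
decodeCategory :: [String] -> Word8 -> Maybe String
decodeCategory table code =
  case drop (fromIntegral code) table of
    (x : _) -> Just x
    []      -> Nothing
```

Each distinct user-agent string is then stored once in the table, and every
entry carries only a small index to it.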

If he doesn't know Haskell, then I'd suggest using another language.
(Years ago, as a noob, I tried to do a bigger uni project in Haskell and
failed miserably.)

On Nov 12, 2014 10:45 AM, "Tobias Pflug" <tobias.pflug at gmx.net> wrote:

> Hi,
>
> just the other day I talked to a friend of mine who works for an online
> radio service who told me he was currently looking into how best to work
> with assorted usage data: currently 250 million entries in a 12GB CSV file,
> comprising information such as which channel was tuned in for how long
> with which user agent and whatnot.
>
> He accidentally ran into the K and Q programming languages [1][2], which
> apparently work nicely for this, unfamiliar as they might seem.
>
> This certainly is not my area of expertise at all. I was just wondering
> how some of you would suggest approaching this with Haskell. How would you
> most efficiently parse such data and evaluate custom queries?
>
> Thanks for your time,
> Tobi
>
> [1] http://en.wikipedia.org/wiki/K_(programming_language)
> [2] http://en.wikipedia.org/wiki/Q_(programming_language_from_Kx_Systems)
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>