[Haskell-cafe] Loading a csv file with ~200 columns into Haskell Record

Leandro Ostera leandro at ostera.io
Sun Oct 1 11:08:04 UTC 2017


Two things come to mind.

The first is a crazy idea with a bad pitch: generate the record code from
the data.
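
A rough sketch of what that first idea could look like, assuming the header
is a plain comma-separated line with no quoted commas. The names
splitOnComma, fieldName and recordDecl are made up for illustration, and
every field is typed Maybe Text only because the header alone carries no
type information:

```haskell
import Data.Char (isAlphaNum, toLower, toUpper)
import Data.List (intercalate)

-- | Split a header line on commas. Naive on purpose: quoted fields that
-- contain commas are not handled (assumption: simple headers).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- | Turn a column name such as "Order ID" into a field name such as
-- "rowOrderId". The "row" prefix keeps names that start with a digit valid.
fieldName :: String -> String
fieldName col = "row" ++ concatMap capitalise (words (map clean col))
  where
    clean c = if isAlphaNum c then c else ' '
    capitalise []       = []
    capitalise (c : cs) = toUpper c : map toLower cs

-- | Render the source text of a record declaration, one field per column.
recordDecl :: String -> [String] -> String
recordDecl tyName cols =
  "data " ++ tyName ++ " = " ++ tyName ++ "\n  { "
    ++ intercalate "\n  , " [fieldName c ++ " :: Maybe Text" | c <- cols]
    ++ "\n  }"
```

The string from recordDecl could be written to a .hs file by a small build
script (the generated module would need to import Data.Text (Text));
Template Haskell is the other usual route. Clashes between column names are
not handled here.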

The second is to make the records dynamically typed:

Would it be simpler to define a Column type that you parameterize with a
string for its name (GADTs?), so that you automatically get a distinct type
for that specific column?

That way as you read the CSV files you could define the type of the columns
based on the actual column name.

Rows would then become sets of pairings of defined columns and values.
Wrapping each value in a Maybe would encode that the value for a particular
column may be missing. You could encode these pairings as a list too.

At least there you can have type guarantees that you’re joining fields that
are of the same column type. I think.
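
To make that concrete, here is a minimal sketch under some assumptions: the
column names are known at compile time (naming columns from a file read at
run time would need extra machinery such as someSymbolVal), and Cell, Row,
joinsWith and the example columns are hypothetical names, not an existing
library:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- | A value tagged at the type level with its column name.
-- 'Nothing' marks a missing value.
newtype Cell (name :: Symbol) a = Cell (Maybe a)
  deriving (Eq, Show)

-- | Recover the column name from the type.
cellName :: forall name a. KnownSymbol name => Cell name a -> String
cellName _ = symbolVal (Proxy :: Proxy name)

-- | Join test: both cells must come from the same column and hold the same
-- value type. A missing value never matches.
joinsWith :: Eq a => Cell name a -> Cell name a -> Bool
joinsWith (Cell (Just x)) (Cell (Just y)) = x == y
joinsWith _ _ = False

-- | A row is a heterogeneous list of cells, indexed by its column types.
data Row (cols :: [Type]) where
  RNil :: Row '[]
  (:&) :: col -> Row cols -> Row (col ': cols)
infixr 5 :&

firstCell :: Row (col ': cols) -> col
firstCell (c :& _) = c

-- Example columns (made up for illustration).
type CustomerId = Cell "customer_id" Int
type Amount = Cell "amount" Double

orderCustomer, invoiceCustomer :: CustomerId
orderCustomer = Cell (Just 42)
invoiceCustomer = Cell (Just 42)

orderAmount :: Amount
orderAmount = Cell Nothing

orderRow :: Row '[CustomerId, Amount]
orderRow = orderCustomer :& orderAmount :& RNil

-- joinsWith orderCustomer orderAmount  -- rejected by the type checker
```

With these definitions, joinsWith orderCustomer invoiceCustomer type-checks
because both cells share the "customer_id" column, while the commented-out
line does not compile, which is the guarantee described above.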

Either way, my 2 cents and keep it up!


On Sun, 1 Oct 2017 at 03:34, Guru Devanla <gurudev.devanla at gmail.com> wrote:

> Hello All,
>
> I am in the process of replicating some Python code in Haskell.
>
> In Python, I load a couple of csv files, each file having more than 100
> columns, into a Pandas data frame. A Pandas data frame, in short, is a
> tabular structure which lets me perform a bunch of joins and filter out
> data. I generate different shapes of reports using these operations. Of
> course, I would love some type checking to help me with these merge and
> join operations as I create different reports.
>
> I am not looking to replicate the Pandas data-frame functionality in
> Haskell. First thing I want to do is reach out to the 'record' data
> structure. Here are some ideas I have:
>
> 1.  I need to declare all these 100+ columns into multiple record
> structures.
> 2.  Some of the columns can have NULL/NaN values. Therefore, some of the
> attributes of the record structure would be 'Maybe' values. Now, I could
> drop some columns during load and cut down the number of attributes I
> create per record structure.
> 3.  Create a dictionary of each record structure which will help me index
> into them.
>
> I would like some feedback on the first 2 points. Seems like there is a
> lot of boilerplate code I have to generate for creating 100s of record
> attributes. Is this the only sane way to do this?  What other patterns
> should I consider while solving such a problem?
>
> Also, I do not want to add too many dependencies into the project, but
> I am open to suggestions.
>
> Any input/advice on this would be very helpful.
>
> Thank you for the time!
> Guru