<div dir="ltr"><div>Thank you all for your helpful suggestions. As I wrote the original question, I too was trying to decide between using records to represent each row and defining a vector for each column, with each vector becoming an attribute of the record. I was also leaning towards the latter given the performance needs.<br><br>Since the file is currently available as a CSV, adding Persistent or any ORM library would be an added dependency.<br><br></div>I was trying to solve this problem without too many dependencies on other libraries and without having to learn new DSLs. It's a tempting time killer, as everyone here would understand.<br><div><br>@Anthony Thank you for your answer as well. I have explored the Frames library in the past, when I was looking for Pandas-like features in Haskell. The library is useful and I have played around with it, but I was never confident about adopting it for a serious project. Part of my reluctance is the learning curve, plus I would also need to familiarize myself with `lens`. But it looks like the project I have in hand is good motivation to do both. I will try to use Frames and then report back. Also, apologies for not being able to share the data I am working on.<br><br></div><div>With the original question, what I was trying to get at is how these kinds of problems are solved in real-world projects, for example when Haskell is used in data mining or in financial applications. I believe these applications deal with this kind of data, where the tables are wide.
Not having something I can quickly get started with troubles me, and makes me wonder whether the reason is my lack of understanding or just the pain of using static typing.<br></div><div><br>Regards<br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Oct 1, 2017 at 1:58 PM, Anthony Cowley <span dir="ltr"><<a href="mailto:acowley@seas.upenn.edu" target="_blank">acowley@seas.upenn.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>
<br>
> On Sep 30, 2017, at 9:30 PM, Guru Devanla <<a href="mailto:gurudev.devanla@gmail.com">gurudev.devanla@gmail.com</a>> wrote:<br>
><br>
> Hello All,<br>
><br>
> I am in the process of replicating some Python code in Haskell.<br>
><br>
> In Python, I load a couple of CSV files, each having more than 100 columns, into a Pandas data frame. A Pandas data frame, in short, is a tabular structure which lets me perform a bunch of joins and filter out data. I generate different shapes of reports using these operations. Of course, I would love some type checking to help me with these merge and join operations as I create different reports.<br>
><br>
> I am not looking to replicate the Pandas data-frame functionality in Haskell. The first thing I want to do is reach for the 'record' data structure. Here are some ideas I have:<br>
><br>
> 1. I need to declare all these 100+ columns in multiple record structures.<br>
> 2. Some of the columns can have NULL/NaN values. Therefore, some of the attributes of the record structure would be 'Maybe' values. Now, I could drop some columns during load and cut down the number of attributes I create per record structure.<br>
> 3. Create a dictionary of each record structure which will help me index into them.<br>
><br>
> I would like some feedback on the first 2 points. It seems like there is a lot of boilerplate code I have to generate for creating 100s of record attributes. Is this the only sane way to do this? What other patterns should I consider while solving such a problem?<br>
><br>
> Also, I do not want to add too many dependencies to the project, but I am open to suggestions.<br>
><br>
> Any input/advice on this would be very helpful.<br>
><br>
> Thank you for the time!<br>
> Guru<br>
<br>
</div></div>The Frames package generates a vinyl record based on your data (like hlist; with a functor parameter that can be Maybe to support missing data), storing each column in a vector for very good runtime performance. As you get past 100 columns, you may encounter compile-time performance issues. If you have a sample data file you can make available, I can help diagnose performance troubles.<br>
<span class="HOEnZb"><font color="#888888"><br>
Anthony<br>
<br>
<br>
</font></span></blockquote></div><br></div>
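<div><br>For reference, the column-per-vector layout discussed above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not how Frames does it: the three columns (tradeId, symbol, price), the TradeRow type, and all helper names are hypothetical, and plain lists stand in for Data.Vector so that the sketch needs nothing beyond base. A real table would have 100+ such fields.<br><br></div>
<pre>
```haskell
import Data.Maybe (catMaybes)

-- Hypothetical column-oriented table: one field per column.
-- Lists stand in for Data.Vector to keep the sketch dependency-free.
data Trades = Trades
  { tradeId :: [Int]
  , symbol  :: [String]
  , price   :: [Maybe Double]  -- Nothing marks a NULL/NaN cell
  } deriving (Show, Eq)

-- Row view, rebuilt on demand when row-wise logic is more convenient.
data TradeRow = TradeRow
  { rowId     :: Int
  , rowSymbol :: String
  , rowPrice  :: Maybe Double
  } deriving (Show, Eq)

-- Transpose the columns into a list of rows.
rows :: Trades -> [TradeRow]
rows t = zipWith3 TradeRow (tradeId t) (symbol t) (price t)

-- Transpose a list of rows back into columns.
fromRows :: [TradeRow] -> Trades
fromRows rs = Trades (map rowId rs) (map rowSymbol rs) (map rowPrice rs)

-- Keep only the rows that satisfy a predicate.
filterRows :: (TradeRow -> Bool) -> Trades -> Trades
filterRows p = fromRows . filter p . rows

-- Aggregate over a single column, skipping missing values.
meanPrice :: Trades -> Maybe Double
meanPrice t = case catMaybes (price t) of
  [] -> Nothing
  ps -> Just (sum ps / fromIntegral (length ps))

-- Made-up sample data, for illustration only.
sample :: Trades
sample = Trades
  { tradeId = [1, 2, 3]
  , symbol  = ["AAA", "BBB", "AAA"]
  , price   = [Just 10.0, Nothing, Just 20.0]
  }
```
</pre>
<div>For example, filterRows ((== "AAA") . rowSymbol) sample keeps the first and third rows, while meanPrice only ever touches the price column. The boilerplate concern still stands: every column has to be written out once per record type, which is the part Frames aims to generate from the data file.<br></div>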