[Haskell-beginners] Simple data summarization

Tue Mar 10 04:33:00 EDT 2009

Hi all - 

While learning Haskell, I'd like to do some simple data summarization.
(Btw, I'm looking at putting any code submitted for this in the "cookbook"
section of the Haskell wiki. Imo it would be very useful there as a "next
step" up from just reading in a file and printing it out.)

This would involve reading in a delimited file like this (just a contrived
example of how many books some people own) -

Name,Gender,Age,Ethnicity,Books
Mary,F,14,NZ European, 11
Brian,M,13,NZ European, 6
Josh,M,12,NZ European, 14
Regan,M,14,NZ Maori, 9
Helen,F,15,NZ Maori, 17
Anna,F,14,NZ European, 16
Jess,F,14,NZ Maori, 21

.... and doing some operations on it.
As you can see, the file has column headings. I prefer to be able to
manipulate data by heading, as that is what I do a lot of at work in
another programming language.

I've tried to break the problem down into small parts as follows. 
a) Read the file into a list of pairs.
The first element of the pair would be the column heading.
The second will be a list containing the data.
For example, ("Name", ["Mary", "Brian", "Josh", "Regan", ...])

b) Select a numeric variable to summarize ("Books" in this example).
c) Do a fold to summarize the variable. I think a left fold would be the
one to use here, but I may be wrong.
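
To make the target concrete, the shape described in steps a) to c) might look something like the sketch below. The sample data is written out by hand rather than read from the file, every value is kept as a String until it is needed as a number, and the names Column, columns, numericColumn and total are just placeholders for illustration.

```haskell
import Data.List (foldl')

-- A column is its heading paired with its values, all kept as strings.
type Column = (String, [String])

-- The sample file written out by hand in the shape described in step a).
columns :: [Column]
columns =
    [ ("Name",      ["Mary", "Brian", "Josh", "Regan", "Helen", "Anna", "Jess"])
    , ("Gender",    ["F", "M", "M", "M", "F", "F", "F"])
    , ("Age",       ["14", "13", "12", "14", "15", "14", "14"])
    , ("Ethnicity", [ "NZ European", "NZ European", "NZ European", "NZ Maori"
                    , "NZ Maori", "NZ European", "NZ Maori" ])
    , ("Books",     [" 11", " 6", " 14", " 9", " 17", " 16", " 21"])
    ]

-- Step b): pick a column by heading and read its values as Ints.
-- Gives Nothing if the heading is missing; `read` will still fail at
-- runtime on a value that is not a number.
numericColumn :: String -> [Column] -> Maybe [Int]
numericColumn heading cols = map read <$> lookup heading cols

-- Step c): a strict left fold to total the values.
total :: [Int] -> Int
total = foldl' (+) 0

main :: IO ()
main = print (total <$> numericColumn "Books" columns)
```

Run as a program, this prints Just 94. The sketch uses foldl' from Data.List, the strict variant of a left fold, which is generally the safer choice for totalling numbers because it does not build up a chain of unevaluated additions.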

After looking through previous postings on this list, I found some code 
which is somewhat similar to what I'm after (although the data it was 
crunching is very different).  This is what I've come up with so far -

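-- Split a line on commas (no support for quoted fields).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
    (field, [])       -> [field]
    (field, _ : rest) -> field : splitOnComma rest

-- Pair the heading of the last column with the total of its values.
summarize :: [String] -> [(String, Int)]
summarize [] = []
summarize (header : rows) =
    let variable         = last (splitOnComma header)
        numeric_variable = map (read . last . splitOnComma) rows :: [Int]
        total            = foldl (+) 0 numeric_variable
    in [(variable, total)]

main :: IO ()
main = interact (unlines . map show . summarize . lines)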

I think this might be a useful start, but I still need to read the data
into a list of pairs as mentioned, and I'm unsure how to do that.

Many thanks in advance for any help received. As mentioned, I'm sure
that examples like this could be very useful to other beginners, so I'm
keen to make sure that any help given gets put to maximum use (by putting
any code on the Haskell wiki).
- Andy