[Haskell-beginners] Simple data summarization

Andy Elvey andy.elvey at paradise.net.nz
Wed Mar 11 01:17:42 EDT 2009


Hi Patrick! 

Thanks very much - that's great!  
Many thanks also to Roland and Thomas - your solutions are great too! 

Although I'm still new to Haskell, I love its power and elegance.  It 
does have a bit of a learning curve, mainly because its *so* powerful 
that its a bit like trying to fly a jumbo jet..... ;)  

Only one more question. If I wanted to do a crosstab (say, with 
ethnicity down the left-hand side, and gender across the top), how could 
that be done?  
In other words, the output would look like this -

                         F      M 
NZ European    xxx   xxx
NZ Maori          xxx   xxx 

- where the x's are the totals for each category (NZ European F), (NZ 
European M), (NZ Maori, F), NZ Maori, M). 
I think it would involve zipWith and Array, but beyond that, I find it 
hard to think this through in two dimensions.... :) 
Crosstab code would be *great* to play around with!

Thanks again - bye for now -
 - Andy


Patrick LeBoutillier wrote:
> Andy,
>
> I came up with this solution that works like you described:
>
>
> import Data.List.Split
>
> mysplit = wordsBy (==',')
>
> toPairs :: [String] -> [(String, [String])]
> toPairs (header:rows) = foldr f (initPairs header) $ splitRows rows
>     where f row acc = zipWith (\f (h,r) -> (h,f:r)) row acc
>           initPairs header = map (\h -> (h, [])) $ mysplit header
>           splitRows rows = map (mysplit) rows
>
> summarizeByWith :: String -> (Int -> Int -> Int) -> [(String,
> [String])] -> (String, Int)
> summarizeByWith var agg pairs = case (lookup var pairs) of
>     Just vals -> (var, foldl agg 0 $ map (read) vals)
>     otherwise -> ("", 0)
>
> main = interact (show . summarizeByWith "Books" (+) . toPairs . lines)
>
>
> However in my opinion a solution like that proposed by Roland is
> preferable since it can process the input line by line instead of
> storing it all in memory. It seems also simpler and propably more
> efficient.
>
> However it was interesting hacking at your algorithm because it made
> me realize how you can use lists of pairs (association lists) in
> haskell where you might have used hash tables in another language.
>
>
> Cheers,
>
> Patrick
>
>
>
> On Tue, Mar 10, 2009 at 4:33 AM, Andy Elvey <andy.elvey at paradise.net.nz> wrote:
>   
>> Hi all -
>> In the process of learning Haskell I'm wanting to do some simple data
>> summarization.
>> ( Btw, I'm looking at putting any submitted code for this in the "cookbook"
>> section of
>> the Haskell wiki.  Imo it would be very useful there as a "next step" up
>> from just reading
>> in a file and printing it out.  )
>> This would involve reading in a delimited file like this - ( just a
>> contrived example of how many books
>> some people own ) -
>>
>> Name,Gender,Age,Ethnicity,Books
>> Mary,F,14,NZ European, 11
>> Brian,M,13,NZ European, 6
>> Josh,M,12,NZ European, 14
>> Regan,M,14,NZ Maori, 9
>> Helen,F,15,NZ Maori, 17
>> Anna,F,14,NZ European, 16
>> Jess,F,14,NZ Maori, 21
>>
>> .... and doing some operations on it. As you can see, the file has column
>> headings - I prefer to be able to manipulate data with
>> headings (as it is what I do a lot of at work, using another programming
>> language).
>>
>> I've tried to break the problem down into small parts as follows. a) Read
>> the file into a list of pairs.
>> The first element of the pair would be the column heading.
>> The second will be a list containing the data.
>> For example, ("Name",  [Mary,  Brian,  Josh,  Regan, ..... ]  )
>> b) Select a numeric variable to summarise ( "Books" in this example) c) Do a
>> fold to summarize the variable. I think a left-fold would be the one to use
>> here, but I may
>> be wrong....
>>
>> After looking through previous postings on this list, I found some code
>> which is somewhat similar to what I'm after (although the data it was
>> crunching is very different).  This is what I've come up with so far -
>>
>> summarize [] = []
>> summarize ls = let
>>       byvariable = head ls
>>       numeric_variable = last ls
>>       sum = foldl (+) 0 $ numeric_variable
>>
>>   in (byvariable, sum) : sum ls
>>
>> main = interact (unlines . map show . summarize . lines)
>> I think this might be a useful start, but I still need to read the data into
>> a list of pairs as mentioned, and I'm unsure as to how to
>> do that.
>> Many thanks in advance for any help received.  As mentioned, I'm sure that
>> examples like this could be very useful to other beginners, so I'm keen to
>> make sure that any help given is made maximum use of (by putting any code on
>> the Haskell wiki). - Andy
>>
>> _______________________________________________
>> Beginners mailing list
>> Beginners at haskell.org
>> http://www.haskell.org/mailman/listinfo/beginners
>>
>>     
>
>
>
>   



More information about the Beginners mailing list