[Haskell-beginners] Simple data summarization
Andy Elvey
andy.elvey at paradise.net.nz
Wed Mar 11 01:17:42 EDT 2009
Hi Patrick!
Thanks very much - that's great!
Many thanks also to Roland and Thomas - your solutions are great too!
Although I'm still new to Haskell, I love its power and elegance. It
does have a bit of a learning curve, mainly because its *so* powerful
that its a bit like trying to fly a jumbo jet..... ;)
Only one more question. If I wanted to do a crosstab (say, with
ethnicity down the left-hand side, and gender across the top), how could
that be done?
In other words, the output would look like this -
F M
NZ European xxx xxx
NZ Maori xxx xxx
- where the x's are the totals for each category (NZ European F), (NZ
European M), (NZ Maori, F), NZ Maori, M).
I think it would involve zipWith and Array, but beyond that, I find it
hard to think this through in two dimensions.... :)
Crosstab code would be *great* to play around with!
Thanks again - bye for now -
- Andy
Patrick LeBoutillier wrote:
> Andy,
>
> I came up with this solution that works like you described:
>
>
> import Data.List.Split
>
> mysplit = wordsBy (==',')
>
> toPairs :: [String] -> [(String, [String])]
> toPairs (header:rows) = foldr f (initPairs header) $ splitRows rows
> where f row acc = zipWith (\f (h,r) -> (h,f:r)) row acc
> initPairs header = map (\h -> (h, [])) $ mysplit header
> splitRows rows = map (mysplit) rows
>
> summarizeByWith :: String -> (Int -> Int -> Int) -> [(String,
> [String])] -> (String, Int)
> summarizeByWith var agg pairs = case (lookup var pairs) of
> Just vals -> (var, foldl agg 0 $ map (read) vals)
> otherwise -> ("", 0)
>
> main = interact (show . summarizeByWith "Books" (+) . toPairs . lines)
>
>
> However in my opinion a solution like that proposed by Roland is
> preferable since it can process the input line by line instead of
> storing it all in memory. It seems also simpler and propably more
> efficient.
>
> However it was interesting hacking at your algorithm because it made
> me realize how you can use lists of pairs (association lists) in
> haskell where you might have used hash tables in another language.
>
>
> Cheers,
>
> Patrick
>
>
>
> On Tue, Mar 10, 2009 at 4:33 AM, Andy Elvey <andy.elvey at paradise.net.nz> wrote:
>
>> Hi all -
>> In the process of learning Haskell I'm wanting to do some simple data
>> summarization.
>> ( Btw, I'm looking at putting any submitted code for this in the "cookbook"
>> section of
>> the Haskell wiki. Imo it would be very useful there as a "next step" up
>> from just reading
>> in a file and printing it out. )
>> This would involve reading in a delimited file like this - ( just a
>> contrived example of how many books
>> some people own ) -
>>
>> Name,Gender,Age,Ethnicity,Books
>> Mary,F,14,NZ European, 11
>> Brian,M,13,NZ European, 6
>> Josh,M,12,NZ European, 14
>> Regan,M,14,NZ Maori, 9
>> Helen,F,15,NZ Maori, 17
>> Anna,F,14,NZ European, 16
>> Jess,F,14,NZ Maori, 21
>>
>> .... and doing some operations on it. As you can see, the file has column
>> headings - I prefer to be able to manipulate data with
>> headings (as it is what I do a lot of at work, using another programming
>> language).
>>
>> I've tried to break the problem down into small parts as follows. a) Read
>> the file into a list of pairs.
>> The first element of the pair would be the column heading.
>> The second will be a list containing the data.
>> For example, ("Name", [Mary, Brian, Josh, Regan, ..... ] )
>> b) Select a numeric variable to summarise ( "Books" in this example) c) Do a
>> fold to summarize the variable. I think a left-fold would be the one to use
>> here, but I may
>> be wrong....
>>
>> After looking through previous postings on this list, I found some code
>> which is somewhat similar to what I'm after (although the data it was
>> crunching is very different). This is what I've come up with so far -
>>
>> summarize [] = []
>> summarize ls = let
>> byvariable = head ls
>> numeric_variable = last ls
>> sum = foldl (+) 0 $ numeric_variable
>>
>> in (byvariable, sum) : sum ls
>>
>> main = interact (unlines . map show . summarize . lines)
>> I think this might be a useful start, but I still need to read the data into
>> a list of pairs as mentioned, and I'm unsure as to how to
>> do that.
>> Many thanks in advance for any help received. As mentioned, I'm sure that
>> examples like this could be very useful to other beginners, so I'm keen to
>> make sure that any help given is made maximum use of (by putting any code on
>> the Haskell wiki). - Andy
>>
>> _______________________________________________
>> Beginners mailing list
>> Beginners at haskell.org
>> http://www.haskell.org/mailman/listinfo/beginners
>>
>>
>
>
>
>
More information about the Beginners
mailing list