[Haskell-cafe] Grouping - Map / Reduce

Luke Palmer lrpalmer at gmail.com
Tue Mar 24 17:51:41 EDT 2009


On Tue, Mar 24, 2009 at 3:15 PM, Gü?nther Schmidt <gue.schmidt at web.de>wrote:

> Hi,
>
> let say I got an unordered lazy list of key/value pairs like
>
> [('a', 99), ('x', 42), ('a', 33) ... ]
>
> and I need to sum up all the values with the same keys.
>
> So far I wrote a naive implementation, using Data.Map, foldl and
> insertWith.
>
> The result of this grouping operation, which is effectively another list
> of key/value pairs, just sums this time, needs to be further processed.
>
> The building of this map is of course a bottleneck, the successive
> processing needs to wait until the entire list is eventually consumed
> the Map is built and flattened again.
>
> Is there another way of doing this, something more "streaming
> architecture" like?


Yeah, make a trie.  Here's a quick example.

import Data.Monoid

newtype IntTrie a = IntTrie [a]

singleton :: (Monoid a) => Int -> a -> IntTrie a
singleton ch x = IntTrie [ if fromIntegral ch == i then x else mempty | i <-
[0..] ]

lookupTrie :: IntTrie a -> Int -> a
lookupTrie (IntTrie xs) n = xs !! n

instance (Monoid a) => Monoid (IntTrie a) where
    mempty = IntTrie (repeat mempty)
    mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend xs ys)

infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys

test = mconcat [ singleton (n `mod` 42) [n] | n <- [0..] ] `lookupTrie` 10

This is an inefficient way to find the class of n such that n mod 42 = 10.
Note that it works on an infinite list of inputs.

Here the "trie" was a simple list, but you could replace it with a more
advanced data structure for better performace.

Luke
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090324/b06312ed/attachment.htm


More information about the Haskell-Cafe mailing list