[Haskell-cafe] Grouping - Map / Reduce
Luke Palmer
lrpalmer at gmail.com
Tue Mar 24 17:51:41 EDT 2009
On Tue, Mar 24, 2009 at 3:15 PM, Gü?nther Schmidt <gue.schmidt at web.de>wrote:
> Hi,
>
> let say I got an unordered lazy list of key/value pairs like
>
> [('a', 99), ('x', 42), ('a', 33) ... ]
>
> and I need to sum up all the values with the same keys.
>
> So far I wrote a naive implementation, using Data.Map, foldl and
> insertWith.
>
> The result of this grouping operation, which is effectively another list
> of key/value pairs, just sums this time, needs to be further processed.
>
> The building of this map is of course a bottleneck, the successive
> processing needs to wait until the entire list is eventually consumed
> the Map is built and flattened again.
>
> Is there another way of doing this, something more "streaming
> architecture" like?
Yeah, make a trie. Here's a quick example.
import Data.Monoid
newtype IntTrie a = IntTrie [a]
singleton :: (Monoid a) => Int -> a -> IntTrie a
singleton ch x = IntTrie [ if fromIntegral ch == i then x else mempty | i <-
[0..] ]
lookupTrie :: IntTrie a -> Int -> a
lookupTrie (IntTrie xs) n = xs !! n
instance (Monoid a) => Monoid (IntTrie a) where
mempty = IntTrie (repeat mempty)
mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend xs ys)
infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
test = mconcat [ singleton (n `mod` 42) [n] | n <- [0..] ] `lookupTrie` 10
This is an inefficient way to find the class of n such that n mod 42 = 10.
Note that it works on an infinite list of inputs.
Here the "trie" was a simple list, but you could replace it with a more
advanced data structure for better performace.
Luke
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090324/b06312ed/attachment.htm
More information about the Haskell-Cafe
mailing list