[Haskell-cafe] Grouping - Map / Reduce

Luke Palmer lrpalmer at gmail.com
Tue Mar 24 18:00:10 EDT 2009


On Tue, Mar 24, 2009 at 3:51 PM, Luke Palmer <lrpalmer at gmail.com> wrote:

> On Tue, Mar 24, 2009 at 3:15 PM, Gü?nther Schmidt <gue.schmidt at web.de>wrote:
>
>> Hi,
>>
>> let say I got an unordered lazy list of key/value pairs like
>>
>> [('a', 99), ('x', 42), ('a', 33) ... ]
>>
>> and I need to sum up all the values with the same keys.
>>
>> So far I wrote a naive implementation, using Data.Map, foldl and
>> insertWith.
>>
>> The result of this grouping operation, which is effectively another list
>> of key/value pairs, just sums this time, needs to be further processed.
>>
>> The building of this map is of course a bottleneck, the successive
>> processing needs to wait until the entire list is eventually consumed
>> the Map is built and flattened again.
>>
>> Is there another way of doing this, something more "streaming
>> architecture" like?
>
>
> Yeah, make a trie.  Here's a quick example.
>
> import Data.Monoid
>
> newtype IntTrie a = IntTrie [a]
>
> singleton :: (Monoid a) => Int -> a -> IntTrie a
> singleton ch x = IntTrie [ if fromIntegral ch == i then x else mempty | i
> <- [0..] ]
>

This definition of singleton unnecessarily leaks memory in some cases.
Here's a better one:

singleton ch x = IntTrie $ replicate ch mempty ++ [x] ++ repeat mempty

Luke


>
> lookupTrie :: IntTrie a -> Int -> a
> lookupTrie (IntTrie xs) n = xs !! n
>
> instance (Monoid a) => Monoid (IntTrie a) where
>     mempty = IntTrie (repeat mempty)
>     mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend xs ys)
>
> infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
>
> test = mconcat [ singleton (n `mod` 42) [n] | n <- [0..] ] `lookupTrie` 10
>
> This is an inefficient way to find the class of n such that n mod 42 = 10.
> Note that it works on an infinite list of inputs.
>
> Here the "trie" was a simple list, but you could replace it with a more
> advanced data structure for better performace.
>
> Luke
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090324/6d175c9e/attachment.htm


More information about the Haskell-Cafe mailing list