# [Haskell-cafe] Re: Grouping - Map / Reduce

Gü?nther Schmidt gue.schmidt at web.de
Tue Mar 24 20:20:47 EDT 2009

```Dear Luke,

I suspect Black Magic at work here.

This seems to work and I so don't have a clue why. But thank you very
much nevertheless, I strongly suspect that, once I figured out why this
works, I will have learned a very, very important trick indeed.

would I have come across this?

Günther

Luke Palmer schrieb:
> On Tue, Mar 24, 2009 at 3:51 PM, Luke Palmer <lrpalmer at gmail.com
> <mailto:lrpalmer at gmail.com>> wrote:
>
>     On Tue, Mar 24, 2009 at 3:15 PM, Gü?nther Schmidt
>     <gue.schmidt at web.de <mailto:gue.schmidt at web.de>> wrote:
>
>         Hi,
>
>         let say I got an unordered lazy list of key/value pairs like
>
>         [('a', 99), ('x', 42), ('a', 33) ... ]
>
>         and I need to sum up all the values with the same keys.
>
>         So far I wrote a naive implementation, using Data.Map, foldl and
>         insertWith..
>
>         The result of this grouping operation, which is effectively
>         another list
>         of key/value pairs, just sums this time, needs to be further
>         processed.
>
>         The building of this map is of course a bottleneck, the successive
>         processing needs to wait until the entire list is eventually
>         consumed
>         the Map is built and flattened again.
>
>         Is there another way of doing this, something more "streaming
>         architecture" like?
>
>
>     Yeah, make a trie.  Here's a quick example.
>
>     import Data.Monoid
>
>     newtype IntTrie a = IntTrie [a]
>
>     singleton :: (Monoid a) => Int -> a -> IntTrie a
>     singleton ch x = IntTrie [ if fromIntegral ch == i then x else
>     mempty | i <- [0..] ]
>
>
> This definition of singleton unnecessarily leaks memory in some cases.
> Here's a better one:
>
> singleton ch x = IntTrie \$ replicate ch mempty ++ [x] ++ repeat mempty
>
> Luke
>
>
>
>     lookupTrie :: IntTrie a -> Int -> a
>     lookupTrie (IntTrie xs) n = xs !! n
>
>     instance (Monoid a) => Monoid (IntTrie a) where
>         mempty = IntTrie (repeat mempty)
>         mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend
>     xs ys)
>
>     infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
>
>     test = mconcat [ singleton (n `mod` 42) [n] | n <- [0..] ]
>     `lookupTrie` 10
>
>     This is an inefficient way to find the class of n such that n mod 42
>     = 10.  Note that it works on an infinite list of inputs.
>
>     Here the "trie" was a simple list, but you could replace it with a
>     more advanced data structure for better performace.
>
>     Luke
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________