[Haskell-cafe] Re: Grouping - Map / Reduce
Gü?nther Schmidt
gue.schmidt at web.de
Tue Mar 24 20:20:47 EDT 2009
Dear Luke,
I suspect Black Magic at work here.
This seems to work and I so don't have a clue why. But thank you very
much nevertheless, I strongly suspect that, once I figured out why this
works, I will have learned a very, very important trick indeed.
Had I read "purely functional data structures" from start to finish,
would I have come across this?
Günther
Luke Palmer schrieb:
> On Tue, Mar 24, 2009 at 3:51 PM, Luke Palmer <lrpalmer at gmail.com
> <mailto:lrpalmer at gmail.com>> wrote:
>
> On Tue, Mar 24, 2009 at 3:15 PM, Gü?nther Schmidt
> <gue.schmidt at web.de <mailto:gue.schmidt at web.de>> wrote:
>
> Hi,
>
> let say I got an unordered lazy list of key/value pairs like
>
> [('a', 99), ('x', 42), ('a', 33) ... ]
>
> and I need to sum up all the values with the same keys.
>
> So far I wrote a naive implementation, using Data.Map, foldl and
> insertWith..
>
> The result of this grouping operation, which is effectively
> another list
> of key/value pairs, just sums this time, needs to be further
> processed.
>
> The building of this map is of course a bottleneck, the successive
> processing needs to wait until the entire list is eventually
> consumed
> the Map is built and flattened again.
>
> Is there another way of doing this, something more "streaming
> architecture" like?
>
>
> Yeah, make a trie. Here's a quick example.
>
> import Data.Monoid
>
> newtype IntTrie a = IntTrie [a]
>
> singleton :: (Monoid a) => Int -> a -> IntTrie a
> singleton ch x = IntTrie [ if fromIntegral ch == i then x else
> mempty | i <- [0..] ]
>
>
> This definition of singleton unnecessarily leaks memory in some cases.
> Here's a better one:
>
> singleton ch x = IntTrie $ replicate ch mempty ++ [x] ++ repeat mempty
>
> Luke
>
>
>
> lookupTrie :: IntTrie a -> Int -> a
> lookupTrie (IntTrie xs) n = xs !! n
>
> instance (Monoid a) => Monoid (IntTrie a) where
> mempty = IntTrie (repeat mempty)
> mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend
> xs ys)
>
> infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
>
> test = mconcat [ singleton (n `mod` 42) [n] | n <- [0..] ]
> `lookupTrie` 10
>
> This is an inefficient way to find the class of n such that n mod 42
> = 10. Note that it works on an infinite list of inputs.
>
> Here the "trie" was a simple list, but you could replace it with a
> more advanced data structure for better performace.
>
> Luke
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list