[Haskell-cafe] Patterns for processing large but finite streams

Alexey Khudyakov alexey.skladnoy at gmail.com
Fri Jul 1 11:29:41 CEST 2011


On Fri, Jul 1, 2011 at 12:54 PM, Eugene Kirpichov <ekirpichov at gmail.com> wrote:
> Alexey, your definition of "mean" does not look like "liftS2 (/) sum
> length" - you have to manually "fuse" these computations.
>
Well, it was fused deliberately, for numerical stability.

> I'm asking for a formalism that does this fusion automatically (and
> guaranteedly).
>
Joining accumulators is quite straightforward, and so is joining the
initial states. Just create a
> joinAcc :: (acc1 -> x -> acc1) -> (acc2 -> x -> acc2) -> (acc1,acc2) -> x -> (acc1,acc2)
> joinAcc f1 f2 (s1,s2) x = (f1 s1 x, f2 s2 x)

Still, you have to handle the accumulators separately:
> sum' = foldl (+) 0
> len  = foldl (\n _ -> n+1) 0
> sumLen = foldl (joinAcc (+) (\n _ -> n+1)) (0,0)
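As a sketch of how the joined fold above yields the mean in a single pass
(names here are illustrative, not from the original post):

```haskell
-- Join two accumulator functions into one that threads a pair of states.
joinAcc :: (acc1 -> x -> acc1) -> (acc2 -> x -> acc2)
        -> (acc1, acc2) -> x -> (acc1, acc2)
joinAcc f1 f2 (s1, s2) x = (f1 s1 x, f2 s2 x)

-- Compute sum and length in one traversal, then divide.
mean :: [Double] -> Double
mean xs = s / fromIntegral n
  where
    (s, n) = foldl (joinAcc (+) (\k _ -> k + 1)) (0, 0 :: Int) xs
```

For example, `mean [1, 2, 3, 4]` evaluates to 2.5 with only one pass over
the list.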

There is a more regular approach, but it only works for statistics
(functions which do not depend on the order of elements in the sample).
For every statistic, a monoid for its evaluation can be constructed.
For example, sum:
> newtype Sum a = Sum a
> instance Num a => Semigroup (Sum a) where
>   Sum a <> Sum b = Sum (a + b)
> instance Num a => Monoid (Sum a) where
>   mempty = Sum 0

Composition of these monoids becomes trivial: just use the Monoid
instance for pairs.

I pursued this approach in the monoid-statistics[1] package.
It's reasonably well documented.

 [1] http://hackage.haskell.org/package/monoid-statistics


