[Haskell-cafe] monoid fold concurrently

Fri Nov 15 21:12:35 UTC 2019

On Fri, Nov 15, 2019 at 08:20:44PM +0000, PICCA Frederic-Emmanuel wrote:

> > Is the list guaranteed non-empty?  Do you want to enforce that at the type
> > level (compile time), or "fail" at runtime when it is empty?

?

> > You should probably be a bit more explicit about what you mean by
> > "concurrently".  Do you know in advance that the list length is sufficiently
> > short to make it reasonable to immediately fork an async thread for each?

?

> > Also, do you want the outputs to be folded in list order, or in any order (e.g.
> > roughly in order of IO action completion)?
> 
> The real problem is:
> 
> I have a bunch of HDF5 files, each containing a stack of images and other metadata.

How many is "a bunch"?  At least one?  O(1) per core, ... hundreds, tens of thousands?

> For each image and its associated metadata, I can create a cube (3D array), which is the binning in 3D space.

Do you have space constraints on the number of not yet folded
together cubes that can be in memory at the same time?

> binning -> binning -> binning (this is the monoid). Since this is a pure
> computation, I can use unsafe IO to create the merge function.

Is it actually a Monoid (has an identity), or only a Semigroup?
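
To make the distinction concrete, here is a minimal sketch.  The `Binning`
type is a hypothetical stand-in (a flat list of bin counts, not the real
cube type), and `mergeNonEmpty` and `mergeMaybe` are names made up for
illustration.  With only a Semigroup the input has to be non-empty; wrapping
in `Maybe` supplies the missing identity.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import Data.Semigroup (sconcat)

-- Hypothetical stand-in for a 3D binning: a flat list of bin counts.
-- Assumption: all binnings being merged have the same shape.
newtype Binning = Binning [Int]
  deriving (Eq, Show)

-- Merging adds counts bin by bin, which is associative.
-- There is no shape-independent identity, so this is only a Semigroup.
instance Semigroup Binning where
  Binning xs <> Binning ys = Binning (zipWith (+) xs ys)

-- With only a Semigroup, the type demands at least one element.
mergeNonEmpty :: NonEmpty Binning -> Binning
mergeNonEmpty = sconcat

-- 'Maybe a' is a Monoid whenever 'a' is a Semigroup, with 'Nothing'
-- as the identity, so an empty input yields 'Nothing'.
mergeMaybe :: [Binning] -> Maybe Binning
mergeMaybe = foldMap Just
```

If the real cube type does have a natural "empty" value, a Monoid instance
with that value as `mempty` would make the `Maybe` wrapper unnecessary.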

> In my case, I want to distribute all this across all my cores.

Which is the (more) expensive operation, computing a cube or merging
two already computed cubes?

> Each core can merge binnings, in whichever order, until only one binning remains.
> 
> [a1, a2, a3,  a4]
> 
> core1: a1 + a2 -> a12
> core2: a3 + a4 -> a34
> 
> then
> 
> first core available, a12 + a34 -> a1234

That is still ultimately order preserving (but associative): (a1 + a2) + (a3 + a4).
Is the semigroup also commutative, i.e. would (a2 + a4) + (a1 + a3) also work?
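
For the order-preserving case, the pairwise scheme quoted above could be
sketched roughly as follows, using only `base`.  The names `mergeRound` and
`treeFold` are made up for illustration, and this is a sketch under
simplifying assumptions, not a recommended implementation: it forks one
thread per pair (so it presumes the list is short enough for that), it only
forces each merge to weak head normal form, and an exception in a worker
would leave the caller blocked.  The `async` package handles the last point
more gracefully.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- One round: merge adjacent pairs, one thread per pair, keeping list
-- order.  All pairs are forked before any result is awaited.
-- Caveat: 'evaluate' forces only to weak head normal form.
mergeRound :: Semigroup a => [a] -> IO [a]
mergeRound (x : y : rest) = do
  slot <- newEmptyMVar
  _ <- forkIO (evaluate (x <> y) >>= putMVar slot)
  others <- mergeRound rest
  merged <- takeMVar slot
  pure (merged : others)
mergeRound xs = pure xs -- zero or one element left over

-- Repeat rounds until a single value remains.  Only associativity is
-- needed, since adjacent elements are always merged left to right.
treeFold :: Semigroup a => [a] -> IO (Maybe a)
treeFold [] = pure Nothing
treeFold [x] = pure (Just x)
treeFold xs = mergeRound xs >>= treeFold
```

To actually use several cores this needs to be built with `-threaded` and
run with `+RTS -N`.  If the operation is also commutative, results could
instead be merged in whatever order they complete, which avoids waiting on
the slowest pair in each round.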

-- 
    Viktor.