[Haskell-cafe] Why isn't there a cheaper "split-in-two" operation for Data.Set?

Wed Oct 20 06:53:26 EDT 2010

On 20 October 2010 04:05, wren ng thornton <wren at freegeek.org> wrote:
> On 10/19/10 5:47 AM, Ryan Newton wrote:
>>
>> That sounds good to me.  In any case the parallel map/fold operations by
>> themselves shouldn't compromise the abstraction.
>>
>> Perhaps an eventual solution would be to start including parallel
>> maps/folds
>> right inside the standard libraries.  I haven't began testing this yet but
>> it would seem that all the balanced tree implementations are good
>> candidates
>> for a little `par` treatment.  Has this been tried somewhere already?
>>    Alas, having par/nonpar versions of library functions compounds the
>> already present monadic/non-monadic schism...
>
> Another problem is that the desirable level of parallelization isn't fixed.
> For example, let's consider a function f being mapped over a collection C.
>
> With non-parallel map this has cost O(0*k) + O(C)*O(f) where k is the cost
> of spawning/reaping threads, O(C) is the size of the collection, and O(f) is
> the cost of running f once.
>
> Now, consider mapPar2 which uses two threads. The cost of this is O(1*k) + 2
> `par` O(C)/2*O(f), where `par` is a kind of multiplication based on the
> level of parallelism actually achieved. With perfect paralellism x`par`y =
> y; with no parallelism x`par`y = x*y.

You probably mean x + y.

>
> We can generalize this to O((n-1)*k) + n `par` O(C)/n*O(f) for n threads.
> But the problem is, depending on how big O(f) is relative to O(k), there's
> going to be a different n which gives the optimal tradeoff. If O(f) is small
> enough, then the overhead of sparking threads is wasted; if O(f) is
> middling, then we'd want a few threads but not "too many"; if O(f) is huge,
> then we want n = O(C) since the overhead disappears in the asymptotics.
> Thus, what we'd need to offer in the API is a set of evaluators in order to
> alter the level of parallelization.

Actually, it doesn't work that way anymore. Threads are not (or only
rarely) created to run sparks.  Instead, idle threads look into the
spark pool and grab and execute some sparks. Nowadays, the main
overhead you get is from stealing sparks from other CPUs which will
incur a lot of cache misses.  If the other CPU is busy it may still be
worth it, but it's quite hard to predict.  Sparks are also run in
chunks which makes it less expensive to have small sparks.

-- 
Push the envelope. Watch it bend.