# [Haskell-cafe] Why isn't there a cheaper "split-in-two" operation for Data.Set?

Wed Oct 20 06:53:26 EDT 2010

```On 20 October 2010 04:05, wren ng thornton <wren at freegeek.org> wrote:
> On 10/19/10 5:47 AM, Ryan Newton wrote:
>>
>> That sounds good to me.  In any case the parallel map/fold operations by
>> themselves shouldn't compromise the abstraction.
>>
>> Perhaps an eventual solution would be to start including parallel
>> maps/folds
>> right inside the standard libraries.  I haven't began testing this yet but
>> it would seem that all the balanced tree implementations are good
>> candidates
>> for a little `par` treatment.  Has this been tried somewhere already?
>>    Alas, having par/nonpar versions of library functions compounds the
>
> Another problem is that the desirable level of parallelization isn't fixed.
> For example, let's consider a function f being mapped over a collection C.
>
> With non-parallel map this has cost O(0*k) + O(C)*O(f) where k is the cost
> of spawning/reaping threads, O(C) is the size of the collection, and O(f) is
> the cost of running f once.
>
> Now, consider mapPar2 which uses two threads. The cost of this is O(1*k) + 2
> `par` O(C)/2*O(f), where `par` is a kind of multiplication based on the
> level of parallelism actually achieved. With perfect paralellism x`par`y =
> y; with no parallelism x`par`y = x*y.

You probably mean x + y.

>
> We can generalize this to O((n-1)*k) + n `par` O(C)/n*O(f) for n threads.
> But the problem is, depending on how big O(f) is relative to O(k), there's
> going to be a different n which gives the optimal tradeoff. If O(f) is small
> enough, then the overhead of sparking threads is wasted; if O(f) is
> middling, then we'd want a few threads but not "too many"; if O(f) is huge,
> then we want n = O(C) since the overhead disappears in the asymptotics.
> Thus, what we'd need to offer in the API is a set of evaluators in order to
> alter the level of parallelization.

Actually, it doesn't work that way anymore. Threads are not (or only