[Haskell-cafe] Why isn't there a cheaper "split-in-two" operation for Data.Set?

wren ng thornton wren at freegeek.org
Tue Oct 19 23:05:45 EDT 2010


On 10/19/10 5:47 AM, Ryan Newton wrote:
> That sounds good to me.  In any case the parallel map/fold operations by
> themselves shouldn't compromise the abstraction.
>
> Perhaps an eventual solution would be to start including parallel maps/folds
> right inside the standard libraries.  I haven't began testing this yet but
> it would seem that all the balanced tree implementations are good candidates
> for a little `par` treatment.  Has this been tried somewhere already?
>     Alas, having par/nonpar versions of library functions compounds the
> already present monadic/non-monadic schism...

Another problem is that the desirable level of parallelization isn't 
fixed. For example, let's consider a function f being mapped over a 
collection C.

With non-parallel map this has cost O(0*k) + O(C)*O(f) where k is the 
cost of spawning/reaping threads, O(C) is the size of the collection, 
and O(f) is the cost of running f once.

Now, consider mapPar2 which uses two threads. The cost of this is O(1*k) 
+ 2 `par` O(C)/2*O(f), where `par` is a kind of multiplication based on 
the level of parallelism actually achieved. With perfect paralellism 
x`par`y = y; with no parallelism x`par`y = x*y.

We can generalize this to O((n-1)*k) + n `par` O(C)/n*O(f) for n 
threads. But the problem is, depending on how big O(f) is relative to 
O(k), there's going to be a different n which gives the optimal 
tradeoff. If O(f) is small enough, then the overhead of sparking threads 
is wasted; if O(f) is middling, then we'd want a few threads but not 
"too many"; if O(f) is huge, then we want n = O(C) since the overhead 
disappears in the asymptotics. Thus, what we'd need to offer in the API 
is a set of evaluators in order to alter the level of parallelization.

-- 
Live well,
~wren


More information about the Haskell-Cafe mailing list