[Haskell-cafe] Seen on reddit: or,
foldl and foldr considered slightly harmful
Richard O'Keefe
ok at cs.otago.ac.nz
Thu Feb 11 19:29:25 EST 2010
On Feb 11, 2010, at 9:41 PM, Johann Höchtl wrote:
> In a presentation by Guy Steele at ICFP 2009 in Edinburgh:
> http://www.vimeo.com/6624203
> he "considers foldl and foldr harmful" as they hinder parallelism
> because of "process first element, then the rest". Instead he proposes
> a divide-and-merge approach, especially in the light of going parallel.
I think I first heard about that issue in the 80s.
Let me just take an Erlang perspective on this for a few minutes.
Ignoring types, the classic foldl in Erlang (which is strict) is
foldl(F, A, [X|Xs]) -> foldl(F, F(X, A), Xs);
foldl(_, A, []) -> A.
In a strict language, you have to wait for F to finish
before you can go on to the next element.
In a non-strict language, you don't. The interesting
question is how far F can get just given X, before it
has to look at A.
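To make that point concrete in Haskell (a sketch of mine, not part of the original argument): with lazy foldr, the combining function can deliver part of its answer from X alone, before it forces the folded rest of the list.

```haskell
-- Sketch: lazy foldr lets F produce output from X alone, before
-- looking at the accumulated "rest" (the A of the discussion above).
lazyAny :: (a -> Bool) -> [a] -> Bool
lazyAny p = foldr (\x rest -> p x || rest) False

-- (||) returns True as soon as its left argument is True, so the
-- fold stops without forcing the list beyond the first match.
demo :: Bool
demo = lazyAny even (1 : 2 : error "never forced")   -- True
```

Here F = (\x rest -> p x || rest) gets as far as p x on its own; only when p x is False does it have to look at the fold of the rest.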
Suppose we can factor F(X, A) into G(H(X), A)
where H is rather time-consuming, but G is cheap (maybe it's an add).
So we write
foldl_map(G, H, A, [X|Xs]) -> foldl_map(G, H, G(H(X), A), Xs);
foldl_map(_, _, A, []) -> A.
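In Haskell the same factoring can be written down directly (my sketch, using Data.List.foldl' for the strict left fold):

```haskell
import Data.List (foldl')

-- If F(X, A) factors as G(H(X), A), a strict left fold over F is a
-- fold over G of the mapped list; all the expensive work is in H.
foldlMap :: (b -> acc -> acc) -> (a -> b) -> acc -> [a] -> acc
foldlMap g h a xs = foldl' (\acc x -> g (h x) acc) a xs
-- equivalently: foldl' (flip g) a (map h xs)
```

For example, foldlMap (+) (\x -> x*x) 0 [1,2,3,4] is 30, the sum of squares, and the map of the squaring function is the part worth parallelising.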
Now we can parallelise it. We'll spark off a separate thread
for each call of H.
pfoldl_map(G, H, A, Xs) ->
    Outer = self(),
    Pids = [spawn(fun () -> Outer ! {self(), H(X)} end) || X <- Xs],
    foldl(G, A, [receive {Pid, Val} -> Val end || Pid <- Pids]).
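For comparison, here is one way to mimic the same spawn-and-collect pattern in Haskell using only base's Control.Concurrent (my translation, not code from the post; forkIO plus one MVar per element stands in for spawn and the {Pid, Val} messages):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.List (foldl')

-- Fork one lightweight thread per H call, collect the answers in
-- list order, then fold G over them, as pfoldl_map does above.
pfoldlMap :: (b -> acc -> acc) -> (a -> b) -> acc -> [a] -> IO acc
pfoldlMap g h a xs = do
  vars <- mapM (\x -> do v <- newEmptyMVar
                         _ <- forkIO (putMVar v $! h x)  -- force H in the child
                         return v)
               xs
  results <- mapM takeMVar vars   -- O(N) collection, like the receive loop
  return (foldl' (\acc y -> g y acc) a results)
```

(Actual parallelism needs GHC's threaded runtime; a Strategies-style parMap would be the more idiomatic Haskell, but the version above deliberately mirrors the shape of the Erlang code.)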
If N is the length of the list and G is O(1)
we have O(N) time to traverse the list
and O(N) time to collect and fold the results.
The H calculations can take place in parallel.
Provided each H calculation is expensive enough, we may not _care_
about the O(N) bits. In fact, if they aren't, we probably shouldn't
be worrying about parallelism here.
The problem exists when we *can't* factor F(X, A) into G(H(X), A)
with a cheap G and costly H. Then divide-and-parallel-conquer
using k-way trees (which requires the combining function G to be
associative) gives us a computation depth of log_k N
calls of G waiting for results to come back.
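As a shape, the k = 2 case looks like this in Haskell (a sequential sketch of mine; the two recursive calls are what a parallel implementation would hand to separate threads):

```haskell
-- Divide-and-conquer fold: with an associative g the call tree has
-- depth log2 N, so only O(log N) calls of g sit waiting for results.
dcFold :: (a -> a -> a) -> a -> [a] -> a
dcFold _ z []  = z
dcFold _ _ [x] = x
dcFold g z xs  = g (dcFold g z left) (dcFold g z right)
  where (left, right) = splitAt (length xs `div` 2) xs
```

For an associative g such as (+), dcFold (+) 0 [1..10] gives 55, the same answer as foldl or foldr, but with a logarithmic rather than linear dependency chain.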
As I say, I met this stuff in the 80s. It's hardly news.
Indeed, if I remember correctly, back when Occam was hot stuff
people were worried about
PAR i = 0 for n
...
which forks n processes. Doing that one at a time in the parent
process takes O(n) time. I believe something was done about
making this work by recursive bisection.