[Haskell-cafe] Seen on reddit: or,
foldl and foldr considered slightly harmful
Richard O'Keefe
ok at cs.otago.ac.nz
Thu Feb 11 19:29:25 EST 2010
On Feb 11, 2010, at 9:41 PM, Johann Höchtl wrote:
> In a presentation by Guy Steele at ICFP 2009 in Edinburgh:
> http://www.vimeo.com/6624203
> he "considers foldl and foldr harmful" as they hinder parallelism
> because of "process first element, then the rest". Instead he proposes
> a divide-and-merge approach, especially in the light of going parallel.
I think I first heard about that issue in the 80s.
Let me just take an Erlang perspective on this for a few minutes.
Ignoring types, the classic foldl in Erlang (which is strict) is
foldl(F, A, [X|Xs]) -> foldl(F, F(X, A), Xs);
foldl(_, A, []) -> A.
In a strict language, you have to wait for F to finish
before you can go on to the next element.
In a non-strict language, you don't. The interesting
question is how far F can get just given X, before it
has to look at A.
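To make that point concrete in Haskell (a sketch of mine, not part of the original argument): with lazy foldr, the combining function can deliver part of its answer from X alone, before it forces the folded rest of the list.

```haskell
-- Sketch: lazy foldr lets F produce output from X alone, before
-- looking at the accumulated "rest" (the A of the discussion above).
lazyAny :: (a -> Bool) -> [a] -> Bool
lazyAny p = foldr (\x rest -> p x || rest) False

-- (||) returns True as soon as its left argument is True, so the
-- fold stops without forcing the list beyond the first match.
demo :: Bool
demo = lazyAny even (1 : 2 : error "never forced")   -- True
```

Here F = (\x rest -> p x || rest) gets as far as p x on its own; only when p x is False does it have to look at the fold of the rest.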
Suppose we can factor F(X, A) into G(H(X), A)
where H is rather time-consuming, but G is cheap (maybe it's an add).
So we write
foldl_map(G, H, A, [X|Xs]) -> foldl_map(G, H, G(H(X), A), Xs);
foldl_map(_, _, A, []) -> A.
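In Haskell the same factoring can be written down directly (my sketch, using Data.List.foldl' for the strict left fold):

```haskell
import Data.List (foldl')

-- If F(X, A) factors as G(H(X), A), a strict left fold over F is a
-- fold over G of the mapped list; all the expensive work is in H.
foldlMap :: (b -> acc -> acc) -> (a -> b) -> acc -> [a] -> acc
foldlMap g h a xs = foldl' (\acc x -> g (h x) acc) a xs
-- equivalently: foldl' (flip g) a (map h xs)
```

For example, foldlMap (+) (\x -> x*x) 0 [1,2,3,4] is 30, the sum of squares, and the map of the squaring function is the part worth parallelising.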
Now we can parallelise it. We'll spark off a separate thread
for each call of H.
pfoldl_map(G, H, A, Xs) ->
    Outer = self(),
    Pids = [spawn(fun () -> Outer ! {self(), H(X)} end) || X <- Xs],
    foldl(G, A, [receive {Pid, Val} -> Val end || Pid <- Pids]).
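For comparison, here is one way to mimic the same spawn-and-collect pattern in Haskell using only base's Control.Concurrent (my translation, not code from the post; forkIO plus one MVar per element stands in for spawn and the {Pid, Val} messages):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.List (foldl')

-- Fork one lightweight thread per H call, collect the answers in
-- list order, then fold G over them, as pfoldl_map does above.
pfoldlMap :: (b -> acc -> acc) -> (a -> b) -> acc -> [a] -> IO acc
pfoldlMap g h a xs = do
  vars <- mapM (\x -> do v <- newEmptyMVar
                         _ <- forkIO (putMVar v $! h x)  -- force H in the child
                         return v)
               xs
  results <- mapM takeMVar vars   -- O(N) collection, like the receive loop
  return (foldl' (\acc y -> g y acc) a results)
```

(Actual parallelism needs GHC's threaded runtime; a Strategies-style parMap would be the more idiomatic Haskell, but the version above deliberately mirrors the shape of the Erlang code.)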
If N is the length of the list and G is O(1)
we have O(N) time to traverse the list
and O(N) time to collect and fold the results.
The H calculations can take place in parallel.
Provided each H calculation is expensive enough, we may not _care_
about the O(N) bits. In fact, if they aren't, we probably shouldn't
be worrying about parallelism here.
The problem exists when we *can't* factor F(X, A) into G(H(X), A)
with a cheap G and costly H. Then divide-and-parallel-conquer
using k-way trees (which requires the combining function G to be
associative) gives us a computation depth of log_k N
calls of G waiting for results to come back.
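As a shape, the k = 2 case looks like this in Haskell (a sequential sketch of mine; the two recursive calls are what a parallel implementation would hand to separate threads):

```haskell
-- Divide-and-conquer fold: with an associative g the call tree has
-- depth log2 N, so only O(log N) calls of g sit waiting for results.
dcFold :: (a -> a -> a) -> a -> [a] -> a
dcFold _ z []  = z
dcFold _ _ [x] = x
dcFold g z xs  = g (dcFold g z left) (dcFold g z right)
  where (left, right) = splitAt (length xs `div` 2) xs
```

For an associative g such as (+), dcFold (+) 0 [1..10] gives 55, the same answer as foldl or foldr, but with a logarithmic rather than linear dependency chain.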
As I say, I met this stuff in the 80s. It's hardly news.
Indeed, if I remember correctly, back when Occam was hot stuff
people were worried about
PAR i = 0 for n
...
which forks n processes. Doing that one at a time in the parent
process takes O(n) time. I believe something was done about
making this work by recursive bisection.