[Haskell-cafe] Re: Can you do everything without shared-memory
wnoise at ofb.net
Thu Sep 11 22:17:35 EDT 2008
On 2008-09-10, David Roundy <droundy at darcs.net> wrote:
> On Wed, Sep 10, 2008 at 03:30:50PM +0200, Jed Brown wrote:
>> On Wed 2008-09-10 09:05, David Roundy wrote:
>> > I should point out, however, that in my experience MPI programming
>> > involves deadlocks and synchronization handling that are at least as
>> > nasty as any I've run into doing shared-memory threading.
>> Absolutely, avoiding deadlock is the first priority (before error
>> handling). If you use the non-blocking interface, you have to be very
>> conscious of whether a buffer is being used or the call has completed.
>> Regardless, the API requires the programmer to maintain a very clear
>> distinction between locally owned and remote memory.
> Even with the blocking interface, you had subtle bugs that I found
> pretty tricky to deal with. For example, the reduce functions in lam3
> (or was it lam4) at one point didn't actually produce the same
> values on all nodes (with differences caused by roundoff error), which
> led to rare deadlocks, when it so happened that two nodes disagreed as
> to when a loop was completed. Perhaps someone made the mistake of
> assuming that addition was associative, or maybe it was something
> triggered by the non-IEEE floating point we were using. But in any
> case, it was pretty nasty. And it was precisely the kind of bug that
> won't show up except when you're doing something like MPI where you
> are pretty much forced to assume that the same (pure!) computation has
> the same effect on each node.
Ah, okay. I think that's a real edge case, and probably not how most
people use MPI. I've used both threads and MPI; MPI, while cumbersome,
never gave me any hard-to-debug deadlock problems.
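
To illustrate the roundoff scenario David describes in the quoted text, here is a minimal sketch in plain Python. It does not use MPI or LAM. The helper names (reduce_sum, converged), the input values, and the threshold are all hypothetical, chosen only to show that summing the same numbers in a different order can give a different floating-point result, so two nodes applying the same termination test can disagree.

```python
def reduce_sum(values, order):
    """Add values in the given index order.

    Hypothetical stand-in for one node's view of a reduction whose
    combining order differs from node to node.
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


def converged(total, threshold=0.6):
    """Hypothetical loop-termination test, evaluated locally on each node."""
    return total <= threshold


values = [0.1, 0.2, 0.3]

# Same inputs, different association order:
node_a = reduce_sum(values, [0, 1, 2])  # (0.1 + 0.2) + 0.3
node_b = reduce_sum(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(node_a == node_b)                       # do the two sums agree?
print(converged(node_a), converged(node_b))   # do the two nodes agree on stopping?
```

If the two nodes disagree here, one leaves the loop while the other goes on to wait in the next collective call, which is the kind of rare deadlock described above. One common way around it, offered as a suggestion rather than as what was done in this case, is to have a single rank evaluate the test and broadcast the boolean decision, so every node acts on the same value.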