[Haskell-cafe] Can you do everything without shared-memory concurrency?

Wed Sep 10 10:08:02 EDT 2008

On Wed, Sep 10, 2008 at 03:30:50PM +0200, Jed Brown wrote:
> On Wed 2008-09-10 09:05, David Roundy wrote:
> > 2008/9/9 Jed Brown <jed at 59a2.org>:
> > > On Tue 2008-09-09 12:30, Bruce Eckel wrote:
> > >> So this is the kind of problem I keep running into. There will seem to be
> > >> consensus that you can do everything with isolated processes and message passing
> > >> (and note here that I include Actors in this scenario even if their mechanism
> > >> is more complex). And then someone will pipe up and say "well, of course, you
> > >> have to have threads" and the argument is usually "for efficiency."
> > >
> > > Some pipe up and say ``you can't do global shared memory because it's
> > > inefficient''.  Ensuring cache coherency with many processors operating
> > > on shared memory is a nightmare and inevitably leads to poor
> > > performance.  Perhaps some optimizations could be done if the programs
> > > were guaranteed to have no mutable state, but that's not realistic.
> > > Almost all high performance machines (think top500) are distributed
> > > memory with very few cores per node.  Parallel programs are normally
> > > written using MPI for communication and they can achieve nearly linear
> > > scaling to 10^5 processors on BlueGene/L for scientific problems with
> > > strong global coupling.
> >
> > I should point out, however, that in my experience MPI programming
> > involves deadlocks and synchronization handling that are at least as
> > nasty as any I've run into doing shared-memory threading.
>
> Absolutely, avoiding deadlock is the first priority (before error
> handling).  If you use the non-blocking interface, you have to be very
> conscious of whether a buffer is being used or the call has completed.
> Regardless, the API requires the programmer to maintain a very clear
> distinction between locally owned and remote memory.

Even with the blocking interface, there were subtle bugs that I found
pretty tricky to deal with.  E.g. the reduce functions in lam3 (or was
it lam4) at one point didn't actually produce the same values on all
nodes (the differences being due to roundoff error), which led to rare
deadlocks when two nodes happened to disagree as to when a loop was
completed.  Perhaps someone made the mistake of assuming that
floating-point addition was associative, or maybe it was something
triggered by the non-IEEE floating point we were using.  But in any
case, it was pretty nasty.  And it was precisely the kind of bug that
won't show up except when you're doing something like MPI, where you
are pretty much forced to assume that the same (pure!) computation
gives the same result on each node.
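
To make the roundoff point concrete, here is a minimal sketch in Python.
It is not the LAM code and uses no MPI; the contribution values, the
threshold and all names are made up for illustration.  It only shows
that summing the same floating-point values in a different order can
flip a local "are we done?" test.

```python
# Hypothetical per-node contributions to a global sum (made-up values).
contributions = [0.1, 0.2, 0.3]
threshold = 0.6  # made-up convergence threshold


def reduce_in_order(values):
    """Sum the values left to right, as one node's reduction might."""
    total = 0.0
    for v in values:
        total += v
    return total


# Two nodes combine the same contributions in different orders.
sum_on_node_a = reduce_in_order(contributions)
sum_on_node_b = reduce_in_order(reversed(contributions))

# Each node decides locally whether the loop is finished.
done_on_node_a = sum_on_node_a <= threshold
done_on_node_b = sum_on_node_b <= threshold

print(sum_on_node_a == sum_on_node_b)
print(done_on_node_a, done_on_node_b)
```

With IEEE doubles the two sums differ in the last bit, so one node
considers the loop finished and the other does not.  In a real
message-passing program the node that keeps looping would then block in
its next collective call, waiting for a partner that has already left.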

> > This isn't an issue, of course, as long as you're letting lapack do
> > all the message passing, but once you've got to deal with message
> > passing between nodes, you've got bugs possible that are strikingly
> > similar to the sorts of nasty bugs present in shared memory threaded
> > code using locks.
>
> Lapack per se does not do message passing.  I assume you mean whatever
> parallel library you are working with, for instance, PETSc.  Having the
> right abstractions goes a long way.

Right, I meant to say scalapack.  If you've got nice simple
abstractions (which isn't always possible), it doesn't matter if
you're using message passing or shared-memory threading.

> I'm happy to trade the issues with shared mutable state for distributed
> synchronization issues, but that is likely due to its suitability for
> the problems I'm interested in.  If the data model maps cleanly to
> distributed memory, I think it is easier than coarse-grained shared
> parallelism.  (OpenMP is fine-grained; there is little or no shared
> mutable state and it is very easy.)

Indeed, data-parallel programming is nice and it's easy, but I'm not
sure that it maps well to most problems.  We're fortunate that it does
map well to most scientific problems, but as "normal" programmers are
thinking about parallelizing their code, I don't think data-parallel
is the paradigm that we need to lead them towards.

David