[Haskell-cafe] Can you do everything without shared-memory concurrency?

Tue Sep 9 06:03:10 EDT 2008

On Mon, Sep 8, 2008 at 8:33 PM, Bruce Eckel <bruceteckel at gmail.com> wrote:

> As some of you on this list may know, I have struggled to understand
> concurrency, on and off for many years, but primarily in the C++ and
> Java domains. As time has passed and experience has stacked up, I have
> become more convinced that while the world runs in parallel, we think
> sequentially and so shared-memory concurrency is impossible for
> programmers to get right -- not only are we unable to think in such a
> way to solve the problem, the unnatural domain-cutting that happens in
> shared-memory concurrency always trips you up, especially when the
> scale increases.
>
> I think that the inclusion of threads and locks in Java was just a
> knee-jerk response to solving the concurrency problem. Indeed, there
> were subtle threading bugs in the system until Java 5. I personally
> find the Actor model to be most attractive when talking about
> threading and objects, but I don't yet know where the limitations of
> Actors are.
>
> However, I keep running across comments where people claim they "must"
> have shared memory concurrency. It's very hard for me to tell whether
> this is just because the person knows threads or if there is truth to
> it.

For correctness, maybe not, for efficiency, yes definitely!

Imagine a program where you have a huge set of data that needs to be
modified (in some sense) over time by thousands of agents. E.g. a game
simulation.
Now, also imagine that every agent could *potentially* modify every single
piece of data, but that every agent *typically* only touches two or three
varibles here and there. I.e. the collisions between the potential
read/write sets is 100%, while the collisions for the actual read/write sets
is very very low.

How would you do this with threads and message passing? Well you could have
one big thread owning all of your data that takes "update" messages, and
then "updates" the world for you (immutably if you wish, by just replacing
its "world" variable with a new one containing your update), but now you've
effectively serialized all your interactions with the "world", so you're not
really concurrent anymore!

So you could decompose the world into multiple threads using some
application-specific logical sudivision, but then you're effectively just
treating each thread as a mutable variable with an implicit lock (with the
risks of deadlock that comes with it - remember we don't know the read/write
set in advance - it could be the entire world - so we can't just order our
updates in some global way here), so you're really just doing shared mutable
state again, and gain little from having threads "simulate" your mutable
cells...

What you really need for this is some way for each agent to update this
shared state *in parallel*, without having to block all other agents
pessimistically, but instead only block other agents if there was an
*actual* conflict. STM seems to be the only real hope for that sort of thing
right now.
IMO my list of preferred methods goes like this:
1. Purely functional data parallelism
2. Purely functional task parallelism (using e.g. strategies)
3. Message passing with no (or very minimal) shared state (simulated using
threads as "data servers" or otherwise)
(3.5. Join patterns? Don't have enough experience with this, but seems sort
of nice?)
4. Shared state concurrency using STM
5. Shared state concurrency using locks
6. Lockless programming.

So while I wouldn't resort to any shared state concurrency unless there are
good reasons for why the other methods don't work well (performance is a
good reason!), there are still situations where you need it, and a general
purpose language had better supply a way of accessing those kinds of
facilities.

-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20080909/5939d48d/attachment.htm