Threads vs. processes [Was: Re: [Haskell-cafe] Re: Python's big challenges, Haskell's big advantages?]

Wed Sep 17 17:50:55 EDT 2008

On Wed, 2008-09-17 at 21:20 +0000, Aaron Denney wrote:
> On 2008-09-17, Jonathan Cast <jonathanccast at fastmail.fm> wrote:
> >> In my mind pooling vs new-creation is only relevant to process vs
> >> thread in the performance aspects.
> >
> > Say what?  This discussion is entirely about performance --- does
> > CPython actually have the ability to scale concurrent programs to
> > multiple processors?  The only reason you would ever want to do that is
> > for performance.
> 
> I entered the discussion as to which model is a workaround for the other --

Well, I thought the discussion was about implementations, not models.  I
also assumed remarks would be made in the context of the entire thread.
I shall have to remember that in the future.

> someone said processes were a workaround for the lack of good threading
> in e.g. standard CPython.

> I replied that most languages' thread support

Using a definition of `thread' which, apparently, excludes Concurrent
Haskell.

> can be
> seen as a workaround for the poor performance of communicating processes.

Meaning kernel-switched processes.

> (creation in particular is usually cited, but that cost can often be reduced
> by process pools; the cost of context switching, alas, is harder to reduce.)
> 
> > Kernel threads /are/ expensive.  Which is why all the cool kids use
> > user-space threads.
> 
> Often muxed on top of kernel threads, because user-threads can't use
> multiple CPUs at once.

Well, a single kernel thread can't use multiple CPUs at once.  (So you
need more than one).
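
As a rough illustration of user-space threads multiplexed over kernel
threads, here is a minimal sketch using GHC's base library.  The names
squaresInThreads and box are made up for this example.  It assumes the
program is built with -threaded and run with +RTS -N, in which case the
runtime may schedule the forkIO threads across several OS threads;
without those flags the same code still runs, on a single one.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Hypothetical helper: fork one lightweight thread per input and
-- collect each result through its own MVar.
squaresInThreads :: Int -> IO [Int]
squaresInThreads n = do
  boxes <- forM [1 .. n] $ \i -> do
    box <- newEmptyMVar
    _ <- forkIO (putMVar box (i * i))
    return box
  -- takeMVar blocks until the corresponding thread has delivered.
  mapM takeMVar boxes

main :: IO ()
main = do
  -- Number of OS-level capabilities the runtime is using (set by +RTS -N).
  caps <- getNumCapabilities
  putStrLn ("capabilities: " ++ show caps)
  results <- squaresInThreads 1000
  print (length results)
```

A thousand forkIO threads is a small number here; the point is only
that the thread count is independent of the capability count.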

> >> The central aspect in my mind is a default share-everything, or
> >> default share-nothing.
> >
> > I really don't think you understand Concurrent Haskell, then.  (Or
> > Concurrent ML, or stackless Python, or libthread, or any other CSP-based
> > set-up).
> 
> Or Erlang, Occam, or heck, even jcsp.  Because I'm coming at this from a
> slightly different perspective

Different enough we're talking past each other.  The idea that the thing
you make with forkIO doesn't count as a thread never crossed my mind,
sorry.
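
For readers who have not used it, a minimal sketch of the thing forkIO
makes, assuming only GHC's base library (sumInThread and result are
names invented for this example).  The forked thread communicates with
its parent solely through an MVar used as a one-slot channel, which is
the message-passing style under discussion.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical helper: compute a sum in a forked thread and hand the
-- result back over an MVar.
sumInThread :: Int -> IO Int
sumInThread n = do
  result <- newEmptyMVar
  -- ($!) forces the sum in the forked thread before it is sent.
  _ <- forkIO $ putMVar result $! sum [1 .. n]
  -- Blocks until the forked thread has put its value.
  takeMVar result

main :: IO ()
main = sumInThread 100 >>= print
```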

> and place a different emphasis on things

and use completely different definitions for key terms and make
statements which, substituting in the definitions I was using, are (as I
hope you grant) nonsensical

> you think I don't understand?

Not any more.  I just think your definition of `thread' is unexpected in
this context (without rather more elaboration).

> No, trust me, I do understand them[1],
> and think CSP and actor models (the difference in nondeterminism is a
> minor detail that doesn't much matter here) are extremely nice ways of
> implementing parallel systems.

I'm glad to hear that...

> These are, in fact, process models.

OK.  I think that perspective is rather unusual, but OK.

> They are implemented on top of thread models,
> but that's a performance hack.

Maybe.  It's done for performance, but I don't see why you call it a
hack.  Does it sacrifice some important advantage I'm missing?  (Vs.
kernel-scheduled threads).

> And while putting this model on top
> restores much of the programming sanity, in languages with mutable
> variables and references that can be passed, you still need a fair
> bit of discipline to keep that sanity.  There, the implementation detail
> of thread, rather than process, allows and even encourages shortcuts that
> violate the process model.  In languages that are immutable, taking
> advantage of the shared memory space really can gain efficiency without
> any noticeable downside.

Nice clarification.[1]  Thanks.

jcc

[1] I am, btw., painfully aware that Haskell has mutable references that
can be passed between threads.  Just as I am painfully aware of Unix's,
um, interesting ideas on maintaining file system consistency in the
presence of concurrent access to *that* shared resource...
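
To make the footnote concrete, a minimal sketch of a mutable reference
shared between two forkIO threads, assuming only GHC's base library
(countFromTwoThreads, counter and done are names invented for this
example).  Both threads hold the same IORef, so nothing here is
share-nothing; the count comes out right only because every update goes
through atomicModifyIORef'.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper: two threads each increment one shared counter
-- n times; the final value is returned once both have finished.
countFromTwoThreads :: Int -> IO Int
countFromTwoThreads n = do
  counter <- newIORef (0 :: Int)
  done <- newEmptyMVar
  let bump = replicateM_ n (atomicModifyIORef' counter (\c -> (c + 1, ())))
  _ <- forkIO (bump >> putMVar done ())
  _ <- forkIO (bump >> putMVar done ())
  -- Wait for both threads to signal completion.
  takeMVar done
  takeMVar done
  readIORef counter

main :: IO ()
main = countFromTwoThreads 1000 >>= print
```

Replacing the atomic update with a separate readIORef and writeIORef
would allow increments to be lost, which is the kind of discipline the
quoted paragraph refers to.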