[Haskell-cafe] Re: Python's big challenges, Haskell's big advantages?

Wed Sep 17 16:55:51 EDT 2008

On Wed, 2008-09-17 at 13:44 -0700, Evan Laforge wrote:
> >> systems that don't use an existing user-space thread library (such as
> >> Concurrent Haskell or libthread [1]) emulate user-space threads by
> >> keeping a pool of processors and re-using them (e.g., IIUC Apache does
> >> this).
> >
> > Your response seems to be yet another argument that processes are too
> > expensive to be used the same way as threads.  In my mind pooling vs
> > new-creation is only relevant to process vs thread in the performance
> > aspects.  The fact that people use thread-pools means that they think
> > that even thread-creation is too expensive.  The central aspect in my
> > mind is a default share-everything, or default share-nothing.  One is
> > much easier to reason about and encourages writing systems that have
> > less shared-memory contention.
> 
> This is similar to the plan9 conception of processes.  You have a
> generic rfork() call that takes flags that say what to share with your
> parent: namespace, environment, heap, etc.  Thus the only difference
> between a thread and a process is different flags to rfork().

As I mentioned, Plan 9 also has a user-space thread library, similar to
Concurrent Haskell.

> Under the covers, I believe linux is similar, with its clone() call.
> 
> The fast context switching part seems orthogonal to me.  Why is it
> that getting the OS involved for context switches kills the
> performance?

Read about CPU architecture.

> Is it that the ghc RTS can switch faster because it
> knows more about the code it's running (i.e. the OS obviously couldn't
> switch on memory allocations like that)?  Or is jumping up to kernel
> space somehow expensive by nature?

Yes.  Kernel code is very different on the bare metal from userspace
code; RTS code of course is not at all different.  Switching processes
in the kernel requires an interrupt or a system call.  Both of those
require the processor to dump the running process's state so it can be
restored later (userspace thread-switching does the same thing, but it
doesn't dump as much state because it doesn't need to be as conservative
about what it saves).

> And why does the OS need so many
> more K to keep track of a thread than the RTS?

An OS thread (Linux/Plan 9) stores:

* Stack (definitely a stack pointer and stored registers (> 40 bytes on
i686) and includes a special set of page tables on Plan 9)
* FD set (even if it's the same as the parent thread, you need to keep a
pointer to it
* uid/euid/gid/egid (Plan 9 I think omits euid and egid)
* Namespace (Plan 9 only; again, you need at least a pointer even if
it's the same as the parent process)
* Priority
* Possibly other things I can't think of right now

A Concurrent Haskell thread stores:

* Stack
* Allocation area (4KB)

The kernel offers more to a process (and offers a wider separation
between processes) than Concurrent Haskell offers to a thread.

jcc