[Haskell-cafe] Re: Python's big challenges, Haskell's big advantages?

Thu Sep 18 05:33:22 EDT 2008

Jonathan Cast wrote:
> On Wed, 2008-09-17 at 13:44 -0700, Evan Laforge wrote:
>>>> systems that don't use an existing user-space thread library (such as
>>>> Concurrent Haskell or libthread [1]) emulate user-space threads by
>>>> keeping a pool of processes and re-using them (e.g., IIUC Apache does
>>>> this).
>>> Your response seems to be yet another argument that processes are too
>>> expensive to be used the same way as threads.  In my mind pooling vs
>>> new-creation is only relevant to process vs thread in the performance
>>> aspects.  The fact that people use thread-pools means that they think
>>> that even thread-creation is too expensive.  The central aspect in my
>>> mind is a default share-everything, or default share-nothing.  One is
>>> much easier to reason about and encourages writing systems that have
>>> less shared-memory contention.
>> This is similar to the plan9 conception of processes.  You have a
>> generic rfork() call that takes flags that say what to share with your
>> parent: namespace, environment, heap, etc.  Thus the only difference
>> between a thread and a process is different flags to rfork().
> 
> As I mentioned, Plan 9 also has a user-space thread library, similar to
> Concurrent Haskell.
> 
>> Under the covers, I believe linux is similar, with its clone() call.
>>
>> The fast context switching part seems orthogonal to me.  Why is it
>> that getting the OS involved for context switches kills the
>> performance?
> 
> Read about CPU architecture.
> 
>> Is it that the ghc RTS can switch faster because it
>> knows more about the code it's running (i.e. the OS obviously couldn't
>> switch on memory allocations like that)?  Or is jumping up to kernel
>> space somehow expensive by nature?
> 
> Yes.  Kernel code is very different on the bare metal from userspace
> code; RTS code of course is not at all different.  Switching processes
> in the kernel requires an interrupt or a system call.  Both of those
> require the processor to dump the running process's state so it can be
> restored later (userspace thread-switching does the same thing, but it
> doesn't dump as much state because it doesn't need to be as conservative
> about what it saves).
> 
>> And why does the OS need so many
>> more K to keep track of a thread than the RTS?
> 
> An OS thread (Linux/Plan 9) stores:
> 
> * Stack (at least a stack pointer and saved registers (> 40 bytes on
> i686); on Plan 9 this also includes a special set of page tables)
> * FD set (even if it's the same as the parent thread, you need to keep a
> pointer to it)
> * uid/euid/gid/egid (Plan 9 I think omits euid and egid)
> * Namespace (Plan 9 only; again, you need at least a pointer even if
> it's the same as the parent process)
> * Priority
> * Possibly other things I can't think of right now
> 
> A Concurrent Haskell thread stores:
> 
> * Stack
> * Allocation area (4KB)

Allocation areas are per-CPU, not per-thread.  A Concurrent Haskell thread
consists of a TSO (thread state object, currently 11 machine words) and a
stack, which currently starts at 1KB and grows on demand.
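
To put those numbers in perspective, here is a rough sketch that forks a
batch of threads with forkIO and estimates their combined initial footprint.
The helper threadFootprintBytes and the thread count are made up for
illustration, and the estimate only uses the figures above (11-word TSO,
1KB initial stack); it ignores stack growth and any other RTS overhead, and
the figures may differ in other GHC versions.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- | Hypothetical helper: estimated initial footprint of one thread in
-- bytes, assuming an 11-word TSO plus a 1KB initial stack.  The argument
-- is the machine word size in bytes (4 on i686, 8 on x86-64).
threadFootprintBytes :: Int -> Int
threadFootprintBytes wordSize = 11 * wordSize + 1024

main :: IO ()
main = do
  let n = 10000 :: Int  -- arbitrary thread count, chosen for illustration
  -- Fork n threads; each one reports back through its own MVar.
  dones <- forM [1 .. n] $ \i -> do
    done <- newEmptyMVar
    _ <- forkIO (putMVar done i)
    return done
  -- Wait for every thread to finish before printing anything.
  results <- mapM takeMVar dones
  putStrLn ("threads finished: " ++ show (length results))
  putStrLn ("estimated initial footprint, 8-byte words: "
            ++ show (n * threadFootprintBytes 8) ++ " bytes")
```

Each thread gets its own MVar so that main blocks until all of them have
run; without that, the program could exit before the forked threads do
anything.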

Cheers,
	Simon