[Haskell-cafe] Re: Python's big challenges, Haskell's big advantages?

Manlio Perillo manlio_perillo at libero.it
Thu Sep 18 15:10:07 EDT 2008

Simon Marlow ha scritto:
> Jonathan Cast wrote:
>> On Wed, 2008-09-17 at 13:44 -0700, Evan Laforge wrote:
>>>>> systems that don't use an existing user-space thread library (such as
>>>>> Concurrent Haskell or libthread [1]) emulate user-space threads by
>>>>> keeping a pool of processors and re-using them (e.g., IIUC Apache does
>>>>> this).
>>>> Your response seems to be yet another argument that processes are too
>>>> expensive to be used the same way as threads.  In my mind pooling vs
>>>> new-creation is only relevant to process vs thread in the performance
>>>> aspects.  The fact that people use thread-pools means that they think
>>>> that even thread-creation is too expensive.  The central aspect in my
>>>> mind is a default share-everything, or default share-nothing.  One is
>>>> much easier to reason about and encourages writing systems that have
>>>> less shared-memory contention.
>>> This is similar to the plan9 conception of processes.  You have a
>>> generic rfork() call that takes flags that say what to share with your
>>> parent: namespace, environment, heap, etc.  Thus the only difference
>>> between a thread and a process is different flags to rfork().
>> As I mentioned, Plan 9 also has a user-space thread library, similar to
>> Concurrent Haskell.
>>> Under the covers, I believe linux is similar, with its clone() call.
>>> The fast context switching part seems orthogonal to me.  Why is it
>>> that getting the OS involved for context switches kills the
>>> performance?
>> Read about CPU architecture.
>>> Is it that the ghc RTS can switch faster because it
>>> knows more about the code it's running (i.e. the OS obviously couldn't
>>> switch on memory allocations like that)?  Or is jumping up to kernel
>>> space somehow expensive by nature?
>> Yes.  Kernel code is very different on the bare metal from userspace
>> code; RTS code of course is not at all different.  Switching processes
>> in the kernel requires an interrupt or a system call.  Both of those
>> require the processor to dump the running process's state so it can be
>> restored later (userspace thread-switching does the same thing, but it
>> doesn't dump as much state because it doesn't need to be as conservative
>> about what it saves).
>>> And why does the OS need so many
>>> more K to keep track of a thread than the RTS?
>> An OS thread (Linux/Plan 9) stores:
>> * Stack (definitely a stack pointer and stored registers (> 40 bytes on
>> i686) and includes a special set of page tables on Plan 9)
>> * FD set (even if it's the same as the parent thread, you need to keep a
>> pointer to it
>> * uid/euid/gid/egid (Plan 9 I think omits euid and egid)
>> * Namespace (Plan 9 only; again, you need at least a pointer even if
>> it's the same as the parent process)
>> * Priority
>> * Possibly other things I can't think of right now
>> A Concurrent Haskell thread stores:
>> * Stack
>> * Allocation area (4KB)
> Allocation areas are per-CPU, not per-thread.  A Concurrent Haskell 
> thread consists of a TSO (thread state object, currently 11 machine 
> words), and a stack, which we currently start with 1KB and grow on demand.

How is this implemented?

I have seen some coroutine implementations in C, using functions from 
ucontext.h (or direct asm code), but all have the problem that the 
allocated stack is fixed.

Thanks  Manlio Perillo

More information about the Haskell-Cafe mailing list