[Haskell-cafe] Re: Python's big challenges, Haskell's big advantages?

Thu Sep 18 15:10:07 EDT 2008

Simon Marlow ha scritto:
> Jonathan Cast wrote:
>> On Wed, 2008-09-17 at 13:44 -0700, Evan Laforge wrote:
>>>>> systems that don't use an existing user-space thread library (such as
>>>>> Concurrent Haskell or libthread [1]) emulate user-space threads by
>>>>> keeping a pool of processors and re-using them (e.g., IIUC Apache does
>>>>> this).
>>>> Your response seems to be yet another argument that processes are too
>>>> expensive to be used the same way as threads.  In my mind pooling vs
>>>> new-creation is only relevant to process vs thread in the performance
>>>> aspects.  The fact that people use thread-pools means that they think
>>>> that even thread-creation is too expensive.  The central aspect in my
>>>> mind is a default share-everything, or default share-nothing.  One is
>>>> much easier to reason about and encourages writing systems that have
>>>> less shared-memory contention.
>>> This is similar to the plan9 conception of processes.  You have a
>>> generic rfork() call that takes flags that say what to share with your
>>> parent: namespace, environment, heap, etc.  Thus the only difference
>>> between a thread and a process is different flags to rfork().
>>
>> As I mentioned, Plan 9 also has a user-space thread library, similar to
>> Concurrent Haskell.
>>
>>> Under the covers, I believe linux is similar, with its clone() call.
>>>
>>> The fast context switching part seems orthogonal to me.  Why is it
>>> that getting the OS involved for context switches kills the
>>> performance?
>>
>> Read about CPU architecture.
>>
>>> Is it that the ghc RTS can switch faster because it
>>> knows more about the code it's running (i.e. the OS obviously couldn't
>>> switch on memory allocations like that)?  Or is jumping up to kernel
>>> space somehow expensive by nature?
>>
>> Yes.  Kernel code is very different on the bare metal from userspace
>> code; RTS code of course is not at all different.  Switching processes
>> in the kernel requires an interrupt or a system call.  Both of those
>> require the processor to dump the running process's state so it can be
>> restored later (userspace thread-switching does the same thing, but it
>> doesn't dump as much state because it doesn't need to be as conservative
>> about what it saves).
>>
>>> And why does the OS need so many
>>> more K to keep track of a thread than the RTS?
>>
>> An OS thread (Linux/Plan 9) stores:
>>
>> * Stack (definitely a stack pointer and stored registers (> 40 bytes on
>> i686) and includes a special set of page tables on Plan 9)
>> * FD set (even if it's the same as the parent thread, you need to keep a
>> pointer to it
>> * uid/euid/gid/egid (Plan 9 I think omits euid and egid)
>> * Namespace (Plan 9 only; again, you need at least a pointer even if
>> it's the same as the parent process)
>> * Priority
>> * Possibly other things I can't think of right now
>>
>> A Concurrent Haskell thread stores:
>>
>> * Stack
>> * Allocation area (4KB)
> 
> Allocation areas are per-CPU, not per-thread.  A Concurrent Haskell 
> thread consists of a TSO (thread state object, currently 11 machine 
> words), and a stack, which we currently start with 1KB and grow on demand.
> 

How is this implemented?

I have seen some coroutine implementations in C, using functions from 
ucontext.h (or direct asm code), but all have the problem that the 
allocated stack is fixed.

Thanks  Manlio Perillo