Native Threads in the RTS

Wolfgang Thaller wolfgang.thaller@gmx.net
Wed, 20 Nov 2002 12:05:55 +0100


> Great, thanks.  I hope you'll keep it up to date so that by the time the
> discussion converges it can serve as a specification and rationale.  We
> can put it in CVS too... Simon will think of where!

Until then, I'll play the role of a "human CVS server".

> Ultimately it'd be worth integrating with
> http://www.cse.unsw.edu.au/~chak/haskell/ghc/comm/rts-libs/multi-thread.html

Of course. Some parts should be part of the user documentation, while  
others should probably be considered implementation details.

> | A foreign exported
> | callback that is called from C code executing in that OS thread is
> | executed in the native haskell thread.
>
> This is the bit I don't understand.   Is the only scenario you have in
> mind here
>
> 	native Haskell thread
> 	calls C
> 	which calls Haskell
>
> and you want all that in the same native thread?

Yes, exactly.

> What about this?
>
> 	native Haskell thread
> 	calls C
> 	which installs a pointer to a foreign-exported Haskell function
> 		 in some C data structure
>
> Later... some other Haskell thread
> 	calls C
> 	which waits for an event
> 	which calls the callback
> So the callback was installed by a native thread, but won't be executed
> by it.  Is that ok?

Definitely. It's the same way it works in C. What thread some code  
executes in depends on what thread the code is called from.
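
To illustrate the plain C behaviour I mean, here is a minimal sketch, assuming POSIX threads. All names in it (installed_callback, record_thread, event_loop, callback_thread_demo) are made up for this example and are not part of any RTS. One thread installs a callback, a second thread invokes it, and the callback records which thread it actually ran in.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*callback_t)(void);

/* Hypothetical "C data structure" holding the installed callback. */
static callback_t installed_callback = NULL;

static pthread_t installer_thread;  /* thread that installed the callback */
static pthread_t callback_ran_in;   /* thread the callback executed in */
static int ran_in_caller;           /* 1 if it ran in the invoking thread */
static int ran_in_installer;        /* 1 if it ran in the installing thread */

/* The callback: it only records the thread it is executing in. */
static void record_thread(void) {
    callback_ran_in = pthread_self();
}

/* Stands in for "some other thread which waits for an event and then
   calls the callback". */
static void *event_loop(void *arg) {
    (void)arg;
    installed_callback();
    ran_in_caller = pthread_equal(callback_ran_in, pthread_self()) != 0;
    ran_in_installer = pthread_equal(callback_ran_in, installer_thread) != 0;
    return NULL;
}

/* Returns 1 if the callback ran in the thread that invoked it and not in
   the thread that installed it. */
int callback_thread_demo(void) {
    pthread_t waiter;
    int rc;

    installer_thread = pthread_self();
    installed_callback = record_thread;   /* install from this thread */

    rc = pthread_create(&waiter, NULL, event_loop, NULL);
    assert(rc == 0);
    rc = pthread_join(waiter, NULL);
    assert(rc == 0);

    return ran_in_caller && !ran_in_installer;
}
```

Calling callback_thread_demo() from any thread should return 1: where the callback was installed has no influence on where it runs.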

> Anyway I think it would be worth explaining what is
> guaranteed a bit more clearly.

I'm not sure how... to me it looks like I already specified this  
exactly ;-). Anyway, I've added some examples to the proposal to  
clarify what I mean.

> | If a "green" haskell thread enters a foreign imported function marked
> | as "safe", all other green threads are blocked. Native haskell threads
> | continue to run in their own OS threads.
>
> No, I don't think so.  The reason that 'safe' is cheaper than
> 'threadsafe' is that the current worker OS thread does not need to
> release the Big Lock it holds on the Haskell heap, thereby allowing
> other green threads to run.   Instead, it holds the lock, executes the
> call, and returns.  At least I think this is the idea, but it's all
> jolly slippery.

I thought that was "unsafe"? The "safe" version still does quite a lot  
(after all, callbacks are allowed, and so is GC). In addition,  
"threadsafe" may start a new OS thread in order to keep executing green  
threads.
On the other hand, we might simply leave it unspecified: If people want  
to know what happens to other threads, they should use "threadsafe" or  
"unsafe". The exact behaviour of "safe" seems to be an implementation  
detail.

> | Other things I'm not sure about:
>
> Presumably if a native thread spawns a thread using forkIO, it gets just
> a green thread?  If it used forkNativeThread it gets a distinct native
> thread.  Better say this.

"The main program and all haskell threads forked using forkIO are green  
threads. Threads forked using forkNativeThread :: IO () -> IO () are  
native threads."
I thought that was clear enough... I've added a note.

Cheers,

Wolfgang
*****************
Native Threads Proposal, version 2

Some "foreign" libraries (for example OpenGL) rely on a mechanism  
called thread-local storage. The meaning of an OpenGL call therefore  
usually depends on which OS thread it is called from. Consequently, some  
kind of direct mapping from Haskell threads to OS threads is necessary  
in order to use the affected foreign libraries.
Executing every haskell thread in its own OS thread is not feasible for  
performance reasons. However, performance of native OS threads is not  
too bad as long as there aren't too many, so I propose that some  
threads get their own OS threads, and some don't:

Every Haskell Thread can be either a "green" thread or a "native"  
thread.
For each "native" thread, there is exactly one OS thread created by the  
RTS. For a green thread, it is unspecified which OS thread it is  
executed in.
The main program and all haskell threads forked using forkIO are green  
threads. Threads forked using forkNativeThread :: IO () -> IO () are  
native threads. (Note: The type of the current haskell thread does  
_not_ matter when forking new threads.)

Execution of a green thread might move from one OS thread to another at  
any time. A "green" thread is never executed in an OS thread that is  
reserved for a "native" thread.
A "native" haskell thread and all foreign imported functions that it  
calls are executed in its associated OS thread. A foreign exported  
callback that is called from C code executing in that OS thread is  
executed in the native haskell thread.
A foreign exported callback that is called from C code executing in an  
OS thread that is not associated with a "native" haskell thread is  
executed in a new green haskell thread.

Only one OS thread can execute Haskell code at any given time.

If a "native" haskell thread enters a foreign imported function that is  
marked as "safe" or "threadsafe", all other Haskell threads keep  
running. If the imported function is marked as "unsafe", no other  
threads are executed until the call finishes.

If a "green" haskell thread enters a foreign imported function marked  
as "threadsafe", a new OS thread is spawned that keeps executing other  
green haskell threads while the foreign function executes. Native  
haskell threads continue to run in their own OS threads.
If a "green" haskell thread enters a foreign imported function marked  
as "safe", all other green threads are blocked. It is implementation  
dependent whether native haskell threads continue to run in their own  
OS threads.
If the imported function is marked as "unsafe", no other threads are  
executed until the call finishes.

Finalizers are always run in green threads.

Issues deliberately not addressed in this proposal:
Some people may want to run several Haskell threads in a dedicated OS  
thread (this is what has been called "thread groups" before).
Some people may want to run finalizers in specific OS threads (are  
finalizers predictable enough for this to be useful?).
Everyone would want SMP if it came for free (but SMP seems to be too  
hard to do at the moment...)

Other things I'm not sure about:
What should happen if a foreign function spawns a new OS thread and  
executes a haskell callback in that OS thread? Should a new native  
haskell thread that executes in the OS thread be created? Should the  
new OS thread be blocked and the callback executed in a green thread?  
What does the current threaded RTS do? (I assume the non-threaded RTS  
will just crash?)

Some (not very concrete) examples:
1.)
Let's assume a piece of C code (let's call it foo()) was called by  
a green haskell thread. If foo() now invokes a haskell function, the  
haskell function might be executed in a different OS thread than foo().  
This means that if the haskell code calls another C function, bar(),  
then bar() doesn't have access to the same thread-local state as foo().  
For example, if foo() sets up an OpenGL context, then bar() can't use  
it.

2.)
If foo() was invoked by a native haskell thread, it is guaranteed that  
all haskell functions invoked by foo() run in the same native haskell  
thread and therefore in the same OS thread. Now if the haskell code  
again calls bar(), then bar() is executed in the same OS thread as  
foo() and the native haskell thread. This means that bar() has access  
to the same thread-local state as foo(), so OpenGL works.
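
The thread-local state that examples 1 and 2 depend on can be shown in plain C. This is only a sketch, assuming a C11 compiler (for _Thread_local) and POSIX threads. foo() and bar() are the placeholder functions from above. The variable current_context and the helpers call_bar, bar_in_same_thread and bar_in_other_thread are made up for illustration, with the integer standing in for something like an OpenGL context.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread state; 0 means "no context set up". */
static _Thread_local int current_context = 0;

/* foo() sets up the context for the calling OS thread only. */
void foo(void) {
    current_context = 42;
}

/* bar() sees whatever context belongs to the calling OS thread. */
int bar(void) {
    return current_context;
}

/* Thread entry point: calls bar() and stores what it saw. */
static void *call_bar(void *result) {
    *(int *)result = bar();
    return NULL;
}

/* Example 2: foo() and bar() run in the same OS thread. */
int bar_in_same_thread(void) {
    foo();
    return bar();
}

/* Example 1: bar() ends up in a different OS thread than foo(). */
int bar_in_other_thread(void) {
    pthread_t other;
    int seen = -1;
    int rc;

    foo();
    rc = pthread_create(&other, NULL, call_bar, &seen);
    assert(rc == 0);
    rc = pthread_join(other, NULL);
    assert(rc == 0);
    return seen;
}
```

Here bar_in_same_thread() returns 42, while bar_in_other_thread() returns 0, because the new OS thread starts with its own, untouched copy of current_context.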

3.)
A piece of C code creates a new OS thread and calls a haskell function  
in that new OS thread. I don't think it makes sense to run the haskell  
function in an existing haskell thread, so we'll create a new one. What  
kind of haskell thread (native or green) should the haskell function  
run in?
I'm slightly in favour of a new native thread (after all, the C code  
might have its reasons for spawning a new OS thread).