Native Threads in the RTS
Wed, 4 Dec 2002 00:42:13 +0100
Dean Herrington wrote:
> [...] Rather, I find it
> nonintuitive that calling from Haskell to foreign code and back into
> Haskell should create a new Haskell thread, when these two Haskell threads
> are just different portions of a single "thread of computation"
> (deliberately vague term).
I agree with that. Creating a new thread for calling back into Haskell
_only_ makes sense if you look at it from inside the GHC RTS. Before I
had a look at the relevant parts of the RTS, I would never have thought
I don't know if there's any advantage/disadvantage to changing GHC's
internals. The only _observable_ difference is in the threads' ThreadIds,
and this should at least be clearly documented (or, even better, it
should be "explicitly undocumented", so that no one will be surprised if
the behaviour is changed in the future).
> Off the top of my head I can think of two
> situations in which having separate threads is bothersome.
3. Throwing exceptions to a thread
If I manually translate Haskell exceptions to foreign exceptions and
back, there is no reason why I shouldn't want to raise an exception in
a thread I have a ThreadId for, even if that thread called a foreign
function which in turn called back to Haskell.
However, I think that the behaviour can always be emulated using MVars,
so no immediate action is required.
I've tried to rephrase my proposal for native threads, this time
treating GHC's behaviour in this situation as an implementation detail.
I think the meaning of the proposal becomes clearer because of this.
The proposal doesn't comment on ThreadIds, so the non-intuitive (IMHO)
behaviour in GHC is independent of the "bound threads" proposal.
I think I've understood both my own specification and the current RTS
well enough to start trying to implement a prototype soon. The intended
meaning of the specification hasn't changed for the third revision in a
row.
Does anyone have concrete suggestions for the syntax change to foreign
export and foreign import "wrapper"?
Bound Threads Proposal, version 5
Since foreign libraries sometimes exploit thread local state, it is
necessary to provide some control over which thread is used to execute
foreign code. In particular, it is important that it should be
possible for Haskell code to arrange that a sequence of calls to a
given library is performed by the same native thread and that if an
external library calls into Haskell, then any outgoing calls from
Haskell are performed by the same native thread.
This specification is intended to be implementable both by
multithreaded Haskell implementations and by single-threaded
implementations and so it does not comment on which particular OS
thread is used to execute Haskell code.
A native thread is a thread as defined by the operating system.
A "Haskell thread" encapsulates the execution of a Haskell I/O action.
A Haskell thread is created by forkIO, and dies when the I/O action
finishes.
When a Haskell thread calls a foreign imported function, it is not
considered to be blocked (in the GHC runtime system, the calling thread
is blocked; this is considered an implementation detail for the
purposes of this specification, but be aware that myThreadId might
return several different values for one "Haskell thread" as defined
here). If the foreign function calls back to Haskell, the callback is
said to run in the same Haskell thread.
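The ThreadId remark above can be checked with a small experiment. The
following is only a sketch, assuming GHC and the standard FFI "wrapper"
and "dynamic" imports; the names mkCallback, callThroughC and
callbackThreadIds are made up for illustration. It calls out to foreign
code, which immediately calls back into Haskell, and records myThreadId
on both sides. No particular output is assumed here, since the result is
an implementation detail.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (ThreadId, myThreadId)
import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Hypothetical helper: turn a Haskell action into a C function pointer.
foreign import ccall "wrapper"
  mkCallback :: IO () -> IO (FunPtr (IO ()))

-- Hypothetical helper: call a C function pointer. The call is "safe"
-- because the callee re-enters Haskell.
foreign import ccall safe "dynamic"
  callThroughC :: FunPtr (IO ()) -> IO ()

-- Returns the ThreadId seen before the foreign call and the ThreadId
-- seen inside the callback.
callbackThreadIds :: IO (ThreadId, ThreadId)
callbackThreadIds = do
  outer <- myThreadId
  ref <- newIORef outer
  cb <- mkCallback (myThreadId >>= writeIORef ref)
  callThroughC cb
  freeHaskellFunPtr cb
  inner <- readIORef ref
  return (outer, inner)

main :: IO ()
main = do
  (outer, inner) <- callbackThreadIds
  putStrLn ("outside the callback: " ++ show outer)
  putStrLn ("inside the callback:  " ++ show inner)
  putStrLn ("same ThreadId: " ++ show (outer == inner))
```

Under this specification both values belong to one "Haskell thread",
whatever the last line prints.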
Haskell threads may be associated at thread creation time with either
zero or one native threads. Each native thread is associated with at
most one Haskell thread.
A native thread that is associated with a Haskell thread is called a
bound native thread. A Haskell thread that is associated with a native
thread is called a bound Haskell thread.
A Haskell thread is always executed by a native thread. This
specification places absolutely no restrictions on which native thread
is used to execute a particular Haskell thread. The Haskell thread need
not be associated with the native thread used to execute it, and one
Haskell thread may be executed by more than one native thread during
its lifetime [but not by several native threads at once].
A bound native thread may not be used for executing any Haskell thread
except the one it is bound to.
It is implementation dependent whether the main thread, threads created
using forkIO and threads created for running finalizers or signal
handlers are bound or not.
When a foreign imported function is invoked [by Haskell code], the
foreign code is executed in the native thread associated with the
current Haskell thread, if an association exists. If the current
Haskell thread is not associated with a native thread, the implementation
may decide which native thread to run the foreign function in. The
native thread that is used may not be bound to another Haskell thread.
The existing distinction between unsafe, safe and threadsafe calls
There are now two kinds of foreign export and foreign import "wrapper"
declarations: bound and free. The FFI syntax should be extended
appropriately [which of the two should be the default, if any?].
Bound foreign exported functions should be executed in a Haskell thread
bound to the native thread that invoked the foreign exported function.
A "free" foreign export may be executed in any kind of Haskell thread.
A new library routine, forkNativeThread :: IO () -> IO ThreadId, should
spawn a new Haskell thread (like forkIO) and associate it with a new
native thread (forkIO is not guaranteed to do this). It may be
implemented using the FFI and an OS-specific thread creation routine.
It would just pass a "bound" callback as an entry point for a new OS
thread.
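For experimenting with the shape of such a routine, here is a minimal
sketch. It assumes that forkOS, isCurrentThreadBound and
rtsSupportsBoundThreads from GHC's Control.Concurrent can stand in for
the behaviour described above; that substitution is an assumption of
this example, not part of the proposal. boundInChild is a made-up helper.

```haskell
import Control.Concurrent
  ( ThreadId, forkIO, forkOS, isCurrentThreadBound, rtsSupportsBoundThreads )
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Stand-in for the proposed routine (assumption: forkOS plays that role).
-- Falls back to forkIO when the RTS cannot create bound threads.
forkNativeThread :: IO () -> IO ThreadId
forkNativeThread
  | rtsSupportsBoundThreads = forkOS
  | otherwise               = forkIO

-- Run an action in a thread created by forkNativeThread and report
-- whether that thread was bound while it ran.
boundInChild :: IO Bool
boundInChild = do
  result <- newEmptyMVar
  _ <- forkNativeThread (isCurrentThreadBound >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  bound <- boundInChild
  putStrLn ("child thread bound: " ++ show bound)
```

The fallback to forkIO is a design choice of the sketch: forkOS fails at
run time when the program is not linked against a runtime that supports
bound threads.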
Finalizers and signal handlers cannot be associated with a particular
native thread. If they have to trigger an action in a particular native
thread, a message has to be sent manually (via MVars and friends) to
the Haskell thread associated with the native thread in question. I
think we'll have to live with this. Does anyone have a better idea?
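The manual message passing mentioned above might look roughly like the
following sketch. The names Message, ownerLoop and postToOwner are
hypothetical, and forkIO is used only as a placeholder for whatever
creates the Haskell thread associated with the native thread in
question.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- A message is either a job for the owning thread or a request to stop.
data Message = Run (IO ()) | Stop

-- Loop executed by the Haskell thread that owns the native thread;
-- every job it receives runs in that thread.
ownerLoop :: Chan Message -> IO ()
ownerLoop inbox = do
  msg <- readChan inbox
  case msg of
    Run job -> job >> ownerLoop inbox
    Stop    -> return ()

-- What a finalizer or signal handler would call: it only enqueues work.
postToOwner :: Chan Message -> IO () -> IO ()
postToOwner inbox job = writeChan inbox (Run job)

-- Demo: three "handlers" post jobs; the owner runs them in order.
demo :: IO [Int]
demo = do
  inbox <- newChan
  done <- newEmptyMVar
  logRef <- newIORef []
  _ <- forkIO (ownerLoop inbox >> putMVar done ())  -- placeholder owner
  mapM_ (\n -> postToOwner inbox (modifyIORef logRef (n :))) [1, 2, 3]
  writeChan inbox Stop
  takeMVar done
  reverse <$> readIORef logRef

main :: IO ()
main = demo >>= print  -- prints [1,2,3]
```

The handler never touches thread-local state itself; it only hands the
work to the thread that is allowed to.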
This introduces a change in the syntax for foreign export and foreign
import "wrapper" declarations (a bound/free specifier is added). I
think we should have a default option here. I'm not sure which,
however. Also, the objection that "bound" and "free" can be confused
with the lambda calculus terms still holds.
Here are some examples of how the specification might be implemented.
They should not be considered an actual part of the specification.
Let's assume we have a Haskell system that has used OS native threads
from the start. Every call to forkIO creates a new OS thread. The OS is
responsible for all scheduling. Now we want to add support for [my
version of] the proposal to this implementation.
This should be trivial to do: A foreign call should be just a call, and
a callback should just start executing Haskell code in the current OS
thread.
This implementation would treat all foreign exports as bound (a "free"
foreign export may be executed in any kind of Haskell thread, so this
is permitted). All "safe" calls would probably be treated as
"threadsafe" (after all, there is no point in blocking other threads).
If it weren't for the performance problems, this would be the ideal
solution for me.
Let's assume we have a Haskell system that executes all Haskell code in
one OS thread and does its own scheduling between Haskell threads. Now we
want to add support for [my version of] the proposal. We do not want to
move execution of Haskell code to different threads. We are not
concerned about performance.
In this case, we would keep track of the association between Haskell
threads and "foreign" OS threads (here, the term "foreign thread" seems
to fit very well). If the Haskell code calls a foreign imported
function, a message is sent to the associated foreign thread (a new
foreign thread is created if necessary). If a foreign exported function
is called, it just signals the "Haskell runtime thread".
The performance would be better than 1) as long as no foreign functions
are involved. When the FFI is used, performance gets worse.
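The message passing in this second example can be sketched as a
synchronous request/reply exchange. Everything here is illustrative:
Request, foreignThread and callForeign are made-up names, an ordinary
forkIO thread stands in for the "foreign" OS thread, and an IO Int
action stands in for the foreign imported function.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- A request carries the "foreign" call and a slot for its result.
data Request = Request (IO Int) (MVar Int)

-- Stand-in for the foreign OS thread: it serves one request after
-- another, so all calls sent to it run in the same thread.
foreignThread :: MVar Request -> IO ()
foreignThread inbox = do
  Request call reply <- takeMVar inbox
  call >>= putMVar reply
  foreignThread inbox

-- What the runtime would do for a foreign import: send the call to the
-- associated foreign thread and wait for the reply.
callForeign :: MVar Request -> IO Int -> IO Int
callForeign inbox call = do
  reply <- newEmptyMVar
  putMVar inbox (Request call reply)
  takeMVar reply

demo :: IO Int
demo = do
  inbox <- newEmptyMVar
  _ <- forkIO (foreignThread inbox)
  a <- callForeign inbox (return 20)
  b <- callForeign inbox (return 22)
  return (a + b)

main :: IO ()
main = demo >>= print  -- prints 42
```

Every call pays for two thread hand-offs, which is the cost referred to
above when the FFI is used.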
"The Middle Way", i.e. what I think should be implemented for GHC. The
following are just fragments of thoughts, don't expect it to be
* There is a global lock [iirc, that's the Capability in the GHC RTS]
which prevents several Haskell threads from running truly concurrently.
* Each bound Haskell thread is executed by its associated native thread.
* Each bound native thread is executing at most one piece of code at a
time, i.e. there is no scheduling going on inside the bound native
thread.
* When a bound foreign export is invoked, the RTS creates a new Haskell
thread bound to the current OS thread.
The following things are unchanged (if any of those things is not
currently the case, please correct me):
* Unsafe calls are just plain old function calls
* All unbound Haskell threads are executed by a so-called "worker
thread". When an unbound Haskell thread calls a threadsafe imported
function, a new worker thread is created.
* When an unbound foreign export is invoked, the RTS creates a new
unbound Haskell thread.