Bound Threads

Fri Mar 14 07:24:19 EST 2003

> I have just spent some time reading through all the discussions and the
> new "threads" document and I would like to propose the addition of a 
> new library function.
>
>> forkOS :: IO () -> IO ThreadID

Something like that is already in the proposal, only it's currently 
called forkBoundThread and it doesn't return the ThreadID (that can be 
changed, though).

> With this, I also propose that "forkIO" always runs a Haskell thread 
> in the same OS thread that the current Haskell thread runs in.
> (i.e. "forkIO": same OS thread, "forkOS": new OS thread)

In the proposal we wrote:
"The specification shouldn’t explicitly require lightweight “green” threads
to exist. The specification should be implementable in a simple and obvious
way in Haskell systems that always use a 1:1 correspondence between
Haskell threads and OS threads."

The idea was that lightweight ("green") threads are an optimization 
only (do they have any other advantage?), not a language feature, and 
that implementations of Haskell should not be forced to support a 
complex thread management system.
Your proposal obviously contradicts this.

What is the advantage of explicitly requiring one OS thread to execute 
(the foreign calls made by) several Haskell threads?
So far, I was only able to think of two possible situations:

a) The foreign functions don't care what thread they are called from

In that case, I would like the implementation to run my Haskell threads 
in the most efficient way possible. Currently, that means scheduling 
them all in one OS thread, but that is an implementation detail that I 
don't want to care about when I'm writing a normal application. On a 
four-processor-SMP machine, the most efficient way is to run them 
simultaneously in four OS threads (no implementation currently supports 
this, but there's experimental code in the GHC repository).

b) The foreign functions do care what thread they are called from

In that case I want the implementation to have an exact correspondence 
between Haskell threads and OS threads. I just want to think about "one 
thread", and I don't want to manage some correspondence between Haskell 
threads and OS threads manually.

> Using the new primitive, we can view the new "threadsafe" keyword as
> syntactic sugar:
>
>> foreign import threadsafe foo :: Int -> IO Int
>
> ===>
>
>> foo :: Int -> IO Int
>> foo i = threadSafe (primFoo i)
>>
>> foreign import "foo" primFoo :: Int -> IO Int
>
> where
>
>> threadSafe :: IO a -> IO a
>> threadSafe io
>>   = do result <- newEmptyMVar
>>        forkOS (do{ x <- io; putMVar result x })
>>        getMVar result

That looks dangerous:
I want to call both threadsafe imports and unsafe imports from a "bound 
thread", and I expect all foreign calls from a bound thread to be 
executed from the same OS thread (by the definition of a "bound 
thread"). This implementation of "threadsafe" always uses another (new 
or pooled) OS thread for the threadsafe call.
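
To make the concern concrete, here is a minimal sketch of the quoted combinator. It is not part of either proposal: the names threadSafeWith and ranInCallersThread are made up for illustration, the fork primitive is passed in as an argument so that the sketch also runs on a runtime without OS-thread support, and takeMVar stands in for the quoted getMVar. Passing a forkOS-style primitive gives the quoted definition, where the action additionally ends up in a different OS thread.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | The quoted combinator with the fork primitive made explicit
-- (hypothetical name). This is only a sketch: an exception raised by
-- 'io' is not forwarded to the caller.
threadSafeWith :: (IO () -> IO ThreadId) -> IO a -> IO a
threadSafeWith fork io = do
  result <- newEmptyMVar
  _ <- fork (io >>= putMVar result)   -- the action runs in the forked thread
  takeMVar result                     -- the caller only waits for the result

-- | Does an action wrapped this way still see the caller's thread
-- identity? It does not: it always observes the forked thread.
ranInCallersThread :: IO Bool
ranInCallersThread = do
  caller <- myThreadId
  callee <- threadSafeWith forkIO myThreadId
  pure (caller == callee)
```

ranInCallersThread returns False, which is the point of the objection: anything the foreign code ties to the calling thread (thread-local state, for instance) would be looked up in the wrong thread.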

>> getOSThread :: ThreadID -> OSThreadID
>> forkIOIn :: OSThreadID -> IO () -> IO ThreadID

Why should the RTS do inter-OS-thread messaging for us?

> I have the feeling that it is not difficult to implement "forkOS" and
> family once the runtime system has been upgraded to support multiple OS
> threads.
> Wolfgang, you seem to be the expert on the OS thread area, would it be
> hard?

It would definitely be more difficult to implement in GHC than the current 
proposal, but it could be done. In fact I think that implementing it 
would be more fun for me than having to use it afterwards.

> I am not saying that we should discard the "threadsafe" keyword as it
> might be a useful shorthand, but I think that it is in general a mistake
> to try to keep the management of OS threads implicit -- don't use new
> keywords, add combinators to implement them!

Management of OS threads _should_ be kept implicit. Ideally, the user 
should never notice that the GHC runtime is using green threads 
internally.

> I feel that the following has happened; urk, we need some way of
> keeping Haskell threads running while calling C; we add "threadsafe";
> whoops, sometimes a function expects that it is run in the same OS
> thread; we add "bound"; whoops, sometimes functions expect to be run
> from a specific OS thread... unsolved??

Not unsolved. Use Control.Concurrent.Chan :-)
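
A rough sketch of what that could look like: one dedicated thread reads IO actions from a Chan and runs them, so every job submitted through the channel executes in that one thread. The names Worker, startWorker, runOn and sameThreadTwice are hypothetical, and the fork primitive is a parameter; with forkIO the worker is an ordinary Haskell thread, and it would have to be created with the proposal's forkBoundThread (or a similar primitive) to pin the jobs to a single OS thread.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever, join)

-- | Handle to a thread that executes submitted jobs one at a time.
newtype Worker = Worker (Chan (IO ()))

-- | Start a worker using the given fork primitive.
startWorker :: (IO () -> IO ThreadId) -> IO Worker
startWorker fork = do
  jobs <- newChan
  _ <- fork (forever (join (readChan jobs)))  -- take the next job and run it
  pure (Worker jobs)

-- | Run an action in the worker's thread and wait for its result.
-- Sketch only: exceptions in the job are not forwarded to the caller.
runOn :: Worker -> IO a -> IO a
runOn (Worker jobs) io = do
  result <- newEmptyMVar
  writeChan jobs (io >>= putMVar result)
  takeMVar result

-- | Example: two jobs sent to one worker observe the same thread identity.
sameThreadTwice :: IO Bool
sameThreadTwice = do
  w <- startWorker forkIO
  a <- runOn w myThreadId
  b <- runOn w myThreadId
  pure (a == b)
```

The messaging is a few lines of ordinary library code, which is why it does not seem necessary for the RTS to provide it.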

> Before we know it, we have added tons of new keywords to solve the 
> wrong problem.

The problem being that some Haskell implementations try to optimize 
concurrency by doing the scheduling themselves. We have to provide 
hints (threadsafe and bound) to the implementation to specify just how 
much it is allowed to optimize. We should never be required to 
explicitly do the "optimization" in the source code. It will break with 
SMP implementations (which I expect to be using in a few years), 
because different optimizations are required - suddenly it will be 
desirable to have multiple OS threads for performance reasons.

> Maybe it is time to take a step back and use a somewhat lower level
> model with two fork variants: "forkIO" (in the same OS thread) and
> "forkOS" (in a new OS thread).
> It seems that none of the above problems occur when having explicit
> control.
> In general it seems that OS threads are a resource that is too subtle
> to be managed automatically as they have a profound impact on how
> libraries are used and applications are structured.

My recipe:
1) Mark all your foreign imports as threadsafe
2) Mark foreign imports that are guaranteed to only need a short amount 
of time (<50ms at most, I'd say) and that won't call back to Haskell, 
as unsafe
3) Just pretend that every Haskell thread is an OS thread
4) If you're using libraries that rely on thread-local state (and 
therefore can find out that point 3 might not be strictly true), add 
"bound" to your foreign exports or wrap your IO actions in 
forkBoundThread.
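
Steps 1 and 2 might look roughly like this in source code. The sketch writes "safe" where the recipe says "threadsafe", on the assumption that the compiler at hand accepts only "safe" and "unsafe" (as later GHC versions do); the C functions sin and sleep and the Haskell names c_sin, c_sleep and napThenSine are only stand-ins chosen for illustration.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CUInt)

-- Step 2: returns quickly and never calls back into Haskell,
-- so marking it "unsafe" is acceptable.
foreign import ccall unsafe "math.h sin"
  c_sin :: Double -> Double

-- Step 1: may take arbitrarily long, so it gets the annotation that lets
-- other Haskell threads continue ("threadsafe" in the proposal, "safe"
-- assumed here).
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

-- | Hypothetical helper that uses both imports.
napThenSine :: CUInt -> Double -> IO Double
napThenSine secs x = do
  _ <- c_sleep secs   -- long call
  pure (c_sin x)      -- short call
```

Whether other Haskell threads really keep running during c_sleep depends on the runtime (in GHC, on linking with the threaded RTS); the annotations only state what the implementation is allowed to assume.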

I don't see any remaining problems, and it looks simpler to me than 
managing Haskell threads and OS threads explicitly.

OK, enough talking about why I like my ideas better than yours ;-) , I 
still have a few questions:

What would a safe foreign import do in your proposal?
What other Haskell threads would be blocked when I call a safe foreign 
import? When would they be unblocked again? What happens when the 
foreign import calls back to Haskell? Or would they be blocked at all? 
For a Haskell implementation that runs in one thread and just uses 
separate threads for foreign calls, it may be natural to not block 
until somebody tries to make another foreign call in the same OS 
thread...
And how would the whole thing work in an SMP Haskell system?

When writing programs and libraries, how would I manage the complex 
interactions that could happen? How can a library make use of forkIO if 
it doesn't know what _other_ Haskell threads might be running in the 
same OS thread?

Currently, I can also use the threadsafe attribute for long 
calculations outside the IO monad, i.e.

foreign import ccall threadsafe doSomeTerriblyComplicatedCalculation
    :: Double -> Double

If one Haskell thread evaluated this, all other Haskell threads would 
keep running. Do we need to use unsafePerformIO again in your proposal?

Simon Marlow wrote:
> I'm basing this on two assumptions: (a) switching OS threads is
> expensive and (b) threadsafe foreign calls are common.  I could
> potentially be wrong on either of these, and I'm prepared to be
> persuaded.  But if both (a) and (b) turn out to be true, then worse is
> worse in this case.

a) is probably true. If it were completely wrong, we could do away with 
all the complexity, use one OS thread for each Haskell thread and 
shorten the GHC RTS by a few thousand lines...
b) Personally, I would want to use them for every foreign call that is 
not guaranteed to finish within, say, 50ms, so it will certainly be 
true for _my_ programs.
I would expect all libraries that I want to use to do this, too, 
because otherwise my program might unexpectedly be blocked by threads 
spawned by the library.

Cheers,

Wolfgang