Bound Threads
Wolfgang Thaller
wolfgang.thaller at gmx.net
Fri Mar 14 07:24:19 EST 2003
> I have just spend some time reading through all the discussions and the
> new "threads" document and I would like to propose the addition of a
> new library function.
>
>> forkOS :: IO () -> IO ThreadID
Something like that is already in the proposal, only it's currently
called forkBoundThread and it doesn't return the ThreadID (that can be
changed, though).
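
For illustration only, here is a minimal sketch of a bound-thread fork that does return the thread ID. It assumes the forkOS, isCurrentThreadBound and rtsSupportsBoundThreads functions that later GHC releases export from Control.Concurrent; forkBoundReturningId and demoBound are hypothetical names and are not part of the proposal.

```haskell
import Control.Concurrent
  ( ThreadId, forkIO, forkOS, isCurrentThreadBound, rtsSupportsBoundThreads )
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical wrapper: fork a bound thread when the runtime supports it,
-- otherwise fall back to an ordinary (unbound) forkIO thread.
forkBoundReturningId :: IO () -> IO ThreadId
forkBoundReturningId action
  | rtsSupportsBoundThreads = forkOS action
  | otherwise               = forkIO action

-- Fork a thread and report whether it found itself bound to an OS thread.
demoBound :: IO Bool
demoBound = do
  result <- newEmptyMVar
  _ <- forkBoundReturningId (isCurrentThreadBound >>= putMVar result)
  takeMVar result
```

With a runtime that supports bound threads, demoBound should report True; without one it falls back to forkIO and reports False.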
> With this, I also propose that "forkIO" always runs a Haskell thread
> in the same OS thread that the current Haskell thread runs in.
> (i.e. "forkIO": same OS thread, "forkOS": new OS thread)
In the proposal we wrote:
"The specification shouldn’t explicitly require lightweight “green”
threads to exist. The specification should be implementable in a simple
and obvious way in haskell systems that always use a 1:1 correspondence
between Haskell threads and OS threads."
The idea was that lightweight ("green") threads are an optimization
only (do they have any other advantage?), not a language feature, and
that implementations of Haskell should not be forced to support a
complex thread management system.
Your proposal obviously contradicts this.
What is the advantage of explicitly requiring one OS thread to execute
(the foreign calls made by) several Haskell threads?
So far, I was only able to think of two possible situations:
a) The foreign functions don't care what thread they are called from
In that case, I would like the implementation to run my Haskell threads
in the most efficient way possible. Currently, that means scheduling
them all in one OS thread, but that is an implementation detail that I
don't want to care about when I'm writing a normal application. On a
four-processor-SMP machine, the most efficient way is to run them
simultaneously in four OS threads (no implementation currently supports
this, but there's experimental code in the GHC repository).
b) The foreign functions do care what thread they are called from
In that case I want the implementation to have an exact correspondence
between Haskell threads and OS threads. I just want to think about "one
thread", and I don't want to manage some correspondence between Haskell
threads and OS threads manually.
> Using the new primitive, we can view the new "threadsafe" keyword as
> syntactic sugar:
>
>> foreign import threadsafe foo :: Int -> IO Int
>
> ===>
>
>> foo :: Int -> IO Int
>> foo i = threadSafe (primFoo i)
>>
>> foreign import "foo" primFoo :: IO Int
>
> where
>
>> threadSafe :: IO a -> IO a
>> threadSafe io
>>   = do result <- newEmptyMVar
>>        forkOS (do{ x <- io; putMVar result x })
>>        getMVar result
That looks dangerous:
I want to call both threadsafe imports and unsafe imports from a "bound
thread", and I expect all foreign calls from a bound thread to be
executed from the same OS thread (by the definition of a "bound
thread"). This implementation of "threadsafe" always uses another (new
or pooled) OS thread for the threadsafe call.
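
The effect can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: forkIO stands in for the proposed forkOS so that the sketch runs on any runtime, takeMVar replaces the quoted getMVar, and callerAndCallee is a hypothetical helper. It only shows that the wrapped action runs in a different Haskell thread from its caller; with forkOS it would run in a different OS thread as well, which is what breaks libraries that rely on thread-local state.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- The quoted definition, with forkIO standing in for the proposed forkOS.
threadSafe :: IO a -> IO a
threadSafe io = do
  result <- newEmptyMVar
  _ <- forkIO (io >>= putMVar result)
  takeMVar result

-- Returns (the caller's thread, the thread that actually ran the action).
callerAndCallee :: IO (ThreadId, ThreadId)
callerAndCallee = do
  caller <- myThreadId
  callee <- threadSafe myThreadId
  return (caller, callee)
```

The two thread IDs differ, so a call wrapped this way never runs in the thread that issued it.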
>> getOSThread :: ThreadID -> OSThreadID
>> forkIOIn :: OSThreadID -> IO () -> IO ThreadID
Why should the RTS do inter-OS-thread messaging for us?
> I have the feeling that it is not difficult to implement "forkOS" and
> family once the runtime system has been upgraded to support multiple OS
> threads. Wolfgang, you seem to be the expert on the OS thread area,
> would it be hard?
It would definitely be more difficult to implement in GHC than the current
proposal, but it could be done. In fact I think that implementing it
would be more fun for me than having to use it afterwards.
> I am not saying that we should discard the "threadsafe" keyword as it
> might be a useful shorthand, but I think that it is in general a
> mistake to try to keep the management of OS threads implicit -- don't
> use new keywords, add combinators to implement them!
Management of OS threads _should_ be kept implicit. Ideally, the user
should never notice that the GHC runtime is using green threads
internally.
> I feel that the following has happened; urk, we need some way of
> keeping haskell threads running while calling C; we add "threadsafe";
> whoops, sometimes a function expects that it is run in the same OS
> thread; we add "bound"; whoops, sometimes functions expect to be run
> from a specific OS thread... unsolved??
Not unsolved. Use Control.Concurrent.Chan :-)
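
A minimal sketch of that idea, under stated assumptions: one worker thread reads IO actions from a Chan and runs them in order, so every action sent through the channel executes in that one thread. startWorker and runOn are hypothetical names. The worker is created with forkIO here; a bound-thread fork would be used instead when a specific OS thread is required. There is no exception handling, so an action that throws kills the worker.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever, join)

-- Start a worker thread that executes every action sent to it, in order.
-- Returns the request channel and the worker's thread ID.
startWorker :: IO (Chan (IO ()), ThreadId)
startWorker = do
  requests <- newChan
  tid <- forkIO (forever (join (readChan requests)))
  return (requests, tid)

-- Run an action in the worker thread and wait for its result.
runOn :: Chan (IO ()) -> IO a -> IO a
runOn requests action = do
  reply <- newEmptyMVar
  writeChan requests (action >>= putMVar reply)
  takeMVar reply
```

Calling runOn requests myThreadId from any thread returns the worker's ID, which confirms where the action ran.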
> Before we know it, we have added tons of new keywords to solve the
> wrong problem.
The problem being that some Haskell implementations try to optimize
concurrency by doing the scheduling themselves. We have to provide
hints (threadsafe and bound) to the implementation to specify just how
much it is allowed to optimize. We should never be required to
explicitly do the "optimization" in the source code. It will break with
SMP implementations (which I expect to be using in a few years),
because different optimizations are required - suddenly it will be
desirable to have multiple OS threads for performance reasons.
> Maybe it is time to take a step back and use a somewhat lower level
> model with two fork variants: "forkIO" (in the same OS thread) and
> "forkOS" (in a new OS thread).
> It seems that none of the above problems occur when having explicit
> control.
> In general it seems that OS threads are a resource that is too subtle
> to be managed automatically as they have a profound impact on how
> libraries are used and applications are structured.
My recipe:
1) Mark all your foreign imports as threadsafe
2) Mark foreign imports that are guaranteed to only need a short amount
of time (<50ms at most, I'd say) and that won't call back to Haskell,
as unsafe
3) Just pretend that every Haskell thread is an OS thread
4) If you're using libraries that rely on thread-local state (and
therefore can find out that point 3 might not be strictly true), add
"bound" to your foreign exports or wrap your IO actions in
forkBoundThread.
I don't see any remaining problems, and it looks simpler to me than
managing Haskell threads and OS threads explicitly.
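
Points 1 and 2 of the recipe might look like this in source. This is a sketch under assumptions: it targets a POSIX system, the C functions sqrt and sleep are arbitrary stand-ins chosen for illustration, and it uses "safe" where the recipe says "threadsafe", because later GHC releases dropped the threadsafe keyword and let safe calls play that role in the threaded runtime.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..), CUInt (..))

-- Point 2: a short call that never calls back into Haskell, so it can be
-- marked unsafe.
foreign import ccall unsafe "math.h sqrt"
  c_sqrt :: CDouble -> CDouble

-- Point 1: a call that may take a long time, so it is marked safe
-- (standing in for threadsafe) to keep other Haskell threads running.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt
```

Points 3 and 4 need no declarations of their own until a library with thread-local state enters the picture.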
OK, enough talking about why I like my ideas better than yours ;-) , I
still have a few questions:
What would a safe foreign import do in your proposal?
What other Haskell threads would be blocked when I call a safe foreign
import? When would they be unblocked again? What happens when the
foreign import calls back to Haskell? Or would they be blocked at all?
For a Haskell implementation that runs in one thread and just uses
separate threads for foreign calls, it may be natural to not block
until somebody tries to make another foreign call in the same OS
thread...
And how would the whole thing work in an SMP Haskell system?
When writing programs and libraries, how would I manage the complex
interactions that could happen? How can a library make use of forkIO if
it doesn't know what _other_ Haskell threads might be running in the
same OS thread?
Currently, I can also use the threadsafe attribute for long
calculations outside the IO monad, i.e.
foreign import ccall threadsafe doSomeTerriblyComplicatedCalculation ::
Double -> Double
If one Haskell thread evaluated this, all other Haskell threads would
keep running. Do we need to use unsafePerformIO again in your proposal?
Simon Marlow wrote:
> I'm basing this on two assumptions: (a) switching OS threads is
> expensive and (b) threadsafe foreign calls are common. I could
> potentially be wrong on either of these, and I'm prepared to be
> persuaded. But if both (a) and (b) turn out to be true, then worse is
> worse in this case.
a) Is probably true. If it was absolutely wrong, we could do away with
all the complexity, use one OS thread for each Haskell thread and
shorten the GHC RTS by a few thousand lines...
b) Personally, I would want to use them for every foreign call that is
not guaranteed to finish within, say, 50ms, so it will certainly be
true for _my_ programs.
I would expect all libraries that I want to use to do this, too,
because otherwise my program might unexpectedly be blocked by threads
spawned by the library.
Cheers,
Wolfgang