Bound Threads

Fri Mar 14 07:20:24 EST 2003

Hi Simon,

> I'd like to point out a concept that I think is being missed here:
>
> We never want to specify what OS thread is running a particular
> Haskell thread.
>
> why not?  Because (a) it doesn't matter: the programmer can never tell,
> and (b) we want to give the implementation freedom to spread Haskell
> threads across multiple OS threads to make use of multiple real CPUs.

I agree that these are valid points. However, as I said, I don't think
we can do (b), i.e. automatic management, in many real-world situations.
The above points are mostly useful in a pure Haskell setting.

In general, I think that only the programmer knows what strategy to use.
In particular, we can provide a "fork" function that forks off a new
Haskell thread that may run in a new OS thread or on another CPU, basically
implementing the above concept for programs that don't care about how
the Haskell threads are distributed over OS threads:

fork :: IO () -> IO ThreadID
fork io
  = do newOS <- [complex algorithm that determines if a new OS thread is needed.]
       if (newOS)
        then forkOS io
        else do threadID <- [complex algorithm that determines in which existing thread we run it]
                forkIOIn threadID io

Note that we can now implement our really sophisticated distributed algorithms in plain Haskell.
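
A minimal sketch of how such a "fork" could be written today against GHC's
Control.Concurrent, under stated assumptions: the names Placement and forkWith
are hypothetical, the "complex algorithm" is passed in as a caller-supplied
policy, and since there is no forkIOIn in GHC, the sketch falls back to forkIO
and leaves the choice of OS thread to the runtime.

```haskell
import Control.Concurrent

-- | Hypothetical placement decision; stands in for the "complex algorithm".
data Placement = NewOSThread | SharedOSThread
  deriving (Eq, Show)

-- | Fork a Haskell thread according to a caller-supplied policy.
-- forkOS is only used when the runtime supports bound threads
-- (for GHC, when the program is linked with -threaded).
forkWith :: IO Placement -> IO () -> IO ThreadId
forkWith policy io = do
  placement <- policy
  case placement of
    NewOSThread | rtsSupportsBoundThreads -> forkOS io
    -- Assumption: no forkIOIn exists, so the runtime picks the OS thread.
    _ -> forkIO io
```

A program that does not care about placement would call it with a fixed
policy, for example forkWith (pure SharedOSThread) action.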

> The point is that you want to
> specify which OS thread is used to invoke a foreign function, NOT which
> OS thread is used to execute Haskell code.  The semantics that Simon & I
> wrote make this clear.

This is a good point and that is also the weakness of the "forkOS", "forkIO" approach: it is less declarative and thus leaves less freedom to the implementation.
However, I hope that through functions like "fork", we can bring back declarativeness by abstraction.

> If we keep thinking like this, then implementations like Hugs can be
> single-threaded internally but switch OS threads to call out to foreign
> functions, and implementations like GHC can be multi-threaded internally
> and avoid switching threads when calling out to foreign functions.

Ha, this is not true :-) We are saved by your observation that in the Haskell
world we can't observe whether we run in a different OS thread or not. Thus
a single-threaded Hugs will implement forkOS as forkIO but still attach a
different "Hugs OS thread identifier" to the Haskell thread. When a foreign call
is made, it matches the Hugs OS thread identifiers and uses a different OS thread
if necessary, maintaining a mapping between the Hugs OS thread identifiers
and the spawned OS threads.
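
The matching step can be sketched as a small pure function. This is only an
illustration of the idea, not how Hugs is implemented: VirtualOSThread,
Dispatch and dispatch are hypothetical names, and the mapping from identifiers
to real OS threads is left out.

```haskell
-- | Hypothetical identifier attached to each Haskell thread by forkOS.
newtype VirtualOSThread = VirtualOSThread Int
  deriving (Eq, Show)

-- | What the runtime has to do before making a foreign call.
data Dispatch
  = CallDirectly             -- identifiers match: stay on the current OS thread
  | SwitchTo VirtualOSThread -- identifiers differ: hand the call to another OS thread
  deriving (Eq, Show)

-- | Compare the identifier of the OS thread currently running the
-- interpreter with the identifier attached to the calling Haskell thread.
dispatch :: VirtualOSThread -> VirtualOSThread -> Dispatch
dispatch current wanted
  | current == wanted = CallDirectly
  | otherwise         = SwitchTo wanted
```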

>> > threadSafe :: IO a -> IO a
>> > threadSafe io
>> > = do result <- newEmptyMVar
>> >      forkOS (do{ x <- io; putMVar result x })
>> >      getMVar result
>
> This forces a thread switch when calling a threadsafe foreign function,
> which is something I think we want to avoid.

We can refine the implementation to avoid a thread switch when it
is the only Haskell thread running in the current OS thread:

threadSafeEx :: IO a -> IO a
threadSafeEx io
  = do count <- getHaskellThreadCountInTheCurrentOSThread
       if (count > 1)
        then threadSafe io
        else io
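
For reference, a runnable sketch of both functions against GHC's
Control.Concurrent. Assumptions: it uses takeMVar where the quoted snippet
writes getMVar, it falls back to forkIO when the runtime has no bound-thread
support, and the thread count is passed in as a parameter because
getHaskellThreadCountInTheCurrentOSThread is a hypothetical function, not an
existing library call.

```haskell
import Control.Concurrent

-- | Run an action in a freshly forked thread and wait for its result.
-- Exceptions raised by the action are not propagated in this sketch.
threadSafe :: IO a -> IO a
threadSafe io = do
  result <- newEmptyMVar
  -- forkOS needs bound-thread support (GHC: link with -threaded).
  let fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork (io >>= putMVar result)
  takeMVar result

-- | Only pay for the extra thread when other Haskell threads share the
-- current OS thread. The first argument is a hypothetical stand-in for
-- getHaskellThreadCountInTheCurrentOSThread.
threadSafeEx :: IO Int -> IO a -> IO a
threadSafeEx getCount io = do
  count <- getCount
  if count > 1
    then threadSafe io
    else io
```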

> I'm basing this on two assumptions: (a) switching OS threads is
> expensive and (b) threadsafe foreign calls are common.  I could
> potentially be wrong on either of these, and I'm prepared to be
> persuaded.  But if both (a) and (b) turn out to be true, then worse is
> worse in this case.

I think you are right on (a), but I also think that we can avoid it
just as it can sometimes be avoided when implemented in C in the runtime.
Can't say anything about (b).

All the best,
  Daan.

Now, I have an example from the wxHaskell GUI library that exposes some
of the problems with multiple threads. I can't say it can be solved
nicely with forkOS, so I wonder how it would work out with "threadsafe":

The example is a Haskell initialization function that is called via
a callback from the GUI library. The Haskell initialization function wants
to do a lot of processing but still stay reactive to close events, for example.
Since events are processed in an event loop, new events can only come in
by returning from the callback. So, the initialization function forks off
a Haskell thread (the processor) to do all the work and returns as soon as
possible to the C GUI library. Now, the event loop starts to wait for the
next event in C land.

The problem is that the "processor" thread won't run since we have returned
to C land and the Haskell scheduler can't run. We can solve it by running the
processor thread with "forkOS". I can't say it is a particularly nice solution
but it is how it is done in all other major programming languages.
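
A rough sketch of that workaround, assuming GHC's Control.Concurrent. The name
onInit and its processor argument are hypothetical and do not come from
wxHaskell; the fallback to forkIO is only there so the sketch also runs on a
runtime without bound-thread support.

```haskell
import Control.Concurrent

-- | Hypothetical initialization callback invoked from the C event loop.
-- It hands the long-running work to its own OS thread and returns
-- immediately, so the event loop can keep dispatching events.
onInit :: IO () -> IO ()
onInit processor = do
  -- forkOS needs bound-thread support (GHC: link with -threaded).
  let fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork processor
  return ()
```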

I wonder how the "threadsafe" keyword can be used to solve this problem.
Since the Haskell function is called via a callback, I guess that "threadsafe"
should also apply to "wrapper" functions -- that is, when the foreign world
calls Haskell, we use another OS thread to run the Haskell code. However, I think
that we are then forced to use an OS thread context switch?

>
> Cheers,
> 	Simon