Summary so far (was: HOpenGL and --enable-threaded-rts)

Thu, 20 Jun 2002 13:03:44 +0100

This discussion is getting rather long, so I thought I'd summarise (as
much for my benefit as everyone else's).  Please let me know if I get
anything wrong.

It turns out that some C libraries designed to be used from
multi-threaded programs make use of thread-local state.  This is at odds
with GHC's new extension to support using OS threads to multiplex calls
to blocking foreign functions - this is the extension we call the
"threaded RTS", which is off by default but turned on if you configure
GHC with --enable-threaded-rts.  The threaded-rts extension is important
if you want to call foreign functions that might block - without
the threaded RTS, such a call would block all the other Haskell threads
until it returns.

The problems arise because GHC's threaded RTS doesn't make any
distinction between OS threads; as far as it is concerned any OS thread
is as good as any other.  We hadn't considered the use of thread-local
state by external C libraries when we designed this (obviously :-{).
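To make the problem concrete, here is a minimal C sketch of a library that keeps a "current context" in thread-local storage, in the style of OpenGL's per-thread current context. The functions lib_make_current, lib_current and context_seen_from_other_thread are hypothetical and not part of any real library. A context made current on one OS thread is invisible from every other OS thread. That is what goes wrong when the RTS makes a later foreign call from whichever OS thread happens to be free.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical library state: the current context lives in thread-local
   storage, so each OS thread has its own copy (0 means "no context"). */
static _Thread_local int current_ctx = 0;

void lib_make_current(int ctx) { current_ctx = ctx; }
int lib_current(void) { return current_ctx; }

/* Runs on a freshly created OS thread and records what that thread sees. */
static void *peek_context(void *out) {
    *(int *)out = lib_current();
    return NULL;
}

/* Returns the context visible from an OS thread other than the caller's. */
int context_seen_from_other_thread(void) {
    pthread_t t;
    int seen = -1;
    int rc = pthread_create(&t, NULL, peek_context, &seen);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return seen;
}
```

For instance, after lib_make_current(42) on the main thread, lib_current() returns 42 there, while context_seen_from_other_thread() returns 0.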

Ok, so what can we do?

1. Swap the thread-local state in
=================================

Wolfgang's proposed fix is to allow the right thread-local state to be
swapped in at the right moment, just before running a Haskell thread.  I
don't think this will work in general, because part of the thread-local
state is the thread ID of the OS thread itself, which can't be swapped
in.

Also, Sven pointed out that swapping in the context in the GLUT case can
have other drastic performance implications.
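The thread-ID objection can be illustrated with a small C sketch. It assumes a hypothetical library that records the owning OS thread of each context and checks it on every call using pthread_self() and pthread_equal(). The names lib_context, lib_bind, swap_in and lib_call_allowed are made up for illustration. Option 1 would swap the saved thread-local pointer into a different OS thread. That restores the pointer but not the thread's identity, so the library still refuses the call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical library context that remembers its owning OS thread. */
typedef struct {
    pthread_t owner;
    int data; /* stands in for the rest of the per-context state */
} lib_context;

/* The library's "current context" pointer is thread-local. */
static _Thread_local lib_context *current = NULL;

/* Must be called on the thread that will own the context. */
void lib_bind(lib_context *ctx) {
    ctx->owner = pthread_self();
    current = ctx;
}

/* Option 1: restore the saved thread-local value on whatever OS thread
   is about to run the Haskell thread. */
void swap_in(lib_context *saved) { current = saved; }

/* The library's own check: right context AND right OS thread. */
bool lib_call_allowed(void) {
    return current != NULL && pthread_equal(current->owner, pthread_self());
}

typedef struct {
    lib_context *saved;
    bool allowed;
} probe;

static void *run_elsewhere(void *arg) {
    probe *p = arg;
    swap_in(p->saved);               /* the pointer can be swapped in... */
    p->allowed = lib_call_allowed(); /* ...but pthread_self() cannot     */
    return NULL;
}

/* Swaps the saved state into a different OS thread and asks the library. */
bool allowed_after_swap_on_other_thread(lib_context *saved) {
    pthread_t t;
    probe p = { saved, false };
    int rc = pthread_create(&t, NULL, run_elsewhere, &p);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return p.allowed;
}
```

Binding a context on the main thread makes lib_call_allowed() true there. However, allowed_after_swap_on_other_thread() returns false even though the very same pointer was swapped in.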

2. Every Haskell thread has its own OS thread
=============================================

Some other folk proposed moving to a 1-1 correspondence between Haskell
threads and OS threads.  I think this is a poor solution simply because
of the overhead - Haskell threads are very lightweight (1000s of threads
is entirely reasonable), but OS threads tend to be much heavier.  For
example, I'm sure this would kill the performance of the Haskell web
server.

3. Some Haskell threads have their own OS thread
================================================

Another solution is to fix a 1-1 correspondence between Haskell threads
and OS threads for some Haskell threads only, perhaps selected by a
different version of forkIO.  We think this is implementable and has
zero overhead if you don't use it, but it does require that the user of
the external binding remembers to use the right flavour of forkIO.

Callbacks have to create a new Haskell thread which is bound to the
current OS thread.

Alastair points out that it might be significant which Haskell thread
runs a particular finalizer.
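A rough C model of option 3 follows. It assumes that all the C library cares about is which OS thread its functions are called on. Each bound task owns one dedicated OS thread, and every foreign call made on its behalf is handed to that thread and waited for. Tasks that are not bound never touch this machinery, which fits the "zero overhead if you don't use it" claim. The names bound_task, bound_start, bound_call and bound_stop are hypothetical. This is not meant to describe how GHC's RTS would actually implement the idea.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical thread-local library state (0 means "no context"). */
static _Thread_local int current_ctx = 0;
static void set_ctx(void *arg) { current_ctx = *(int *)arg; }
static void get_ctx(void *arg) { *(int *)arg = current_ctx; }

/* A bound task: one dedicated OS thread that performs every foreign call
   issued on behalf of a single (Haskell) thread. */
typedef struct {
    pthread_t os_thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    void (*fn)(void *); /* pending call, or NULL if none */
    void *arg;
    bool done;          /* set when the pending call has finished */
    bool quit;
} bound_task;

static void *bound_loop(void *arg) {
    bound_task *b = arg;
    pthread_mutex_lock(&b->lock);
    for (;;) {
        while (b->fn == NULL && !b->quit)
            pthread_cond_wait(&b->cond, &b->lock);
        if (b->fn == NULL)
            break; /* asked to quit and nothing is pending */
        void (*fn)(void *) = b->fn;
        void *fn_arg = b->arg;
        pthread_mutex_unlock(&b->lock);
        fn(fn_arg); /* always runs on this task's own OS thread */
        pthread_mutex_lock(&b->lock);
        b->fn = NULL;
        b->done = true;
        pthread_cond_broadcast(&b->cond);
    }
    pthread_mutex_unlock(&b->lock);
    return NULL;
}

void bound_start(bound_task *b) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->fn = NULL;
    b->arg = NULL;
    b->done = false;
    b->quit = false;
    int rc = pthread_create(&b->os_thread, NULL, bound_loop, b);
    assert(rc == 0);
    (void)rc;
}

/* Hands fn(arg) to the task's OS thread and waits for it to finish.
   Assumes one caller at a time, i.e. one bound Haskell thread. */
void bound_call(bound_task *b, void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&b->lock);
    b->fn = fn;
    b->arg = arg;
    b->done = false;
    pthread_cond_broadcast(&b->cond);
    while (!b->done)
        pthread_cond_wait(&b->cond, &b->lock);
    pthread_mutex_unlock(&b->lock);
}

void bound_stop(bound_task *b) {
    pthread_mutex_lock(&b->lock);
    b->quit = true;
    pthread_cond_broadcast(&b->cond);
    pthread_mutex_unlock(&b->lock);
    pthread_join(b->os_thread, NULL);
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->lock);
}
```

Consider two successive bound_call invocations, one setting the context and one reading it. They see the same value because both run on the task's OS thread, while the calling thread's own copy of current_ctx stays 0.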

4. Thread groups
================

Claus's suggestion is similar, but gives the Haskell programmer more
control over the mapping between OS threads and Haskell threads.  I must
admit I'd been wondering about something similar myself.  He suggests
that every Haskell thread is bound to a specific OS thread, but that
more than one Haskell thread can map to the same OS thread (a thread
group).

This is slightly less convenient for the Haskell programmer - one has to
be careful to fork a new thread group before making a foreign call that
might block, or every thread in the same group is blocked along with it.
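Thread groups can be modelled in C as one OS thread draining a FIFO queue of calls submitted by several member tasks. The names thread_group, group_start, group_submit and group_stop are hypothetical, and the queue size of 16 is arbitrary. In this sketch all members see the same thread-local state. However, a call that blocks inside the group's OS thread also holds up every call queued behind it, which is the reason for forking a separate group before a blocking foreign call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define GROUP_QUEUE_CAP 16 /* arbitrary bound chosen for the sketch */

typedef struct {
    void (*fn)(void *);
    void *arg;
} group_job;

/* A thread group: one OS thread runs the foreign calls of all members,
   in the order they were submitted. */
typedef struct {
    pthread_t os_thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    group_job queue[GROUP_QUEUE_CAP];
    int head, count;
    bool closing;
} thread_group;

static void *group_loop(void *arg) {
    thread_group *g = arg;
    pthread_mutex_lock(&g->lock);
    for (;;) {
        while (g->count == 0 && !g->closing)
            pthread_cond_wait(&g->cond, &g->lock);
        if (g->count == 0)
            break; /* closing and fully drained */
        group_job j = g->queue[g->head];
        g->head = (g->head + 1) % GROUP_QUEUE_CAP;
        g->count--;
        pthread_cond_broadcast(&g->cond); /* room for blocked submitters */
        pthread_mutex_unlock(&g->lock);
        j.fn(j.arg); /* if this blocks, every other member waits too */
        pthread_mutex_lock(&g->lock);
    }
    pthread_mutex_unlock(&g->lock);
    return NULL;
}

void group_start(thread_group *g) {
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->head = 0;
    g->count = 0;
    g->closing = false;
    int rc = pthread_create(&g->os_thread, NULL, group_loop, g);
    assert(rc == 0);
    (void)rc;
}

/* Any member may submit; the call runs later on the group's OS thread. */
void group_submit(thread_group *g, void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&g->lock);
    while (g->count == GROUP_QUEUE_CAP)
        pthread_cond_wait(&g->cond, &g->lock);
    g->queue[(g->head + g->count) % GROUP_QUEUE_CAP] = (group_job){ fn, arg };
    g->count++;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Runs any remaining jobs, then shuts the group's OS thread down. */
void group_stop(thread_group *g) {
    pthread_mutex_lock(&g->lock);
    g->closing = true;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
    pthread_join(g->os_thread, NULL);
    pthread_cond_destroy(&g->cond);
    pthread_mutex_destroy(&g->lock);
}

/* Example foreign state, shared by all members because it is per OS thread. */
static _Thread_local int calls_on_this_thread = 0;
static void count_call(void *out) { *(int *)out = ++calls_on_this_thread; }
```

Submitting count_call three times and then calling group_stop yields 1, 2 and 3, since all three calls ran on the group's single OS thread.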

------------------

We can afford to discuss this a while longer, because Simon & I are
currently focussed on the next release (I don't want to hold up 5.04 for
a fix, and it wouldn't be a disaster if we had to go straight to 5.06 in a
couple of months or so).

Personally I can't decide whether (3) or (4) is the better solution.
I'm pretty sure (1) and (2) aren't viable, though.

Cheers,
	Simon