FFI: number of worker threads?

Simon Marlow simonmarhaskell at gmail.com
Thu Jun 22 06:07:09 EDT 2006


Li, Peng wrote:
> On 6/21/06, Simon Peyton-Jones <simonpj at microsoft.com> wrote:
> 
>> New worker threads are spawned as needed.  You'll need as many of
>> them as you have simultaneously-blocked foreign calls. If you have 2000
>> simultaneously-blocked foreign calls, you'll need 2000 OS threads to
>> support them, which probably won't work.
> 
> 
> 2000 OS threads definitely sounds scary, but it can work.  Linux NPTL
> threads scale well up to 10K threads, and the address space for their
> stacks is sufficient on 64-bit systems.
> 
> I am thinking about p2p applications where each peer maintains a huge
> number of TCP connections to other peers, but most of these connections
> are idle.  Unfortunately, the default GHC RTS multiplexes I/O using
> "select", which is O(n) and appears to have an FD_SETSIZE limit of 1024.
> 
> That makes me wonder whether the current design of the GHC RTS is
> optimal in the long run.  As software and hardware evolve, we will have
> efficient OS threads (like NPTL) and huge (64-bit) address spaces.
> My guesses are:
> 
> (1) It is always a good idea to multiplex GHC user-level threads on OS
> threads, because it improves performance.
> (2) It may not be optimal to multiplex nonblocking I/O inside the GHC
> RTS, because it is unrealistic to have an event-driven I/O interface
> that is both efficient (like AIO/epoll) and portable (like
> select/poll).  What is worse, nonblocking I/O still blocks on disk
> accesses.  On the other hand, POSIX threads are portable and can be
> implemented efficiently on many systems.  At least on Linux, NPTL
> easily beats "select"!
> 
> My wish is for a future GHC implementation that (a) uses the blocking
> I/O provided directly by the OS, and (b) gives more control over OS
> threads and the internal worker thread pool.  Using blocking I/O would
> simplify the current design and let the programmer take advantage of
> high-performance OS threads.  If non-blocking I/O is really needed, the
> programmer can use customized, Claessen-style threads wrapped in
> modular libraries---some of my preliminary tests show that
> Claessen-style threads do a much better job of multiplexing
> asynchronous I/O.

I've read your paper, and I expect many others here have read it, too. 
The results are definitely impressive.

Ultimately what we want to do is to use a more flexible I/O library (e.g. 
streams) that lets you choose the low-level I/O mechanism for each 
individual stream while maintaining the same high-level interface.  If 
you want to use blocking I/O and OS threads to do the multiplexing, then 
you could do that.  Similarly, if you want to use epoll underneath, then 
we should provide a way to do that.  I imagine that most people will 
want epoll (or equivalent) by default, because that will give the best 
performance, but we'll have the portable fallback of OS threads if the 
system doesn't have an epoll equivalent, or we haven't implemented it.
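
To make that a bit more concrete, here is a very rough sketch of the kind 
of interface I have in mind.  None of these names exist in GHC or any 
library today; they are purely illustrative:

-- All names below are made up for illustration only.
import qualified Data.ByteString as BS
import Network.Socket (Socket)

-- One constructor per low-level multiplexing mechanism the library supports.
data IOBackend
  = BlockingOSThreads   -- blocking read/write, one OS thread per blocked call
  | Epoll               -- readiness notification via epoll (or kqueue, etc.)
  | Select              -- portable fallback, O(n) and limited by FD_SETSIZE

-- The high-level interface is the same regardless of the backend chosen.
data Stream = Stream
  { streamRead  :: Int -> IO BS.ByteString
  , streamWrite :: BS.ByteString -> IO ()
  , streamClose :: IO ()
  }

-- Hypothetical constructor: wrap a connected socket, choosing the
-- low-level mechanism once, at stream-creation time.
socketStream :: IOBackend -> Socket -> IO Stream
socketStream backend sock = undefined   -- to be provided by the I/O library

-- Intended usage:
--   s   <- socketStream Epoll sock
--   req <- streamRead s 4096

The point is that the choice of mechanism is per stream, and the rest of 
the program never sees the difference.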

Am I right in thinking this will address your concerns?  Mainly you are 
worried that you have no choice but to use the supplied select() 
implementation on Unix systems, right?

There's one further advantage to using epoll as opposed to blocking 
read/write: the extra level of indirection means that the runtime can 
easily interrupt a thread that is blocked in I/O, because typically it 
will in fact be blocked on an MVar waiting to be unblocked by the thread 
performing epoll.  It is much harder to interrupt a thread blocked in an 
OS call.  (Currently on Windows, where we use blocking read/write, 
throwTo doesn't work when the target thread is blocked on I/O, whereas 
it does work on Unix systems, where we multiplex I/O using select().)  
This means that you can implement blocking I/O with a timeout in 
Haskell, for example.
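
For example, something along these lines ought to be possible.  
hGetTimeout and ReadTimeout are made-up names, just to illustrate the 
idea; a real library would want to handle the small races (e.g. the 
watchdog firing just as the read completes) more carefully:

import Control.Concurrent (forkIO, killThread, myThreadId, threadDelay, throwTo)
import Control.Exception (Exception, handle)
import qualified Data.ByteString as BS
import System.IO (Handle)

data ReadTimeout = ReadTimeout deriving Show
instance Exception ReadTimeout

-- Read up to n bytes from h, giving up after t microseconds.  The watchdog
-- thread interrupts the blocked read with throwTo, which relies on the
-- runtime being able to interrupt a thread blocked in I/O -- exactly the
-- property we get when I/O is multiplexed through select/epoll, because
-- the blocked thread is really waiting on an MVar.
hGetTimeout :: Int -> Handle -> Int -> IO (Maybe BS.ByteString)
hGetTimeout t h n = do
  me       <- myThreadId
  watchdog <- forkIO (threadDelay t >> throwTo me ReadTimeout)
  handle (\ReadTimeout -> return Nothing) $ do
    bs <- BS.hGet h n          -- may block here; interruptible via throwTo
    killThread watchdog
    return (Just bs)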

Cheers,
	Simon

