[Haskell-cafe] Re: standard poll/select interface

Thu Feb 23 19:15:05 EST 2006

Simon Marlow <simonmarhaskell at gmail.com> writes:

> I think the reason we set O_NONBLOCK is so that we don't have to test
> with select() before reading, we can just call read().  If you don't
> use O_NONBLOCK, you need two system calls to read/write instead of
> one. This probably isn't a big deal, given that we're buffering anyway.

I've heard that on Linux, select/poll/epoll on a socket might report
that data is available when in fact it is not (the notification may
be triggered by socket activity which doesn't result in new data).
Select/poll/epoll are designed to work primarily with non-blocking
I/O.
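
A minimal sketch of that pairing, in Python only for brevity (the
helper name read_ready is made up for illustration, and the descriptor
is assumed to be in non-blocking mode already): the read after
select() is still treated as something that may fail with EAGAIN.

```python
import os
import select

def read_ready(fd, nbytes=4096, timeout=0.0):
    """Return the bytes read, b"" on EOF, or None if nothing was available.

    Assumes fd is already in non-blocking mode.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    try:
        return os.read(fd, nbytes)
    except BlockingIOError:
        # select() reported readiness, but the read still got EAGAIN.
        return None
```

With a blocking descriptor the same spurious readiness report would
leave the caller stuck inside read().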

In the implementation of my language, pthreads are optionally used in
a way very similar to the one in your paper "Extending the Haskell
Foreign Function Interface with Concurrency". This means that I have
a choice of using blocking or non-blocking I/O for a given descriptor.
Both work similarly, but blocking I/O takes up an OS thread. Each file
has a blocking flag kept in its data.

Non-blocking I/O is done in the same thread. The timer signal is
kept active, so if another process has switched the file to blocking,
the call will be interrupted by the timer signal and won't block the
whole process. The thread performing the I/O will only waste its
timeslices.

Blocking I/O temporarily releases access to the runtime, setting up
a worker OS thread for other threads if needed etc. As an
optimization, if there are no other threads to be run by the scheduler
(none running, none waiting for I/O, none waiting for a timeout,
and we are the thread which handles system signals), then the runtime
is not physically released (no worker OS threads, no unlinking of the
thread structure); only the signal mask is changed, so the visible
semantics are maintained. This is common to other such potentially
blocking system calls. I don't know if GHC does something similar.

(I recently made this work even if a thread that my runtime has not
seen before wants to access the runtime. If the optimization of not
physically releasing the runtime is in effect, the new thread performs
the actions on behalf of the previous thread.)

In either case EAGAIN causes the thread to block, asking the scheduler
to wake it up when I/O is ready. This means that even if some other
process has switched the file to non-blocking, the process will only
do unnecessary context switches.
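
A rough sketch of that rule, in Python only for brevity. The names
read_parking_on_eagain and wait_readable are made up for illustration,
and a plain select() call stands in for asking a real scheduler to
park the thread:

```python
import os
import select

def wait_readable(fd):
    """Stand-in for the scheduler: sleep until fd is reported readable."""
    select.select([fd], [], [])

def read_parking_on_eagain(fd, nbytes=4096, park=wait_readable):
    """Read from fd, parking the caller whenever the OS reports EAGAIN.

    Works the same whether fd is currently blocking or non-blocking:
    a blocking descriptor simply never takes the EAGAIN branch.
    """
    while True:
        try:
            return os.read(fd, nbytes)
        except BlockingIOError:
            # EAGAIN: hand control away until the descriptor is ready.
            park(fd)
```

The caller never consults its own record of the blocking flag, so a
stale record costs at most some extra wakeups.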

It's important to make this work even when the blocking flag is out
of sync. The Unix blocking flag is not even associated with the
descriptor but with the open file (the "open file description" in
POSIX terms), i.e. it's shared with descriptors created by dup(), so
it might be hard to predict without asking the OS.
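
The sharing is easy to observe. A small illustration in Python (chosen
only for brevity; the variable names are arbitrary), using a pipe and
a duplicate of its read end:

```python
import os

r, w = os.pipe()
r2 = os.dup(r)                 # second descriptor, same open file

before = os.get_blocking(r2)   # pipes start out blocking
os.set_blocking(r, False)      # set O_NONBLOCK through the first descriptor
after = os.get_blocking(r2)    # queried through the duplicate

print(before, after)
```

The flag was changed through r only, yet the query through r2 sees the
change, which is why a copy of the flag kept in the runtime can go
stale.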

If pthreads are available, stdin, stdout and stderr are kept blocking,
because they are often shared with other processes, and keeping them
blocking works well in that case. Without pthreads they are
non-blocking, because I felt it was more important not to waste
timeslices of the thread performing I/O than to be nice to other
processes. In both cases pipes and sockets are non-blocking, while
named files are blocking. The programmer can change the blocking state
explicitly, but this is probably useful only when setting up
redirections before exec*().

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/