How is non-blocking IO working ?

Alastair Reid
Thu, 03 Apr 2003 18:05:22 +0100

Ahn Ki-yung writes:

> Document says GHC uses non-blocking IO to be thread friendly. 
> [...]
> How is non-blocking and thread working ?  I can't understand.

This is usually implemented using 'select' (see attached man page).

In a multithreaded program where you have user-level threads (which
is, effectively, what GHC's threads provide), you typically maintain a
global set of all file descriptors that a thread is trying to read
from.  There are then two cases:

1) If there are no runnable threads, call 'select' on all of those
   file descriptors using a NULL timeout, so that the call blocks
   until at least one of them is ready.

2) If there are runnable threads, you can use 'select' with a timeout
   of 0 to poll for readable file descriptors every N seconds.  The
   number N should be chosen low enough to have low latency and high
   enough to have low overhead.
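
The two cases above can be sketched roughly as follows.  This is only
an illustration of the technique, not GHC's actual scheduler code; the
names check_readers, waiting, ready and have_runnable are made up for
the example.

```c
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper for a user-level thread scheduler.
 * `waiting` holds every descriptor some thread is blocked reading from;
 * on return, `ready` holds the subset that can be read without blocking.
 * `maxfd` is the highest descriptor in `waiting`.
 * Returns the number of ready descriptors, or -1 on error.
 */
int check_readers(int maxfd, const fd_set *waiting, int have_runnable,
                  fd_set *ready)
{
    struct timeval zero = { 0, 0 };

    /* select overwrites the set it is given, so work on a copy. */
    *ready = *waiting;

    /* Case 1: nothing can run, so block (NULL timeout) until a
     *         descriptor becomes readable.
     * Case 2: other threads can run, so only poll (zero timeout). */
    return select(maxfd + 1, ready, NULL, NULL,
                  have_runnable ? &zero : NULL);
}
```

The scheduler would call this with have_runnable set every N seconds
while threads are running, and with it clear when every thread is
blocked; threads whose descriptor is set in `ready` become runnable.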

An alternative that works on some operating systems is to use
asynchronous I/O calls.  The problem is that this isn't as portable as
using select.

Hope this helps.

Alastair Reid         
Reid Consulting (UK) Limited

SELECT(2)                  Linux Programmer's Manual                 SELECT(2)

       select,  pselect,  FD_CLR,  FD_ISSET, FD_SET, FD_ZERO - synchronous I/O
       multiplexing

       /* According to POSIX 1003.1-2001 */
       #include <sys/select.h>

       /* According to earlier standards */
       #include <sys/time.h>
       #include <sys/types.h>
       #include <unistd.h>

       int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
       struct timeval *timeout);

       int   pselect(int   n,   fd_set   *readfds,  fd_set  *writefds,  fd_set
       *exceptfds, const struct timespec *timeout, const sigset_t *sigmask);

       FD_CLR(int fd, fd_set *set);
       FD_ISSET(int fd, fd_set *set);
       FD_SET(int fd, fd_set *set);
       FD_ZERO(fd_set *set);

       The functions select and pselect wait for a number of file  descriptors
       to change status.

       Their function is identical, with three differences:

       (i)    The  select  function  uses  a  timeout that is a struct timeval
              (with seconds and microseconds), while  pselect  uses  a  struct
              timespec (with seconds and nanoseconds).

       (ii)   The select function may update the timeout parameter to indicate
              how much time was left. The pselect  function  does  not  change
              this parameter.

       (iii)  The  select  function  has  no sigmask parameter, and behaves as
              pselect called with NULL sigmask.

       Three independent sets of descriptors are  watched.   Those  listed  in
       readfds will be watched to see if characters become available for read-
       ing (more precisely, to see if a read will not block - in particular, a
       file  descriptor  is also ready on end-of-file), those in writefds will
       be watched to see if a write will not block,  and  those  in  exceptfds
       will  be  watched  for  exceptions.   On exit, the sets are modified in
       place to indicate which descriptors actually changed status.

       Four macros are provided to manipulate the sets.  FD_ZERO will clear  a
       set.   FD_SET  and  FD_CLR add or remove a given descriptor from a set.
       FD_ISSET tests to see if a descriptor is part of the set; this is  use-
       ful after select returns.

       n  is the highest-numbered descriptor in any of the three sets, plus 1.
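
       As a small illustration of the macros and of how n is computed,
       here is a sketch; build_fd_set is a hypothetical helper name, not
       part of any library.

```c
#include <sys/select.h>

/*
 * Hypothetical helper: put the `count` descriptors listed in `fds` into
 * `set` and return the value to pass as select's first argument
 * (the highest descriptor plus 1, or 0 for an empty list).
 * Every descriptor must be in the range 0 .. FD_SETSIZE-1.
 */
int build_fd_set(const int *fds, int count, fd_set *set)
{
    int maxfd = -1;
    int i;

    FD_ZERO(set);                 /* start from an empty set */
    for (i = 0; i < count; i++) {
        FD_SET(fds[i], set);      /* add this descriptor */
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

       After select returns, FD_ISSET on the same set tells which of the
       descriptors are ready.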

       timeout is an upper bound on the amount of time elapsed  before  select
       returns. It may be zero, causing select to return immediately. (This is
       useful for polling.) If timeout is NULL (no timeout), select can  block
       indefinitely.

       sigmask  is  a  pointer to a signal mask (see sigprocmask(2)); if it is
       not NULL, then pselect first replaces the current signal  mask  by  the
       one  pointed  to  by sigmask, then does the `select' function, and then
       restores the original signal mask again.

       The idea of pselect is that if one wants to wait for an event, either a
       signal  or  something on a file descriptor, an atomic test is needed to
       prevent race conditions. (Suppose the signal handler sets a global flag
       and  returns.  Then  a  test  of this global flag followed by a call of
       select() could hang indefinitely if the signal arrived just  after  the
       test but just before the call. On the other hand, pselect allows one to
       first block signals, handle the signals that have come  in,  then  call
       pselect()  with  the  desired sigmask, avoiding the race.)  Since Linux
       today does not have a pselect() system call, the current glibc2 routine
       still contains this race.
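
       The block-test-wait pattern described above might look like the
       following sketch.  It assumes a platform where pselect really is
       atomic (which, per the paragraph above, was not yet the case on
       Linux); the choice of SIGUSR1 and the names got_signal, on_sigusr1
       and wait_fd_or_signal are assumptions made for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose pselect and sigaction */
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

/* The global flag set by the signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Hypothetical helper: wait until `fd` is readable or SIGUSR1 arrives.
 * Returns 1 if `fd` is readable, 0 if the signal arrived, -1 on error.
 */
int wait_fd_or_signal(int fd)
{
    struct sigaction sa;
    sigset_t block, orig, wait_mask;
    fd_set rfds;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    /* Block SIGUSR1 so it cannot arrive between the test and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
        return -1;

    if (got_signal) {
        got_signal = 0;           /* the signal came in before we waited */
        rc = 0;
    } else {
        /* While waiting, use the caller's mask with SIGUSR1 unblocked. */
        wait_mask = orig;
        sigdelset(&wait_mask, SIGUSR1);
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        rc = pselect(fd + 1, &rfds, NULL, NULL, NULL, &wait_mask);
        if (rc == -1 && errno == EINTR && got_signal) {
            got_signal = 0;       /* woken by the signal, not by the fd */
            rc = 0;
        }
    }

    sigprocmask(SIG_SETMASK, &orig, NULL);  /* restore the caller's mask */
    return rc;
}
```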

   The timeout
       The time structures involved are defined in <sys/time.h> and look like

              struct timeval {
                  long    tv_sec;         /* seconds */
                  long    tv_usec;        /* microseconds */
              };

       and

              struct timespec {
                  long    tv_sec;         /* seconds */
                  long    tv_nsec;        /* nanoseconds */
              };

       Some  code  calls  select with all three sets empty, n zero, and a non-
       null timeout as a fairly portable way to sleep with subsecond precision.
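
       A minimal sketch of that idiom follows; sleep_ms is a hypothetical
       helper name chosen for the example.

```c
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper: sleep for roughly `ms` milliseconds by calling
 * select with no descriptors at all.
 * Returns 0 when the timeout expires, -1 on error (for example EINTR).
 */
int sleep_ms(long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;   /* milliseconds to microseconds */
    return select(0, NULL, NULL, NULL, &tv);
}
```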

       On Linux, the function select modifies timeout to reflect the amount of
       time not slept; most other implementations do not do this.  This causes
       problems  both  when  Linux code which reads timeout is ported to other
       operating systems, and when code is  ported  to  Linux  that  reuses  a
       struct  timeval  for  multiple selects in a loop without reinitializing
       it.  Consider timeout to be undefined after select returns.
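
       In practice this means rebuilding the timeout (and the descriptor
       sets) before every call.  A sketch, with wait_readable as a
       hypothetical helper name:

```c
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper: check `fd` up to `attempts` times, waiting at
 * most `ms` milliseconds on each attempt.
 * Returns 1 once `fd` is readable, 0 if it never was, -1 on error.
 */
int wait_readable(int fd, int attempts, long ms)
{
    int i;

    for (i = 0; i < attempts; i++) {
        fd_set rfds;
        struct timeval tv;
        int rc;

        /* Reinitialize both on every iteration: select may have
         * modified the set and, on Linux, the timeout as well. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = ms / 1000;
        tv.tv_usec = (ms % 1000) * 1000;

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc != 0)
            return rc < 0 ? -1 : 1;
    }
    return 0;
}
```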

       On success, select and pselect return the number  of  descriptors  con-
       tained in the descriptor sets, which may be zero if the timeout expires
       before anything interesting happens.  On error,  -1  is  returned,  and
       errno  is  set appropriately; the sets and timeout become undefined, so
       do not rely on their contents after an error.

       EBADF  An invalid file descriptor was given in one of the sets.

       EINTR  A non blocked signal was caught.

       EINVAL n is negative.

       ENOMEM select was unable to allocate memory for internal tables.
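
       One common way to deal with EINTR is simply to retry, as in the
       sketch below.  The wrapper name select_read_retry is hypothetical,
       and note that this simple version restarts the full timeout after
       each interruption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical wrapper: wait up to `timeout_sec` seconds for `fd` to
 * become readable, retrying when a signal interrupts the call.
 * Returns 1 if readable, 0 on timeout, -1 on any other error
 * (errno is left as select set it).
 */
int select_read_retry(int fd, long timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int rc;

        /* The sets and timeout are undefined after an error,
         * so rebuild them before every attempt. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc == -1 && errno == EINTR)
            continue;             /* interrupted by a signal: try again */
        return rc;
    }
}
```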

       #include <stdio.h>
       #include <sys/time.h>
       #include <sys/types.h>
       #include <unistd.h>

       int
       main(void) {
           fd_set rfds;
           struct timeval tv;
           int retval;

           /* Watch stdin (fd 0) to see when it has input. */
           FD_ZERO(&rfds);
           FD_SET(0, &rfds);
           /* Wait up to five seconds. */
           tv.tv_sec = 5;
           tv.tv_usec = 0;

           retval = select(1, &rfds, NULL, NULL, &tv);
           /* Don't rely on the value of tv now! */

           if (retval == -1)
               perror("select()");
           else if (retval)
               printf("Data is available now.\n");
               /* FD_ISSET(0, &rfds) will be true. */
           else
               printf("No data within five seconds.\n");

           return 0;
       }

       select conforms to 4.4BSD (the function first appeared in 4.2BSD).  Generally
       portable  to/from  non-BSD  systems supporting clones of the BSD socket
       layer (including System V variants).  However, note that the  System  V
       variant  typically  sets  the timeout variable before exit, but the BSD
       variant does not.

       The pselect function is defined in IEEE  Std  1003.1g-2000  (POSIX.1g),
       and  part  of  POSIX  1003.1-2001.   It is found in glibc2.1 and later.
       Glibc2.0 has a function with this name, that however does  not  take  a
       sigmask parameter.

       Concerning  prototypes,  the  classical  situation  is  that one should
       include <time.h> for select.  The POSIX 1003.1-2001 situation  is  that
       one  should  include  <sys/select.h> for select and pselect.  Libc4 and
       libc5 do not have a <sys/select.h> header; under glibc  2.0  and  later
       this header exists.  Under glibc 2.0 it unconditionally gives the wrong
       prototype for pselect, under glibc  2.1-2.2.1  it  gives  pselect  when
       _GNU_SOURCE  is  defined,  under  glibc  2.2.2-2.2.4  it  gives it when
       _XOPEN_SOURCE is defined and has a value of 600 or larger.   No  doubt,
       since POSIX 1003.1-2001, it should give the prototype by default.

       For a tutorial with discussion and examples, see select_tut(2).

       For vaguely related stuff, see accept(2), connect(2), poll(2), read(2),
       recv(2), send(2), sigprocmask(2), write(2)

Linux 2.4                         2001-02-09                         SELECT(2)