[Haskell-cafe] Re: Bound threads

Thu Mar 3 06:12:12 EST 2005

Wolfgang Thaller <wolfgang.thaller at gmx.net> writes:

>> Indeed, my brain is melting, but I did it :-)
>
> Congratulations. How about we found a "Bound-thread-induced brain
> melt victims' support group"?

The melt was entertaining :-)

> Besides simplicity, one of the main reasons for moving our select()
> call from the run-time system to the libraries was to avoid the
> performance hit of having to call select() every time through the
> scheduler loop rather than only once per IO operation.

I use epoll when available. It's Linux-specific and allows to register
and unregister descriptors separately from waiting. This not only
saves process time to set up the array, but also kernel time scanning
the array and hooking to files.

I've heard BSD kqueue mechanism has similar properties.

I unregister descriptors from epoll "lazily": when epoll returned
that data is available but no thread was in fact waiting for it. This
saves repeated registration when a thread alternates between I/O and
computation.

When the scheduler determines that it has no thread to wake up
immediately, it performs a GC before going to wait if the program
did roughly at least half of work until the next normal GC.

> Imagine having one or more (unbound) threads that spend most of their
> time waiting for IO, and a bunch of (also unbound) threads that do
> some computation. If select() is part of the scheduler loop, you will
> get a select() call whenever a thread-switch between the computation
> threads happens.

Actually once the next thread in the "running and I/O" queue is an I/O
thread, not in every scheduler iteration. Or more precisely a consecutive
span of I/O threads in this queue.

epoll_wait takes 0.2 us here, poll takes 1 us, select takes 0.6 us
(1 descriptor in each case). I wonder why poll is slower than select.

I was thinking about integration with gtk/glib event loop. There are
two ways: either we ask glib to poll using a function supplied by us,
or we perform polling using glib functions instead of raw epoll / poll
/ select. The first choice seems better because otherwise callbacks
registered at glib were started from the scheduler and this will not
work, it's even not clear on behalf of which thread they should run.
In this case we must provide a function with an interface of poll().

Without additional support in the runtime (other than making file
objects which don't close their underlying file, but this is easy),
the function can be implemented by starting a thread for each
descriptor, collecting the results, and cancelling threads when some
descriptor is ready or when the timeout expires. Let's assume that
real poll is used by our scheduler and that no other thread does I/O
at the moment, and see what really happens:
- the threads are created at the end of the run queue
- other threads in the program execute their time slices
- each of the newly created threads is marked as waiting for I/O
- other threads in the program execute again (ugh)
- the scheduler looks at the first I/O thread and makes poll()
- all threads whose I/O is ready are woken up
- the next running thread is chosen (perhaps one of threads woken up
  in the previous step)
- it notifies the manager thread that glib-poll-emulation is ready
- when execution reaches the manager, it kills other threads and
  reports the result to glib

It seems that other than a bunch of context switches there is not much
work besides the required minimum. (It gets worse wich epoll, which is
suitable for a mostly unchanging set of watched descriptors.)

With GHC implementation I think each thread which adds a descriptor
will wake up the service thread through a pipe, and later they will
wake it up again to unregister files when they become cancelled.

>> All threads except the thread performing the fork become unbound.
>> [...]
>
> What happens when fork is called from an unbound thread? Does it
> become bound in the child process?

No. But in ForkProcess and ForkProcessKillThreads this thread plays
the role of the main thread: it receives Unix signals; it receives
internal asynchronous signals like heap overflow and deadlock; when it
terminates, the process terminates; if it terminates with an unhandled
exception, a handler which normally prints the stack trace is called.
AtExit handlers are not run though.

I don't know what should be done, this got quite hairy. Actually the
above termination semantics currently applies only when fork is called
with an I/O action as an argument. It can also be called without,
like C fork(), and in this case the behavior is ugly: if the new main
thread in the child process was not the main thread before, its
termination is not special and there will be a deadlock. This should
be changed somehow.

In any case, when the previous main thread terminates (as it's
cancelled cleanly by ForkProcess), it checks whether it's still the
main thread. If not, it disappears like normal threads, except that
its OS thread will wait on a condition variable forever.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/