Safe FFI and Blocking IO

Tue Dec 4 19:07:22 UTC 2018

Hello Andrew,

I have done some work on interruptibility and accidental blocking in GHC and Unix over the last 2 years with FP Complete for our clients.

Summarising from what was already written/linked, the key things to understand are:

* `safe` calls run in a separate OS thread in `-threaded`, so a call that blocks does not stall other Haskell threads.
* The separate threads spawned by `safe` calls do not count towards the +RTS -N limit.
* `unsafe` calls block the entire capability, always (e.g. 1 out of 4 +RTS -N4 threads).
* There is only one way to interrupt running system calls on Unix: sending a signal to the thread that does them. The syscalls then return an error with `errno` set to `EINTR`. Many (but not all) syscalls can be interrupted that way.
* `interruptible` is thus implemented by sending a signal to the thread that does the syscall.
* That happens in particular when you send an exception via `throwTo` to a Haskell thread that's blocked in a foreign call (for example, `timeout` uses `throwTo`).
* You can only use `interruptible` on FFI code that is deliberately written to return to Haskell when EINTR is encountered, so that Haskell can then raise the exception. If the code doesn't do that, but instead just retries the syscall in C, then there's no point in using `interruptible`, as it won't have any effect.
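
To make the three kinds of import concrete, here is a minimal sketch. The Haskell-side names (`c_sin`, `c_sleepSafe`, `c_sleepInterruptible`) are made up for illustration; the C functions are the standard `sin` and `sleep`. It assumes a Unix system and GHC with the `InterruptibleFFI` extension.

```haskell
{-# LANGUAGE ForeignFunctionInterface, InterruptibleFFI #-}
module Main (main) where

import Foreign.C.Types (CDouble (..), CUInt (..))
import System.Timeout (timeout)

-- CPU-bound and never blocks, so `unsafe` is acceptable here.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> IO CDouble

-- Can block. With `safe`, other Haskell threads keep running in -threaded,
-- but an async exception is only delivered once the call has returned.
foreign import ccall safe "unistd.h sleep"
  c_sleepSafe :: CUInt -> IO CUInt

-- Can block, and sleep() returns early when a signal arrives, so the
-- exception thrown by `timeout` (via throwTo) can cut the call short.
foreign import ccall interruptible "unistd.h sleep"
  c_sleepInterruptible :: CUInt -> IO CUInt

main :: IO ()
main = do
  y <- c_sin 0
  print y
  -- Ask for a 10 second sleep, but allow it only 1 second.
  r <- timeout 1000000 (c_sleepInterruptible 10)
  print r
```

The `timeout` here is only expected to cut the sleep short when the program is built with `-threaded`. Swapping in `c_sleepSafe` should make the difference visible, since the exception then has to wait for `sleep` to finish.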

Important for non-threaded is:

* In the non-threaded runtime, behaviour varies a lot across platforms.
* On Linux it really has only a single thread. Some things happen to be more interruptible there because the timer signal regularly wakes up all kinds of syscalls, so most things behave the way `interruptible` is implemented.
* On e.g. OSX, non-threaded actually uses threads, namely 2: One for the timer signal, and one for the Haskell stuff.
* These differences make it very difficult to expect similar behaviour from the non-threaded runtime across platforms.
* I have an open proposal + half-done implementation to make non-threaded on Linux work like it does on OSX to unify these things. https://phabricator.haskell.org/D42#128355

The key rules are:

* Do not use `unsafe` on anything that can block on non-CPU-bound tasks, ever. It will massively limit the ability to use multiple cores.
* Use `unsafe` only for CPU-bound activities.
* For all other things, `interruptible` is the best of all, but as mentioned above, the called code must be designed to return EINTR all the way up to Haskell.
* Where this is not the case and you thus can't use `interruptible`, use `safe`.
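
As a sketch of what "return EINTR all the way up to Haskell" can look like: the C side is plain `read`, which fails with EINTR instead of retrying, and the retry loop lives in Haskell via `throwErrnoIfMinus1Retry` from `base`. Each time the foreign call returns, the runtime gets a chance to raise a pending async exception. The helper name `readInterruptible` and the pipe setup are only illustrative assumptions, not code from any existing package.

```haskell
{-# LANGUAGE ForeignFunctionInterface, InterruptibleFFI #-}
module Main (main) where

import Data.Word (Word8)
import Foreign.C.Error (throwErrnoIfMinus1Retry, throwErrnoIfMinus1_)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (allocaArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, poke)
import System.Posix.Types (CSsize (..))

-- Quick syscalls, but not CPU-bound work, so `safe` rather than `unsafe`.
foreign import ccall safe "unistd.h pipe"
  c_pipe :: Ptr CInt -> IO CInt

foreign import ccall safe "unistd.h write"
  c_write :: CInt -> Ptr Word8 -> CSize -> IO CSsize

-- read() can block indefinitely and reports EINTR to its caller.
foreign import ccall interruptible "unistd.h read"
  c_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize

-- Hypothetical helper: read up to n bytes. On EINTR the call returns to
-- Haskell and is retried from Haskell, never inside C.
readInterruptible :: CInt -> Ptr Word8 -> Int -> IO Int
readInterruptible fd buf n =
  fromIntegral <$>
    throwErrnoIfMinus1Retry "read" (c_read fd buf (fromIntegral n))

main :: IO ()
main =
  allocaArray 2 $ \fds -> do
    throwErrnoIfMinus1_ "pipe" (c_pipe fds)
    readEnd  <- peekElemOff fds 0
    writeEnd <- peekElemOff fds 1
    allocaBytes 16 $ \buf -> do
      poke buf (120 :: Word8)  -- one byte, so the read below cannot block
      _ <- throwErrnoIfMinus1Retry "write" (c_write writeEnd buf 1)
      n <- readInterruptible readEnd buf 16
      putStrLn ("read " ++ show n ++ " byte(s)")
```

The demo writes a byte first so that it terminates on any runtime. Wrapping `readInterruptible` on an empty pipe in `timeout` is the interesting case, and that is where `-threaded` plus `interruptible` should matter.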

The `unix` package unfortunately uses `unsafe` calls in many places where it really shouldn't, such as `stat()` (see the ticket you linked).
I think this is very bad and we must fix it.
For some of my own tools (like a parallel file-copy tool designed to work well on network file systems), I use a fork of the package where everything uses `safe`.

For many details on these topics, check out the tickets I filed / worked on:

* https://ghc.haskell.org/trac/ghc/ticket/8684 - hWaitForInput cannot be interrupted by async exceptions on unix 
* https://ghc.haskell.org/trac/ghc/ticket/13497 - GHC does not use select()/poll() correctly on non-Linux platforms 
* https://ghc.haskell.org/trac/ghc/ticket/15153 - GHC uses O_NONBLOCK on regular files, which has no effect, and blocks the runtime

Also happy to answer any questions!

Niklas