FFI, safe vs unsafe

Wed Apr 5 08:39:31 EDT 2006

I think the following kinds of foreign calls wrt. concurrency are
sensible:

1. Other Haskell threads might get paused (but don't have to).

   Examples: sqrt, qsort (we assume that qsort never needs a long time
   between calls to the comparison function, so there is no need to
   allow other threads, and it's more important to avoid context
   switching overhead).

2. Other Haskell threads should run if possible, but this is not
   strictly necessary if the implementation doesn't support that.

   Examples: stat, computing md5 of a byte array (the call doesn't
   block for an arbitrarily long time, so pausing other threads is
   acceptable, but with slow hardware or on a multiprocessor it might
   be preferable to allow for more concurrency).

3. Other Haskell threads must run.

   Examples: wait, blocking read (a blocking call; not running Haskell
   threads might lead to deadlocks).
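For comparison, this is roughly how the kinds above map onto the two
annotations GHC accepts today: 'unsafe' behaves like kind 1 without
callbacks, and 'safe' (the default) covers kinds 2 and 3, depending on
whether the program is linked with the threaded RTS. This is only a
sketch; the binding names c_sqrt and c_sleep are illustrative, and the
mapping to kinds is approximate.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..), CUInt (..))

-- Kind 1: short, never calls back into Haskell, never blocks for long.
-- 'unsafe' lets GHC emit a plain C call; other Haskell threads may be
-- paused while it runs.
foreign import ccall unsafe "math.h sqrt"
  c_sqrt :: CDouble -> CDouble

-- Kinds 2/3: may block for a long time. With 'safe', the threaded RTS
-- keeps running other Haskell threads during the call; the non-threaded
-- RTS still pauses them (so there it behaves like kind 2 blocking).
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  print (c_sqrt 2)
  _ <- c_sleep 1
  putStrLn "slept"
```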

2 is the same as 1 on some implementations, and the same as 3 on others.

3 is possible only in implementations which make use of OS threads.
A foreign call annotated with 3 would be an error in some implementations.

Old GHC, before bound threads, can't support 3. In GHC with bound threads
2 is equivalent to 3. In the SMP version even 1 lets some other threads
run (but multiple threads doing calls of kind 1 might stop other threads).

1 or 2 are reasonable defaults.

Variant 1 has two subvariants: allowing callbacks or not. I don't know
whether it makes sense to differentiate this for other variants, i.e.
whether disallowing callbacks makes it possible to generate faster code.
Anyway, if 2 is chosen as the default, I wish to be able to specify
variant 1 sans callbacks with a single keyword, like 'unsafe' today,
because it's quite common.

I'm not sure whether 3 should be provided at all. Perhaps when
wrapping a foreign function it's generally not known whether full
concurrency is essential or not, so it's always better to block other
threads than to reject code.

Some functions need to use a different implementation when variant 2
blocks other threads. For example, instead of calling a blocking
function, the program might call a variant of it with a timeout,
and after the timeout other threads are given a chance to run.
This way the waiting thread uses up its timeslices and wakes up the
processor every couple of milliseconds, but at least it works.
In the implementation of my language Kogut I simply don't block
the timer signal in this case.
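The timeout strategy above can be sketched in Haskell as a loop around a
"try with timeout" call, yielding to other Haskell threads between
attempts. The names blockingViaTimeouts and tryWithTimeout are
hypothetical; tryWithTimeout stands in for a foreign function that waits
at most a few milliseconds and returns Nothing if nothing happened yet.
Here it is faked with an IORef counter so the sketch runs on its own.

```haskell
module Main (main) where

import Control.Concurrent (yield)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- | Emulate one long blocking call with many short timed ones, so other
-- Haskell threads get to run even if each foreign call pauses them.
blockingViaTimeouts :: IO (Maybe a) -> IO a
blockingViaTimeouts tryWithTimeout = loop
  where
    loop = do
      r <- tryWithTimeout
      case r of
        Just x  -> pure x
        Nothing -> yield >> loop  -- let other Haskell threads run

main :: IO ()
main = do
  -- Hypothetical stand-in for a timed foreign wait: succeeds on try 3.
  attempts <- newIORef (0 :: Int)
  let fakeTimedWait = atomicModifyIORef' attempts $ \n ->
        (n + 1, if n + 1 >= 3 then Just "ready" else Nothing)
  v <- blockingViaTimeouts fakeTimedWait
  putStrLn v                   -- ready
  readIORef attempts >>= print -- 3
```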

So I propose to not provide 3, but instead provide a constant which
a program can use to discover whether 2 blocks other threads or not.
Using that constant it can either choose an alternative strategy,
or abort if it knows that 3 would be essential.
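GHC already has a related flag, Control.Concurrent.rtsSupportsBoundThreads,
which is True when the program is linked with the threaded RTS, i.e. when
'safe' calls don't pause other Haskell threads. It is not the constant
proposed here, but it can play the same role in GHC. A sketch of choosing
a strategy or aborting based on such a flag (WaitStrategy, chooseStrategy
and requireConcurrentCalls are hypothetical names):

```haskell
module Main (main) where

import Control.Concurrent (rtsSupportsBoundThreads)

-- How a library might wait for a slow external event.
data WaitStrategy
  = BlockInForeignCall  -- one plain 'safe' call; others keep running
  | PollWithTimeout     -- short timed calls with 'yield' in between
  deriving (Eq, Show)

-- | Pick a strategy from whether long foreign calls block other threads.
chooseStrategy :: Bool -> WaitStrategy
chooseStrategy callsRunConcurrently
  | callsRunConcurrently = BlockInForeignCall
  | otherwise            = PollWithTimeout

-- | For code where full concurrency is essential: fail early instead.
requireConcurrentCalls :: Bool -> IO ()
requireConcurrentCalls callsRunConcurrently =
  if callsRunConcurrently
    then pure ()
    else ioError (userError "this program needs concurrent foreign calls")

main :: IO ()
main = print (chooseStrategy rtsSupportsBoundThreads)
```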

Wolfgang Thaller <wolfgang.thaller at gmx.net> writes:

> 1.) Assume thread A and B are running. Thread A makes a non-
> concurrent, reentrant call to Foreign Lands. The foreign function
> calls a foreign-exported Haskell function 'foo'.
> While 'foo' is executing, does thread B resume running?

Yes, when the scheduler chooses it.

> 2.) Assume the same situation as in 1, and assume that the answer to
> 1 is yes. While 'foo' is running, (Haskell) thread B makes a non-
> concurrent, reentrant foreign call. The foreign function calls back
> to the foreign-exported Haskell function 'bar'. Because the answer to
> 1 was yes, 'foo' will resume executing concurrently with 'bar'.
> If 'foo' finishes executing before 'bar' does, what will happen?

There are sensible implementations where the foreign code of thread A
after calling 'foo' continues running, and is running alone, while
thread B is paused until thread A either calls another Haskell
function or returns to Haskell. Bound threads in GHC work like this
I think.
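For reference, a bound thread in GHC is a Haskell thread tied to one OS
thread, so the foreign code it calls and any callbacks that code makes
all run on that same OS thread. A minimal sketch of creating one with
forkOS, which is only available when linked with -threaded; it shows how
bound threads are created, not the exact callback ordering described
above.

```haskell
module Main (main) where

import Control.Concurrent

main :: IO ()
main =
  if rtsSupportsBoundThreads
    then do
      done <- newEmptyMVar
      -- forkOS creates a bound thread; report whether it really is bound.
      _ <- forkOS (isCurrentThreadBound >>= putMVar done)
      takeMVar done >>= print
    else putStrLn "non-threaded RTS: forkOS is not available"
```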

And there are sensible implementations where thread A is paused trying
to return from 'foo', until 'bar' returns. This might lead to a
deadlock if 'bar' will wait for something only thread A can do, but
it's unavoidable if the implementation doesn't use OS threads at all.
I think these implementations coincide with those which are unable to
provide variant 3, so providing the constant I mentioned makes it
possible to distinguish these cases too.

> 3.) Same situation as in 1. When 'foo' is called, it forks (using
> forkIO) a Haskell thread C. How many threads are running now?

Three.

> 4.) Should there be any guarantee about (Haskell) threads not making
> any progress while another (Haskell) thread is executing a non-
> concurrent call?

No. In an implementation which runs every Haskell thread on its
own OS thread, with a concurrent runtime, all foreign calls are
actually concurrent and the modifiers have no effect.

> 5.) Assume that Haskell Programmer A writes a Haskell library that
> uses some foreign code with callbacks, like for example, the GLU
> Tesselator (comes with OpenGL), or, as a toy example, the C Standard
> Library's qsort function. Should Programmer A specify "concurrent
> reentrant" on his foreign import?

If the call can take a long time before entering Haskell, then it
should be annotated with 2. Otherwise with 1.
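As a concrete case, here is qsort with a Haskell comparison function in
today's GHC. GHC has no annotation for "kind 1 with callbacks": any call
that re-enters Haskell must be imported 'safe', even a quick one like
qsort. The names mkCompare, c_qsort and sortCInts are illustrative.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Exception (bracket)
import Foreign
import Foreign.C.Types (CInt (..), CSize (..))

-- Comparison callback: int (*)(const void *, const void *).
type Compare = Ptr CInt -> Ptr CInt -> IO CInt

-- Wrap a Haskell closure as a C function pointer.
foreign import ccall "wrapper"
  mkCompare :: Compare -> IO (FunPtr Compare)

-- qsort calls back into Haskell through the comparator, so it is 'safe'.
foreign import ccall safe "stdlib.h qsort"
  c_qsort :: Ptr CInt -> CSize -> CSize -> FunPtr Compare -> IO ()

-- | Copy the list to a C array, sort it with qsort, and read it back.
sortCInts :: [CInt] -> IO [CInt]
sortCInts xs =
  withArrayLen xs $ \n arr ->
    bracket (mkCompare cmp) freeHaskellFunPtr $ \cmpPtr -> do
      c_qsort arr (fromIntegral n) (fromIntegral (sizeOf (0 :: CInt))) cmpPtr
      peekArray n arr
  where
    -- Map LT/EQ/GT to the negative/zero/positive result qsort expects.
    cmp pa pb = do
      a <- peek pa
      b <- peek pb
      pure $ case compare a b of
        LT -> -1
        EQ -> 0
        GT -> 1

main :: IO ()
main = sortCInts [3, 1, 2] >>= print  -- [1,2,3]
```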

Sometimes it's impossible to tell beforehand whether the call will
take a long time, e.g. getaddrinfo might return a cached answer
immediately or wait for nameservers to respond. It's an unavoidable
pity. Such a call can be annotated with 2, and the overhead will
sometimes be wasted.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/