FFI, safe vs unsafe

Tue Apr 4 00:03:56 EDT 2006

John Meacham wrote (... but I've reordered things):

> My only real 'must-have' is that the 4 modes all can be explicitly and
> unambiguously specified. I have opinions on the syntax/hints but that is
> more flexible.

I basically agree (the syntax discussion will take place in the years  
after the semantics discussion), but...

I want programmers to have a way of saying "this function might spend  
a lot of time in foreign lands". These calls should be concurrent on  
all implementations that support it (because some separately  
developed/maintained piece of Haskell code might expect to run a  
computation in the background), but implementations that don't  
support it shouldn't flag an error, because that would encourage  
library writers to specify nonconcurrent when they can't prove that  
it's safe, or make code needlessly nonportable.
Another way to look at it: You cannot decide whether the call  
actually has to be done concurrently by just looking at the call site  
- you'd need to look at the entire program, and asking people  
(especially library writers) to state and guarantee global properties  
of a program that might not even be finished yet is a Bad Thing.  
Therefore, the concurrency annotation on the foreign import can only  
be a hint on whether the foreign function is guaranteed to return  
quickly or not; the actual requirement for the call to be  
"concurrent" is hidden in the other code that expects to run at the  
same time. Therefore, it would be wrong for an implementation that  
doesn't support concurrent calls (reentrant or nonreentrant, I don't  
care) to flag an error; the foreign import declaration correctly  
refuses to give a guarantee that the function will return quickly.  
The error is in the code that expects to run concurrently with a  
foreign import on an implementation that doesn't support that (but of  
course, a compiler can't detect such an error).
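A minimal sketch of that situation, assuming GHC's existing annotations stand in for the modes discussed here (GHC spells "might take a while and may call back" as `safe`; the `concurrent`/`nonconcurrent` keywords of this thread are not GHC syntax). The names `c_sleep` and `ticks` are made up for illustration, and `sleep` from `unistd.h` assumes a POSIX system. The import only refuses to promise a quick return; whether the background thread really makes progress depends on the runtime (with GHC, on linking with `-threaded`).

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (modifyMVar_, newMVar, readMVar)
import Control.Monad (forever)
import Foreign.C.Types (CUInt (..))

-- The import makes no promise that the call returns quickly.
-- (Assumption: GHC's 'safe' is used as the closest existing spelling.)
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  ticks <- newMVar (0 :: Int)
  -- Separately written code that expects to run in the background.
  _ <- forkIO $ forever $ do
         threadDelay 100000
         modifyMVar_ ticks (return . (+ 1))
  -- The long-running foreign call.
  _ <- c_sleep 1
  n <- readMVar ticks
  putStrLn ("background ticks during the foreign call: " ++ show n)
```

On a runtime that cannot run the call concurrently, the program still compiles and runs; the tick count simply stays at zero, which is the kind of error a compiler cannot detect.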

> Another nice minor thing would be if haskell implementations were
> required to ignore annotations starting with 'x-' for implementation
> specific hints.

Sounds good. Syntax discussion postponed again ('x-' looks so  
MIME-typish. Could we add a meaningless 'application/' to the front?  
Just kidding).

> In my survey of when 'reentrant concurrent' was needed, I looked at all
> the standard libraries and didn't find anywhere it was actually needed.
> Are there some compelling examples of when it is really needed in a
> setting that doesn't have OS threads to begin with? (I am not asserting
> they don't exist, I just want to see some example uses of this feature
> to get a better handle on the implementation cost)

In my experience, reentrant calls are rare in non-GUI code, but they  
become quite common in GUI code (OK, in some GUI libraries, there is  
only one, called something like RunMainEventLoop, but then it runs  
almost all of the time and is absolutely crucial). And with most GUI  
libraries, the GUI's main event loop will refuse to cooperate well  
with a Haskell implementation's scheduler, so it will need to be  
called as a "concurrent" foreign import if your application is to do  
any background processing while waiting for events.
Other libraries that rely on callbacks would include the GLU  
Tesselator that I already mentioned, as well as several packages for  
solving optimisation problems. For those, concurrency would probably  
only become an issue when they are used with a GUI (even if it's only  
to display a progress bar).
Another reason why you don't see them in Haskell standard library  
code might be that everyone prefers Data.List.sort to foreign import  
ccall qsort.
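For illustration, here is roughly what such a callback-driven import looks like, using the qsort example itself. This is a sketch in GHC's syntax, where `safe` is the annotation that permits the foreign function to call back into Haskell; the names `Compare`, `mkCompare`, `compareInts` and `sortViaQsort` are invented for the example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (FunPtr, Ptr, freeHaskellFunPtr)
import Foreign.Storable (peek, sizeOf)

type Compare = Ptr CInt -> Ptr CInt -> IO CInt

-- A "wrapper" import turns a Haskell function into a C function pointer.
foreign import ccall "wrapper"
  mkCompare :: Compare -> IO (FunPtr Compare)

-- qsort calls the comparator, i.e. it re-enters Haskell, so the import
-- has to be reentrant ('safe' in GHC's vocabulary).
foreign import ccall safe "stdlib.h qsort"
  c_qsort :: Ptr CInt -> CSize -> CSize -> FunPtr Compare -> IO ()

-- Comparator returning -1, 0 or 1, as qsort expects.
compareInts :: Compare
compareInts pa pb = do
  a <- peek pa
  b <- peek pb
  return (fromIntegral (fromEnum (compare a b)) - 1)

sortViaQsort :: [CInt] -> IO [CInt]
sortViaQsort xs =
  withArrayLen xs $ \n buf -> do
    cmp <- mkCompare compareInts
    c_qsort buf (fromIntegral n) (fromIntegral (sizeOf (undefined :: CInt))) cmp
    freeHaskellFunPtr cmp
    peekArray n buf

main :: IO ()
main = sortViaQsort [3, 1, 2] >>= print
```

The call is reentrant but has no need to be concurrent: nothing else is expected to run while qsort is busy.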

> Any particular reason hugs and GHC didn't use the state-threads approach
> out of curiosity? did it conflict with the push-enter model?  (jhc uses
> the eval-apply model so I am more familiar with that)

It was before my time. I guess it's because GHC uses separate  
heap-allocated Haskell threads, so it made sense not to bother  
allocating a separate C stack for every one of them. Don't know about  
Hugs.

> It also implies that a function call will run on the same OS thread as
> the OS thread the current haskell thread is running on.

This shouldn't be put into a standard, as the bound threads proposal  
already gives a different guarantee about that, and the two guarantees  
combined probably guarantee too much: they would probably mean that  
every Haskell thread has to be an OS thread. It might be an  
implementation-specific guarantee, unless the bound threads become a  
part of the standard in their entirety.

> 'OS thread the current haskell
> thread is running on' (GHC already doesn't when bound threads aren't used
> I am led to believe?)

There should be no such thing as the 'OS thread the current haskell  
thread is running on' in any standard; OS thread identity is only  
observed through the FFI.

>  this means that things like 'log' and 'sin' and
> every basic operation goes through the FFI mechanism so it needs to be
> _fast_ _fast_. A neat side effect is that jhc's implementation of the
> prelude is mostly portable to different compilers.

I, too, want foreign import nonconcurrent nonreentrant to compile to  
a plain call without any extras. GHC achieves that goal, even in the  
presence of bound threads; I'm optimistic about jhc + state threads,  
too.
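As a small sketch of such an import in GHC's spelling (where `unsafe` plays the role of "nonconcurrent nonreentrant": the function must not call back into Haskell and is expected to return quickly); the name `c_sin` is just for illustration:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CDouble (..))

-- Quick, never calls back into Haskell: a candidate for a plain C call.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

main :: IO ()
main = print (c_sin 0)
```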

==== Bound Threads / Implementation methods ====

> I am not quite sure whether you are saying something different from what
> I plan for jhc or not, my current thinking for jhc is,
> [...]
> An alternate mode I'd like to experiment with one day is the complete
> opposite side of the spectrum:
>
> one OS thread per haskell thread, no guarantees about duplicated work
> between threads.

Let me add a third one:
One OS thread per haskell thread, locks & condition variables used to  
make them behave just like state threads.
Creating threads and switching from one thread to another would be  
slower than with state threads by about the time it takes to do a  
trip to the kernel and back. Nonconcurrent foreign imports are just  
plain calls, concurrent foreign imports have to release the lock (and  
maybe signal another thread) and then re-acquire it afterwards.  
Foreign exports need to check whether the current OS thread owns the  
lock (TLS access), and wait for the lock if it doesn't.
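The locking discipline of this third method can be modelled in a few lines. This is only a toy model written in Haskell itself, not runtime code: an `MVar ()` stands in for the lock (condition variables and the TLS ownership check for foreign exports are left out), and the names `HaskellLock`, `asHaskellThread`, `concurrentCall` and `nonconcurrentCall` are hypothetical.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar)
import Control.Exception (bracket_)

-- The one lock that makes the OS threads behave like state threads:
-- whoever holds it is the thread currently "running Haskell code".
type HaskellLock = MVar ()

-- Run an action as Haskell code: take the lock, give it back afterwards.
asHaskellThread :: HaskellLock -> IO a -> IO a
asHaskellThread lock = bracket_ (takeMVar lock) (putMVar lock ())

-- Concurrent foreign import: release the lock around the call and
-- re-acquire it afterwards, so other threads can run meanwhile.
concurrentCall :: HaskellLock -> IO a -> IO a
concurrentCall lock = bracket_ (putMVar lock ()) (takeMVar lock)

-- Nonconcurrent foreign import: a plain call, the lock stays held.
nonconcurrentCall :: IO a -> IO a
nonconcurrentCall = id

main :: IO ()
main = do
  lock <- newMVar ()
  progress <- newEmptyMVar
  -- A second "Haskell thread" that needs the lock to make progress.
  _ <- forkIO $ asHaskellThread lock $
         putMVar progress "background thread ran during the call"
  -- The stand-in foreign call blocks until the background thread has run.
  msg <- asHaskellThread lock $ concurrentCall lock (takeMVar progress)
  putStrLn msg
```

Swapping `concurrentCall lock` for `nonconcurrentCall` in `main` can leave the background thread waiting for the lock forever, which is the behaviour a nonconcurrent import is allowed to have.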

And a fourth one:
Your "single true OS thread" runs all *unbound* Haskell threads and  
your scheduler (on separate stacks). Bound threads run in their own  
OS threads, and the scheduler will use OS thread primitives (foreign  
imported nonconcurrent) to run them and to wait for their time slice  
to finish (if you're doing preemption) or for them to block. It's  
like "if target thread is bound, use OS thread primitives, else use  
State Thread primitives to pass control to it". Foreign import  
nonconcurrent is a plain call. Foreign import concurrent from a bound  
thread releases the lock before calling (and waits for it  
afterwards). From an unbound thread you could implement it the way  
you planned to.
Call-ins would need that one stack pointer range check, and wait for  
a lock if it fails.

> well, the cost of bound threads is not the cost of the call itself, it
> is that they must be serialized. foreign concurrent calls can run
> concurrently to haskell code and each other. but 'bound' foreign calls
> must wait for their dedicated thread to become available. I think there
> needs to be an annotation as to which functions require boundness so
> suddenly all foreign calls aren't serialized just because they are in a
> 'forkOS'ed thread.

I think you're mistaken here. For every given OS thread, *at most  
one* Haskell thread will ever be bound to it. So when you make a  
foreign call, you can be sure that the OS thread that is supposed to  
execute it is available *right now*, because the only Haskell thread  
that has the right to cause code to be executed in the bound OS  
thread has obviously finished any previous foreign calls. No need to  
wait. Ever.
In the bound threads proposal, the only basic method to create a  
bound thread is a call to a foreign export (or to main). The  
resulting thread of Haskell execution is bound to the OS thread that  
made the call. The library function forkOS is just a call to  
pthread_create or its Windows equivalent.
It would be possible to add an annotation to "foreign export" that  
states that "the Haskell thread that results from a call-in to this  
function does not need to be bound" (if that would improve  
performance), but back when "bound threads" were born we decided  
against it to keep things simple; instead, we added  
"runInUnboundThread :: IO a -> IO a" to keep the people who were  
concerned about performance happy, and I have yet to see it used by  
anyone.
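A short sketch of how these pieces look from the Haskell side, using the functions GHC exports from Control.Concurrent (forkOS, isCurrentThreadBound, rtsSupportsBoundThreads, runInUnboundThread). The helper `boundIn` is made up for the example, and the program guards on `rtsSupportsBoundThreads` because forkOS fails on a runtime without OS thread support.

```haskell
module Main where

import Control.Concurrent
  ( forkIO, forkOS, isCurrentThreadBound
  , rtsSupportsBoundThreads, runInUnboundThread )
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Fork with the given primitive and report whether the new Haskell
-- thread considers itself bound to an OS thread.
boundIn :: (IO () -> IO a) -> IO Bool
boundIn fork = do
  result <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  viaForkIO <- boundIn forkIO
  putStrLn ("forkIO thread bound: " ++ show viaForkIO)
  if rtsSupportsBoundThreads
    then do
      viaForkOS <- boundIn forkOS
      putStrLn ("forkOS thread bound: " ++ show viaForkOS)
      -- Opt out of boundness for a performance-sensitive region.
      inner <- runInUnboundThread isCurrentThreadBound
      putStrLn ("inside runInUnboundThread: " ++ show inner)
    else putStrLn "this runtime has no bound threads (with GHC, link with -threaded)"
```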

>> ==== On the implementability of "concurrent reentrant" ====
>> [...] b) The runtime thread will need to periodically check whether an
>> interthread message has arrived, and if there is no work, block
>> waiting for it. The fast path of checking whether something has been
>> posted to the message queue is fast indeed - you just have to check a
>> global flag.

> I'd integrate it into the EDSM loop somehow (futex maybe) as I have a
> moral aversion for periodic checking of anything.

Yes, I was thinking of checking the flag just as a way to avoid  
calling epoll if a foreign call is already waiting. Basically, you'll  
want to write a message to a pipe so that it gets picked up by the  
EDSM loop. After all, a pipe is nothing but a traditional unix-style  
message queue :-).

> the main thing is that it is a cost paid by every foreign export.
> perhaps a flag saying "this will only be called nonconcurrently on
> exports" though, perhaps that can be an x-flag if other compilers can't
> take advantage of it.

Yes, such an x-flag would be entirely reasonable if those few  
instructions for a stack pointer range test become significant. Those  
functions will, of course, be very rare.

> how prevalent is support for __thread BTW? is it required by any
> standards or an ELFism?

It is an ELFism by birth; __declspec(thread) is the equivalent  
MSVCism, and I don't know of any other equivalent features (but then  
my horizon doesn't extend much beyond the Mac/Linux/Windows triad). I  
hope that Apple and the mingw32 team soon implement it, too, but  
that's just an unfounded hope.

That's all for now,

Cheers,

Wolfgang