FFI, safe vs unsafe

Fri Mar 31 19:53:00 EST 2006

On Fri, Mar 31, 2006 at 06:41:18PM -0500, Wolfgang Thaller wrote:
> >I am confused, why would anything in particular need to happen at all?
> >
> >the threads are completly independent.  The non-concurrent calls could
> >just be haskell code that happens to not contain any pre-emption  
> >points
> >for all it cares. in particular, in jhc, non-concurrent foreign  
> >imports
> >and exports are just C function calls. no boilerplate at all in either
> >direction.  calling an imported foreign function is no different than
> >calling one written in haskell so the fact that threads A and B are
> >calling foregin functions doesn't really change anything.
> 
> In an implementation which runs more than one Haskell thread inside  
> one OS thread, like ghc without -threaded or hugs, the threads are  
> NOT completely independent, because they share one C stack. So while  
> bar executes, stack frames for both foreign functions will be on the  
> stack, and it will be impossible to return from foo before bar and  
> the foreign function that called it completes. I think this kind of  
> semantics is seriously scary and has no place as default behaviour in  
> the language definition.

no, state-threads, a la NSPR, state-threads.sf.net, or any other of a
bunch of implementations.

each thread has its own stack, you 'longjmp' between them. it can almost
practically be done in portable C except the mallocing of the new stack,
but there are many free libraries (less a library, more a header file
with some #ifdefs) for all processors out there that support that, or at
least that gcc supports in the first place.

this would be by far the easist way to add concurrency to any haskell
compiler, other than the addition of the 'create a new stack' and
'longjmp' primitives, it can be implemented 100% in the standard
libraries with haskell code. that is why I am confident in saying it
probably won't rule out future implementations we have not thought of
yet. since it is mostly pure haskell anyway.

> If you implement concurrency by using the pthreads library, you need  
> to either make sure that only one thread mutates the heap at a time,  
> or deal with SMP. In either case, concurrent foreign calls would be  
> trivial.

indeed. but pthreads has its own tradeoffs. there is certainly room for
both types of haskell implementations. 

> >>4.) Should there be any guarantee about (Haskell) threads not making
> >>any progress while another (Haskell) thread is executing a non-
> >>concurrent call?
> >
> >I don't understand why we would need that at all.
> 
> Good. Neither do I, but in the discussions about this issue that we  
> had three years ago several people seemed to argue for that.

wacky. I can't think of a reason, it would be quite tricky to pull off
with a fully pthreaded implementation anyway. 

> >>5.) [...] So what
> >>should the poor library programmer A do?
> >
> >He should say just 'reentrant' since concurrent isn't needed for
> >correctness because the tessalation routines are basic calculations  
> >and
> >will return.
> 
> Let's say they will return after a few minutes. So having them block  
> the GUI is a show-stopper for programmer C.
> And if programmer C happens to use a Haskell implementation that  
> supports "concurrent reentrant" but also a more efficient "non- 
> concurrent reentrant", he will not be able to use the library.

well, I think he has a choice to make there about what is more important
to him. I admit, it has to be a judgement call at some point, as
eventually performance problems become correctness ones.

but perhaps this is an argument for a concurrent-hint flag, "make this
concurrent and reentrant if possible, but its gonna be reentrant anyway
no matter what"

I mean, one could bend the rules any say coooperative systems do
implement "concurrent reentrant" with just an incredibly crappy
scheduling algorithm, but I think I'd rather have it fail outright than
"pretend". 

but a 'concurrent-hint' flag could be useful, as a library writer may
not know the preference of his user.

a completely different solution would be just to foreign import the
routine twice, with each convention and have some way for the user of a
library to choose which one they want, perhaps with a flag. of course,
both might not be available with all implementations.

in any case, I don't think it is a showstopper.

> >everyone wins. in the absolute worst case there are always #ifdefs  
> >but I
> >doubt they will be needed.
> 
> Except for programmer C on some haskell implementations. I don't buy  
> it yet :-).

Well, certain implementations will always have their own extensions that
people might rely on. I just don't want the language standard itself to
rule out valid and useful implementation methods. Haskell with IO
multiplexing is a very powerful platform indeed and this proposal lets
us keep it in the language proper and that is very nice, from an
implementor and a library writers point of view. often concurrent
haskell is just a nicer way to express things and being able to use that
expresivity in portable libraries is something I look forward to.

> >>6.) Why do people consider it too hard to do interthread messaging
> >>for handling a "foreign export" from arbitrary OS threads, when they
> >>already agree to spend the same effort on interthread messaging for
> >>handling a "foreign import concurrent"? Are there any problems that I
> >>am not aware of?
> >
> >it is not that it is hard (well it is sort of), it is just absurdly
> >inefficient and you would have no choice but to pay that price for
> >_every_ foregin export. even when not needed which it mostly won't be.
> >the cost of a foreign export should be a simple 'call' instruction
> >(potentially) when an implementation supports that.
> 
> As we seem to agree that the performance issue is non-existant for  
> implementations that use one OS thread for every haskell thread, and  
> that we don't want to change how GHC works, the following refers to a  
> system like hugs where all Haskell code and the entire runtime system  
> always runs in a single OS thread.
> 
> It might not be absolutely easy to implement "concurrent reentrant",  
> but it's no harder than concurrent non-reentrant calls. If a haskell  
> implementation has a hacker on its team who is able to do the former,  
> then this is no problem either.
> As for the efficiency argument: if it is sufficiently slow, then that  
> is an argument for including "nonconcurrent reentrant" as an option.  
> It is not an argument for making it the default, or for leaving out  
> "concurrent reentrant".

it is much much harder. you have to deal with your haskell run-time
being called into from an _alternate OS thread_ meaning you have to deal
with the os threading primitives and locking and mutexi and in general
pay a lot of the cost you would for a fully OS threaded implementation.

foreign concurrent nonreentrant imports can be implemented in pure
haskell with just some FFI calls. 

concurrent reentrant requires changes to the run-time model and imposes
a cost to _every_ foreign export.

> >the cost of a foreign import concurrent nonreentrant is only paid when
> >actually using such a function, and quite cheap. on linux at least, a
> >single futex, a cached pthread and it gets rolled into the main event
> >loop. so a couple system calls max overhead.
> 
> Sure. But what gives you the idea that the cost of a foreign export  
> or a foreign import concurrent reentrant would be paid when you are  
> not using them?
> If we include nonconcurrent reentrant foreign imports in the system,  
> or if we just optimise foreign imports for the single-threaded case,  
> all that the foreign export would have to do is to check a flag (NO  
> system calls involved). If the callback is from a foreign import  
> concurrent reentrant or if it is from an entirely Haskell-free C  
> thread, then we will have to do an inter-thread RPC to the runtime  
> thread. Unavoidable.

the flag will have to be thread-local as even in cooperative systems
multiple forigin calls can be running at once. and it would be a
run-time check on every haskell-C transition. in any case, it is a
run-time cost.

Nothing precludes implementations from providing a 'concurrent
reentrant' mode if they really want to (and document it properly) in
fact, it is required of implementatios that support OS threads in the
current proposal. but if you are cooperative anyway it would be a lot of
work for little gain. if you write your program such that it will work
on a cooperative system, then you already deal with the fact things
won't be running at the same time, and if you write your code expecting
things to run at the same time, then you are guarenteed 'concurrent
reentrant' anyway (assuming the implementation follows all of the OS
threading option)

> For Hugs, I guess that overhead would be absorbed in its general  
> slowness. For Yhc, it might be an issue.

for jhc it would be a very big issue. jhc compiles to straight C, as in
foreign calls are literally just C calls with no more overhead at all.
it would probably be so for any other implementations that compile to
'straight c' (or c--). for a suitable definition of straght.

> A related performance sink are all foreign imports if such an  
> implementation supports bound threads (which are, after all, needed  
> to use some libraries, like OpenGL and Carbon/Cocoa, from a multi- 
> threaded program). If the foreign function needs to be executed in a  
> dedicated thread, then even a nonconcurrent nonreentrant call would  
> involve inter-thread messaging (in this hypothetical hugs+bound  
> threads). We should consider adding a "nothreadlocal" attribute to  
> foreign imports - when it is known that the foreign function does not  
> access thread-local state, we can use the traditional, more efficient  
> implementation for "foreign import nonconcurrent nonreentrant".

Yeah, I left mention of bound threads out of the basic standard because
I don't want to say anything about OS threads in the required standard. but
perhaps we should say something on the issue.

"if an implementation supports haskell code running on multiple OS
threads, it must support the bound threads proposal. if it does not,
then all 'nonconcurrent' foreign calls must be made on the one true OS
thread"

would be good enough for me.

it does mean 'concurrent' foreign calls on a non-OS-threaded
implementations cannot use TLS (well, the same TLS as each other that
is), but that seems reasonable, as the only way to concurrentize an
arbitrary C function we know nothing about is to threadize it so I don't
think there actually is a solution, you just can't run things
concurrently that also need to be on the same thread :)

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈