[Haskell-cafe] How to ensure code executes in the context of a specific OS thread?

Wed Jul 6 18:31:34 CEST 2011

2011/7/6 Gábor Lehel <illissius at gmail.com>:
> On Wed, Jul 6, 2011 at 5:24 PM, Jason Dagit <dagitj at gmail.com> wrote:
>> On Wed, Jul 6, 2011 at 8:09 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>>> On 06/07/2011 15:42, Jason Dagit wrote:
>>>>
>>>> On Wed, Jul 6, 2011 at 2:23 AM, Simon Marlow<marlowsd at gmail.com>  wrote:
>>>>>
>>>>> On 06/07/2011 07:37, Jason Dagit wrote:
>>>>>>
>>>>>> On Jul 5, 2011 1:04 PM, "Jason Dagit" <dagitj at gmail.com> wrote:
>>>>>>  >
>>>>>>  >  On Tue, Jul 5, 2011 at 12:33 PM, Ian Lynagh <igloo at earth.li> wrote:
>>>>>>  >  >  On Tue, Jul 05, 2011 at 08:11:21PM +0100, Simon Marlow wrote:
>>>>>>  >  >>
>>>>>>  >  >>  In GHCi it's a different matter, because the main thread is running
>>>>>>  >  >>  GHCi itself, and all the expressions/statements typed at the prompt
>>>>>>  >  >>  are run in forkIO'd threads (a new one for each statement, in fact).
>>>>>>  >  >>  If you want a way to run command-line operations in the main thread,
>>>>>>  >  >>  please submit a feature request.  I'm not sure it can be done, but
>>>>>>  >  >>  I'll look into it.
>>>>>>  >  >
>>>>>>  >  >  We already have a way: -fno-ghci-sandbox
>>>>>>  >
>>>>>>  >  I've removed all my explicit attempts to forkIO/forkOS and passed the
>>>>>>  >  command line flag you mention.  I just tried this but it doesn't
>>>>>>  >  change the behavior in my example.
>>>>>>
>>>>>> I tried it again and discovered that, due to an argument parsing bug in
>>>>>> cabal-dev, the flag was not passed correctly. I explicitly passed it
>>>>>> and verified that it works. Thanks for the workaround. By the way, I did
>>>>>> look at the user guide for options like this and didn't see it. Which
>>>>>> part of the manual is it in?
>>>>>>
>>>>>> Can I still make a feature request for a function to make code run on
>>>>>> the original thread? My reasoning is that the code which needs to run on
>>>>>> the main thread may appear in a library in which case the developer has
>>>>>> no control over how ghc is invoked.
>>>>>
>>>>> I'm not sure how that would work.  The programmer is in control of what the
>>>>> main thread does, not GHC.  So in order to implement some mechanism to run
>>>>> code on the main thread, we would need some cooperation from the main thread
>>>>> itself.  For example, in gtk2hs the main thread runs an event handler loop
>>>>> which occasionally checks a queue for requests from other threads (at least,
>>>>> I think that's how it works).
>>>>
>>>> What I'm wrestling with is the following.  Say I make a GUI library.
>>>> As author of the GUI library I discover issues like this where the
>>>> library code needs to execute on the "main" thread.  Users of the
>>>> library expect the typical Haskell environment where you can't tell
>>>> the difference between threads, and you fork at will.  How can I make
>>>> sure my library works from GHC (with arbitrary user threads) and from
>>>> GHCI?
>>>>
>>>> As John Lato points out in his email lots of people bump into this
>>>> without realizing it and don't understand what the problem is.  We can
>>>> try our best to educate everyone, but I have this sense that we could
>>>> also do a better job of providing primitives to make it so that code
>>>> will run on the main thread regardless of how people invoke the
>>>> library.
>>>>
>>>> In my specific case (Cocoa on OSX), it is possible for me to use some
>>>> Cocoa functions to force things to run on the main thread.  From what
>>>> I've read Cocoa uses pthreads to implement this. I was hoping we could
>>>> expose something from the RTS code in Control.Concurrent so that it's
>>>> part of an "official" Haskell API that library writers can assume.
>>>>
>>>> Judging by this SO question, it's easier to implement this in Haskell
>>>> on top of pthreads than to implement it in C (here I'm assuming GHC's
>>>> RTS uses pthreads, but I've never checked):
>>>>
>>>> http://stackoverflow.com/questions/6130823/pthreads-perform-function-on-main-thread
>>>>
>>>> In fact, it sounds like what Gtk2hs is doing with the postGUI
>>>> functions.
>>>
>>> Right, but usually the way this is implemented is with some cooperation from
>>> the main thread.  That SO answer explains it - the main thread runs some
>>> kind of loop that periodically checks for requests from other threads and
>>> services them.  I expect that's how it works on Cocoa.
>>> So you can't just do this from a library - the main thread has to be in on
>>> the game.
>>
>> Yes.  From my perspective (that of a library writer) that's what makes
>> this tricky in GHCi.  I need GHCi's cooperation.  From GHCi's
>> perspective it's tricky too.
>>
>>> I suppose you might wonder whether the GHC RTS could implement
>>> runInMainThread by preempting the main thread and running some different
>>> code on it.
>>
>> Yes, that's roughly what I was wondering about.
>
>
> There's more than one reason why a (GUI) library might require
> functions to be called only from the main thread. One is if the
> library uses thread-local storage, in which case the code needs to run
> in the right thread to see the right data. I've heard that OpenGL is
> like this. Another (more common, as far as I know) reason is if (parts
> of) the library aren't thread safe, and can't handle more than one
> thread calling its functions and mutating its members at the same
> time. I'm not sure if there are other reasons.
>
> In the second (thread safety) case, if you preempt the main thread in
> the middle of whatever it was doing to use it to call some function
> from the library, the effect would, I think, be the same as if the OS
> had preempted it to execute some other thread which then called the
> function, and you would be violating the library's
> one-thread-at-a-time expectation in pretty much the same exact way. So
> I don't think you would gain anything useful by doing this. The main
> thread needs to be interrupted at 'safe points', which is what the
> event loop lets you do, but the event loop is part of the GUI library,
> and not part of the GHC runtime, so GHC doesn't know about it and
> can't tell it what to do - only the library bindings can.
>
> Stated another way: I suspect most GUI libraries don't really actually
> care that you only execute GUI code from the main OS thread, as much
> as they care that only one (thread-unsafe) GUI function is being
> called at any given time. If you only ever call GUI code from the same
> (main) OS thread, that fulfills this requirement, because an OS thread
> is only capable of running one library function at a time;
> alternately, if you only ever call GUI code from the same Haskell
> thread, that also fulfills this requirement, because one Haskell
> thread is also only capable of running one library function at a time,
> even if its execution might jump between different OS threads along
> the way. (If you were writing code in the library's native language,
> and as part of your own code for processing an event in the main
> thread, stopped the main thread, used a different thread to execute
> some GUI functions, and then returned control to the main thread, I
> suspect that would also be safe, though there tends not to be any
> reason to want to do this.)
>
> Basically: In the context of GHC/Haskell, I think you need to separate
> the concept of "thread of execution", which is what the GUI libraries
> care about, from the concept of "OS threads", which nearly all of the
> time correspond to the threads of execution, but in this case, don't.
> (Or rather, do, but in a very different way from the usual.)

Clarifying: The OS threads and the Haskell threads both correspond to
threads of execution, and the two sets overlap with each other in time
in complicated ways, but the property "only runs one function at a
time, and runs it to completion before running a different function"
in this case belongs to the Haskell threads. ("Function" here used in
the C sense.) I'm not sure, at the moment, whether it *also* applies
to the OS threads. Thinking about this makes my brain hurt.
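
To make the "main thread has to be in on the game" point concrete, here
is a minimal sketch of the cooperative queue idea Simon describes: other
Haskell threads post actions to a queue, and one designated thread
services them one at a time. This is only an illustration under my own
assumptions, not how Gtk2hs's postGUI functions are actually implemented.
The names WorkQueue, runOnOwner, serviceLoop, stopOwner and demo are all
hypothetical. In a real GUI binding the loop would be the toolkit's own
event loop polling the queue at its safe points, rather than a plain
Haskell loop.

```haskell
import Control.Concurrent (forkIO, myThreadId)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)

-- | Hypothetical queue of jobs for the owning thread. 'Nothing' means "stop".
newtype WorkQueue = WorkQueue (Chan (Maybe (IO ())))

newWorkQueue :: IO WorkQueue
newWorkQueue = WorkQueue <$> newChan

-- | Called from any Haskell thread: hand an action to the owning thread
-- and block until it has run there. Exceptions thrown by the action are
-- rethrown in the caller.
runOnOwner :: WorkQueue -> IO a -> IO a
runOnOwner (WorkQueue q) act = do
  result <- newEmptyMVar
  writeChan q (Just (try act >>= putMVar result))
  r <- takeMVar result
  case r of
    Left e  -> throwIO (e :: SomeException)
    Right x -> return x

-- | Run by the owning thread (e.g. the main thread): service jobs one at
-- a time, each to completion, until asked to stop.
serviceLoop :: WorkQueue -> IO ()
serviceLoop wq@(WorkQueue q) = do
  next <- readChan q
  case next of
    Nothing  -> return ()
    Just job -> job >> serviceLoop wq

-- | Ask the owning thread to leave its service loop.
stopOwner :: WorkQueue -> IO ()
stopOwner (WorkQueue q) = writeChan q Nothing

-- | Small demonstration: a forked thread asks the owner which Haskell
-- thread its action runs on; the result is True when it is the owner.
demo :: IO Bool
demo = do
  wq    <- newWorkQueue
  owner <- myThreadId
  done  <- newEmptyMVar
  _ <- forkIO $ do
    tid <- runOnOwner wq myThreadId
    putMVar done tid
    stopOwner wq
  serviceLoop wq
  tid <- takeMVar done
  return (tid == owner)
```

Note that this only guarantees "one job at a time, on the Haskell thread
that runs serviceLoop". Whether that Haskell thread also stays on one
particular OS thread depends on whether it is a bound thread, which is a
separate question from the queueing itself.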

>
> These are impressions I've gained from reading the docs (such as
> the paper Simon just linked) and thinking about it. If anyone more
> knowledgeable sees that I'm mistaken, please correct me.
>
>
>>
>>>  In theory that's possible, but whether it's a good idea or not
>>> is a different matter!  I think it amounts to the same thing as the gtk2hs
>>> folks have been asking for - multiple Haskell threads bound to the same OS
>>> thread.
>>
>> I'm starting to realize that I don't understand the GHC threading
>> model very well :)  I thought that was already possible.  I may be
>> mixing GHC's thread model up with other language implementations, but
>> I thought that it had a pool of OS threads and that Haskell threads
>> ran on them as needed.  I think what you're saying is that the RTS has
>> bound threads and it has thread pooling, but what it doesn't have is
>> "bound thread pooling" (that is, the combination of being bound and
>> pooled).
>>
>>>  runInMainThread then becomes the same as forking a temporary new
>>> thread bound to the main OS thread, or temporarily binding the current
>>> thread to the main OS thread.  If the main OS thread is off making a foreign
>>> call (e.g. in the GUI library's main loop) then it can't run any other
>>> Haskell threads anyway, and then I have to figure out what to do with all
>>> these Haskell threads waiting for their bound OS thread to come back from
>>> the foreign call.  My guess is that all this would be pretty complex to
>>> implement.
>>
>> Yes it does sound complex.  I'd really like help as much as possible.
>> I know very little about GHC internals but perhaps I could take a look
>> at some of the RTS code.  Is there some background reading I could do?
>>  Perhaps a specific reference to a paper or wiki page?
>>
>>> Still, I'm all for making things easier somehow.  At the least, we should
>>> have good diagnostics when you're using GHCi and this goes wrong.  Although
>>> I'm not sure how to do that, I think it's really something the gtk2hs or
>>> Cocoa binding needs to implement.  Do you have a way to check whether you're
>>> on the main thread or not?
>>
>> pthread_main_np is the only way I've stumbled across:
>> https://www.mirbsd.org/htman/i386/man3/pthread_main_np.htm
>>
>> Jason
>>
>

-- 
Work is punishment for failing to procrastinate effectively.