[Haskell-cafe] forkProcess, forkIO, and multithreaded runtime

Mon Jan 21 10:14:28 CET 2013

I think you can test this theory with this patch.  If a thread is waiting
on the task->cond condition variable, which is paired with task->lock,
then pthread_cond_destroy may return EBUSY (POSIX leaves this case
undefined), and any such failure must be a bug in the RTS.

Alexander

diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c
index ae31966..0f12830 100644
--- a/rts/posix/OSThreads.c
+++ b/rts/posix/OSThreads.c
@@ -91,7 +91,8 @@ initCondition( Condition* pCond )
 void
 closeCondition( Condition* pCond )
 {
-  pthread_cond_destroy(pCond);
+  int ret = pthread_cond_destroy(pCond);
+  CHECKM(ret == 0, "RTS BUG! pthread_cond_destroy failed (waiter on condvar?): %d.", ret);
   return;
 }
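
The same check can be tried outside the RTS. Below is a minimal sketch in plain C, assuming only POSIX threads; checked_cond_destroy and demo_idle_condvar are hypothetical names used for illustration and are not part of GHC. It only exercises the safe case, a condition variable that nobody is waiting on, because destroying one that has waiters is undefined behaviour and may block or return EBUSY depending on the libc.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the GHC RTS): destroy a condition
 * variable and report a failure instead of silently ignoring it. */
int checked_cond_destroy(pthread_cond_t *cond)
{
    int ret = pthread_cond_destroy(cond);
    if (ret != 0) {
        fprintf(stderr, "pthread_cond_destroy failed: %s (%d)\n",
                strerror(ret), ret);
    }
    return ret;
}

/* Initialise a condition variable that nobody waits on, then destroy it.
 * Returns 0 on success, or the pthread error code otherwise. */
int demo_idle_condvar(void)
{
    pthread_cond_t cond;
    int ret = pthread_cond_init(&cond, NULL);
    if (ret != 0) {
        return ret;
    }
    return checked_cond_destroy(&cond);
}
```

Calling demo_idle_condvar() should return 0; a non-zero value in a real program would point at the kind of lifetime bug the patch is meant to expose.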



On Mon, Jan 21, 2013 at 8:18 AM, Alexander Kjeldaas <
alexander.kjeldaas at gmail.com> wrote:

>
> I just looked at this code and since I don't know the code I can't give
> you good solutions, but for others watching this thread the links might
> prove interesting.
>
> My main theory is that you do have some other thread in FFI-land while you
> are fork()ing.  task->cond and task->lock seem to be related to this (see
> quoted comments below).
>
> Also, the behavior of pthread_mutex_destroy is undefined if the mutex is
> locked, so I am guessing that task->lock is somehow locked when it
> shouldn't be.
>
> It isn't clear from your description whether this is consistently
> happening on Linux, or whether this only sometimes happens.
>
> The forkProcess() code seems to hold all capabilities during fork, but
> that does not include FFI-land threads AFAIU.
>
> Assuming that this happens only rarely, I am trying to understand what
> happens if the thread that is in FFI-land returns to the RTS (in the
> parent) after fork(), but before the freeTask() in the child.  Based on the
> descriptions I read, it seems likely that this thread will try to inspect
> task->cap, which requires holding task->lock.
>
> That would in turn make the pthread_mutex_destroy in the child invalid.
>
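
The scenario in the paragraphs above (a lock held by some other thread at the moment of fork()) can be reproduced in isolation. This is a sketch under stated assumptions, not RTS code: demo_lock, holder and child_sees_locked_mutex are hypothetical names, it assumes Linux with unnamed POSIX semaphores, and calling pthread_mutex_trylock in the child of a multithreaded process is not guaranteed by POSIX even though it works with glibc.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;   /* posted once the helper thread holds the lock */
static sem_t may_release;  /* posted when the helper thread may unlock */

/* Helper thread: takes the lock and keeps it across the fork(). */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&lock_taken);
    sem_wait(&may_release);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child found the mutex locked, 0 if it did not,
 * and -1 on a setup error. */
int child_sees_locked_mutex(void)
{
    pthread_t t;
    int status = 0;

    if (sem_init(&lock_taken, 0, 0) != 0 || sem_init(&may_release, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;
    sem_wait(&lock_taken);           /* the helper now owns demo_lock */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here. The helper thread
         * was not copied, but the locked state of the mutex was. */
        int r = pthread_mutex_trylock(&demo_lock);
        _exit(r == EBUSY ? 1 : 0);
    }

    sem_post(&may_release);          /* let the helper unlock in the parent */
    pthread_join(t, NULL);
    sem_destroy(&lock_taken);
    sem_destroy(&may_release);

    if (pid < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the child the mutex stays locked and its owner no longer exists, so nothing there can ever unlock it. Destroying such a mutex is exactly the undefined case mentioned above.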
> https://github.com/ghc/ghc/blob/master/rts/Task.h#L57
>
> """
>  ...
>  When a task is migrated from sleeping on one Capability to another,
>    its task->cap field must be modified.  When the task wakes up, it
>    will read the new value of task->cap to find out which Capability
>    it belongs to.  Hence some synchronisation is required on
>    task->cap, and this is why we have task->lock.
>
>    If the Task is not currently owned by task->id, then the thread is
>    either
>
>      (a) waiting on the condition task->cond.  The Task is either
>          (1) a bound Task, the TSO will be on a queue somewhere
>          (2) a worker task, on the spare_workers queue of task->cap.
>    ...
> """
>
> freeTask:
> https://github.com/ghc/ghc/blob/master/rts/Task.c#L142
>
> the comment in freeTask refers to this test:
>
> https://github.com/ghc/testsuite/blob/master/tests/concurrent/should_run/conc059.hs
>
> That test calls the RTS from C, which then forkIOs off actions that are
> outstanding when the RTS exits.
>
> in forkProcess, child code
> https://github.com/ghc/ghc/blob/master/rts/Schedule.c#L1837
>
> It looks like all this code supports the notion that some other thread can
> be in foreign code during the fork call.
>
> discardTasksExcept
> https://github.com/ghc/ghc/blob/master/rts/Task.c#L305
>
>
> Alexander
>
>
> On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner <mark.lentczner at gmail.com
> > wrote:
>
>> Sorry to be reviving this thread so long after.... but I seem to be
>> running into similar issues as Michael S. did at the start.
>>
>> In short, I'm using forkProcess with the threaded RTS, and see occasional
>> hangs:
>>
>>    - I see these only on Linux. On Mac OS X, I never do.
>>    - I'm using GHC 7.4.2
>>    - I noticed the warning in the doc for forkProcess, but assumed I was
>>    safe, as I wasn't holding any shared resources at the time of the fork, and
>>    no shared resources in the program are used in the child.
>>    - With gdb, I've traced the hang to here in the run-time: forkProcess
>>    > discardTasksExcept > freeTask > closeMutex(&task->lock)
>>    > pthread_mutex_destroy
>>
>> The discussion in this thread leaves me with these questions:
>>
>>    - Is there reason to think the situation has gotten better in 7.6 and
>>    later?
>>    - Isn't the only reason *System.Process* is safer because it does an
>>    immediate exec in the child? Alas, I really want to just fork() sometimes.
>>    - Is it really true that even if my program has no shared resources
>>    with the child, that the IO subsystem and FFI system do anyway? Surely the
>>    RTS would take care of doing the right thing with those, no?
>>    - There should be no concern with exec w.r.t. library invariants
>>    since exec is wholesale replacement - all the libraries will
>>    reinitialize. Is there a problem here I'm missing?
>>
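
For comparison, the "immediate exec in the child" pattern asked about in the questions above looks roughly like this in C. It is a sketch only: fork_and_exec and demo_exec_sh are hypothetical names, /bin/sh is assumed to exist, and this is not a description of how System.Process is implemented. The child calls nothing but execv and _exit, so it never touches locks that other threads may have held at the time of the fork.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec immediately in the child, and wait for it in the parent.
 * Returns the child's exit status, or -1 on error. */
int fork_and_exec(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace the whole process image straight away. */
        execv(path, argv);
        _exit(127);              /* only reached if execv failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Example use: run a shell that exits with status 7. */
int demo_exec_sh(void)
{
    char *const argv[] = { "sh", "-c", "exit 7", NULL };
    return fork_and_exec("/bin/sh", argv);
}
```

Because exec replaces the address space wholesale, any inherited mutex or condition variable state disappears with it, which is why this pattern avoids the hang while a plain fork() that keeps running the old image does not.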
>> Alas, I've stopped using the threaded RTS until I understand this better.
>>
>> - Mark
>>
>
>