[Haskell-cafe] forkProcess, forkIO, and multithreaded runtime
Alexander Kjeldaas
alexander.kjeldaas at gmail.com
Mon Jan 21 10:14:28 CET 2013
I think you can test this theory with this patch. If a thread is waiting
on the task->cond condition variable, which is paired with task->lock,
then pthread_cond_destroy should return EBUSY (on implementations that
detect this case; POSIX leaves it undefined), and that would always be a
bug in the RTS.
Alexander
diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c
index ae31966..0f12830 100644
--- a/rts/posix/OSThreads.c
+++ b/rts/posix/OSThreads.c
@@ -91,7 +91,8 @@ initCondition( Condition* pCond )
 void
 closeCondition( Condition* pCond )
 {
-  pthread_cond_destroy(pCond);
+  int ret = pthread_cond_destroy(pCond);
+  CHECKM(ret == 0, "RTS BUG! Someone is waiting on condvar %d.", ret);
   return;
 }
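
For anyone who wants to try the same kind of check outside the RTS, here is
a minimal standalone sketch. The names checked_cond_destroy and
demo_checked_destroy are made up for illustration and are not RTS code. It
only exercises the no-waiter case, where destroy is expected to succeed;
what happens with a waiter present depends on the pthread implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not RTS code): destroy a condition variable and
 * report a non-zero return code instead of silently dropping it. */
static int checked_cond_destroy(pthread_cond_t *cond)
{
    int ret = pthread_cond_destroy(cond);
    if (ret != 0) {
        fprintf(stderr, "pthread_cond_destroy failed: %s (%d)\n",
                strerror(ret), ret);
    }
    return ret;
}

/* Create a condition variable that nobody waits on, then destroy it
 * through the checked helper and hand back the return code. */
static int demo_checked_destroy(void)
{
    pthread_cond_t cond;
    int ret = pthread_cond_init(&cond, NULL);
    assert(ret == 0); /* setup must succeed for the demo to mean anything */
    return checked_cond_destroy(&cond);
}
```

The point of returning the code rather than aborting is that a caller can
decide how loudly to fail; the patch above chooses to abort via CHECKM.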
On Mon, Jan 21, 2013 at 8:18 AM, Alexander Kjeldaas <
alexander.kjeldaas at gmail.com> wrote:
>
> I just looked at this code and since I don't know the code I can't give
> you good solutions, but for others watching this thread the links might
> prove interesting.
>
> My main theory is that you do have some other thread in FFI-land while you
> are fork()ing. The task->cond, task->lock seems to be related to this (see
> quoted comments below).
>
> Also, pthread_mutex_destroy is undefined if the lock is locked, so I am
> guessing that the task->lock is somehow locked when it shouldn't be.
>
> It isn't clear from your description whether this is consistently
> happening on Linux, or whether this only sometimes happens.
>
> The forkProcess() code seems to hold all capabilities during fork, but
> that does not include FFI-land threads AFAIU.
>
> Assuming that this happens only rarely, I am trying to understand what
> happens if the thread that is in FFI-land returns to the RTS (in the
> parent) after fork(), but before the freeTask() in the child. Based on the
> descriptions I read, it seems likely that this thread will try to inspect
> task->cap, which requires holding task->lock.
>
> That would in turn make the pthread_mutex_destroy in the child invalid.
>
> https://github.com/ghc/ghc/blob/master/rts/Task.h#L57
>
> """
> ...
> When a task is migrated from sleeping on one Capability to another,
> its task->cap field must be modified. When the task wakes up, it
> will read the new value of task->cap to find out which Capability
> it belongs to. Hence some synchronisation is required on
> task->cap, and this is why we have task->lock.
>
> If the Task is not currently owned by task->id, then the thread is
> either
>
> (a) waiting on the condition task->cond. The Task is either
> (1) a bound Task, the TSO will be on a queue somewhere
> (2) a worker task, on the spare_workers queue of task->cap.
> ...
> """
>
> freeTask:
> https://github.com/ghc/ghc/blob/master/rts/Task.c#L142
>
> the comment in freeTask refers to this test:
>
> https://github.com/ghc/testsuite/blob/master/tests/concurrent/should_run/conc059.hs
>
> That test calls the RTS from C which then forkIOs off actions that are
> outstanding when the RTS exits.
>
> in forkProcess, child code
> https://github.com/ghc/ghc/blob/master/rts/Schedule.c#L1837
>
> It looks like all this code supports the notion that some other thread can
> be in foreign code during the fork call.
>
> discardTasksExcept
> https://github.com/ghc/ghc/blob/master/rts/Task.c#L305
>
>
> Alexander
>
>
> On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner <mark.lentczner at gmail.com> wrote:
>
>> Sorry to be reviving this thread so long after.... but I seem to be
>> running into similar issues as Michael S. did at the start.
>>
>> In short, I'm using forkProcess with the threaded RTS, and see occasional
>> hangs:
>>
>> - I see these only on Linux. On Mac OS X, I never do.
>> - I'm using GHC 7.4.2
>> - I noticed the warning in the doc for forkProcess, but assumed I was
>> safe, as I wasn't holding any shared resources at the time of the fork, and
>> no shared resources in the program are used in the child.
>> - With gdb, I've traced the hang to here in the run-time: forkProcess
>> > discardTasksExcept > freeTask > closeMutex(&task->lock)
>> > pthread_mutex_destroy
>>
>> The discussion in this thread leaves me with these questions:
>>
>> - Is there reason to think the situation has gotten better in 7.6 and
>> later?
>> - Isn't the only reason *System.Process* is safer because it does an
>> immediate exec in the child? Alas, I really want to just fork() sometimes.
>> - Is it really true that even if my program has no shared resources
>> with the child, that the IO subsystem and FFI system do anyway? Surely the
>> RTS would take care of doing the right thing with those, no?
>> - There should be no concern with exec w.r.t. library invariants
>> since exec is wholesale replacement - all the libraries will
>> reinitialize. Is there a problem here I'm missing?
>>
>> Alas, I've stopped using the threaded RTS until I understand this better.
>>
>> - Mark
>>
>
>
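
The scenario discussed in the quoted messages (some other thread holds a
lock at the moment of fork()) can be reproduced in miniature. This is only
a sketch under assumptions: the names demo_lock, holder and
child_sees_locked_mutex are invented here, none of this is RTS code, and
POSIX only promises async-signal-safe functions in the child of a
multithreaded process, so calling pthread_mutex_trylock there leans on
common implementation behaviour. It uses trylock rather than destroy so
that the demo cannot hang the way the reported backtrace does.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t holder_ready;   /* posted once the helper thread owns the lock */
static sem_t holder_release; /* posted to let the helper thread finish */

/* Stand-in for a thread that is busy elsewhere (for example in foreign
 * code) and happens to hold a lock when the process forks. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&holder_ready);
    sem_wait(&holder_release);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child found the mutex locked (EBUSY), 0 if it did
 * not, and -1 if fork() failed or the child ended abnormally. */
static int child_sees_locked_mutex(void)
{
    pthread_t tid;
    int status = 0;
    int rc;

    rc = sem_init(&holder_ready, 0, 0);
    assert(rc == 0);
    rc = sem_init(&holder_release, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, holder, NULL);
    assert(rc == 0);
    (void)rc;

    sem_wait(&holder_ready); /* from here on, holder owns demo_lock */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists, but the mutex memory
         * was copied in its locked state and its owner is gone. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }

    if (pid > 0) {
        waitpid(pid, &status, 0);
    }

    /* Parent cleanup: let the helper thread unlock and exit. */
    sem_post(&holder_release);
    pthread_join(tid, NULL);
    sem_destroy(&holder_ready);
    sem_destroy(&holder_release);

    if (pid < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

If the child sees the lock as held, nothing in the child can ever release
it, which is the kind of state in which destroying task->lock would be
invalid.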