[Haskell-cafe] forkProcess, forkIO, and multithreaded runtime
Alexander Kjeldaas
alexander.kjeldaas at gmail.com
Mon Jan 21 10:42:37 CET 2013
Or this. It seems that you must compile with DEBUG for the mutex check.
This enables error-checking mutexes on POSIX.
Alexander
diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c
index ae31966..e07221d 100644
--- a/rts/posix/OSThreads.c
+++ b/rts/posix/OSThreads.c
@@ -91,7 +91,8 @@ initCondition( Condition* pCond )
void
closeCondition( Condition* pCond )
{
- pthread_cond_destroy(pCond);
+ int ret = pthread_cond_destroy(pCond);
+ CHECKM(ret == 0, "RTS Bug! Someone is waiting on condvar ret=%d.", ret);
return;
}
@@ -165,7 +166,8 @@ initMutex(Mutex* pMut)
void
closeMutex(Mutex* pMut)
{
- pthread_mutex_destroy(pMut);
+ int ret = pthread_mutex_destroy(pMut);
+ CHECKM(ret == 0, "RTS Bug! Destroying held mutex ret=%d", ret);
}
void
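For readers who want to see, outside the RTS, the two POSIX behaviors this patch leans on, here is a standalone C sketch. It is not GHC code, and every function name in it is hypothetical. It shows what an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK, the kind the message above says DEBUG builds enable) reports when it is misused. It also shows a teardown order under which both destroy calls, like the ones the patch wraps in CHECKM, are expected to return 0: wake and join the waiter first, then destroy. Destroying a locked mutex, or a condition variable that still has waiters, is undefined behavior in POSIX, so the sketch deliberately does not try it; an EBUSY return in those cases is something an implementation may report, not something to rely on.

```c
#define _XOPEN_SOURCE 700 /* exposes PTHREAD_MUTEX_ERRORCHECK on strict builds */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helper: build an error-checking mutex. */
static int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* The owner locking an error-checking mutex a second time gets EDEADLK
   instead of deadlocking. */
int errorcheck_relock_result(void)
{
    pthread_mutex_t m;
    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    pthread_mutex_lock(&m);
    int rc = pthread_mutex_lock(&m); /* second lock by the owner */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* Unlocking an error-checking mutex that is not locked returns EPERM. */
int errorcheck_unlock_unlocked_result(void)
{
    pthread_mutex_t m;
    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    int rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* Hypothetical stand-in for a lock/condvar pair such as task->lock/task->cond. */
struct waiter_state {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void *waiter(void *arg)
{
    struct waiter_state *s = arg;
    pthread_mutex_lock(&s->lock);
    while (!s->done) /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->cond, &s->lock);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Wake the waiter, join it, and only then destroy. Returns 0 when both
   destroy calls succeed, otherwise the first nonzero result. */
int checked_teardown(void)
{
    struct waiter_state s = { .done = false };
    pthread_t t;
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);
    if (pthread_create(&t, NULL, waiter, &s) != 0)
        return -1;

    pthread_mutex_lock(&s.lock);
    s.done = true;
    pthread_cond_broadcast(&s.cond);
    pthread_mutex_unlock(&s.lock);
    pthread_join(t, NULL); /* no thread can be waiting on s.cond now */

    int rc_cond = pthread_cond_destroy(&s.cond);
    int rc_mut = pthread_mutex_destroy(&s.lock);
    return rc_cond != 0 ? rc_cond : rc_mut;
}
```

The error-checking results (EDEADLK, EPERM) are required by POSIX for this mutex type, which is why they make a reliable debugging aid, whereas the destroy-time checks in the patch only catch misuse on implementations that happen to detect it.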
On Mon, Jan 21, 2013 at 10:14 AM, Alexander Kjeldaas <
alexander.kjeldaas at gmail.com> wrote:
> I think you can test this theory with this patch. If a thread is waiting
> on the task->cond condition variable, which is paired with task->lock,
> then pthread_cond_destroy may return EBUSY, which would always indicate a
> bug in the RTS.
>
> Alexander
>
> diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c
> index ae31966..0f12830 100644
> --- a/rts/posix/OSThreads.c
> +++ b/rts/posix/OSThreads.c
> @@ -91,7 +91,8 @@ initCondition( Condition* pCond )
> void
> closeCondition( Condition* pCond )
> {
> - pthread_cond_destroy(pCond);
> + int ret = pthread_cond_destroy(pCond);
> + CHECKM(ret == 0, "RTS BUG! Someone is waiting on condvar %d.", ret);
> return;
> }
>
>
>
> On Mon, Jan 21, 2013 at 8:18 AM, Alexander Kjeldaas <
> alexander.kjeldaas at gmail.com> wrote:
>
>>
>> I just looked at this code, and since I don't know it well I can't give
>> you good solutions, but others following this thread might find the
>> links interesting.
>>
>> My main theory is that you have some other thread in FFI-land while
>> you are fork()ing. task->cond and task->lock seem to be related to this
>> (see the quoted comments below).
>>
>> Also, the behavior of pthread_mutex_destroy is undefined if the mutex is
>> locked, so I am guessing that task->lock is somehow locked when it shouldn't be.
>>
>> It isn't clear from your description whether this happens consistently
>> on Linux or only sometimes.
>>
>> The forkProcess() code seems to hold all capabilities during fork, but
>> that does not include FFI-land threads AFAIU.
>>
>> Assuming that this happens only rarely, I am trying to understand what
>> happens if the thread that is in FFI-land returns to the RTS (in the
>> parent) after fork(), but before the freeTask() in the child. Based on the
>> descriptions I read, it seems likely that this thread will try to inspect
>> task->cap, which requires holding task->lock.
>>
>> That would in turn make the pthread_mutex_destroy in the child invalid.
>>
>> https://github.com/ghc/ghc/blob/master/rts/Task.h#L57
>>
>> """
>> ...
>> When a task is migrated from sleeping on one Capability to another,
>> its task->cap field must be modified. When the task wakes up, it
>> will read the new value of task->cap to find out which Capability
>> it belongs to. Hence some synchronisation is required on
>> task->cap, and this is why we have task->lock.
>>
>> If the Task is not currently owned by task->id, then the thread is
>> either
>>
>> (a) waiting on the condition task->cond. The Task is either
>> (1) a bound Task, the TSO will be on a queue somewhere
>> (2) a worker task, on the spare_workers queue of task->cap.
>> ...
>> """
>>
>> freeTask:
>> https://github.com/ghc/ghc/blob/master/rts/Task.c#L142
>>
>> The comment in freeTask refers to this test:
>>
>> https://github.com/ghc/testsuite/blob/master/tests/concurrent/should_run/conc059.hs
>>
>> That test calls into the RTS from C, which then forkIOs off actions that
>> are still outstanding when the RTS exits.
>>
>> In forkProcess, the child-side code:
>> https://github.com/ghc/ghc/blob/master/rts/Schedule.c#L1837
>>
>> It looks like all this code supports the notion that some other thread can
>> be in foreign code during the fork call.
>>
>> discardTasksExcept
>> https://github.com/ghc/ghc/blob/master/rts/Task.c#L305
>>
>>
>> Alexander
>>
>>
>> On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner <
>> mark.lentczner at gmail.com> wrote:
>>
>>> Sorry to be reviving this thread so long after the fact, but I seem to be
>>> running into issues similar to those Michael S. ran into at the start.
>>>
>>> In short, I'm using forkProcess with the threaded RTS, and see
>>> occasional hangs:
>>>
>>> - I see these only on Linux. On Mac OS X, I never do.
>>> - I'm using GHC 7.4.2
>>> - I noticed the warning in the doc for forkProcess, but assumed I
>>> was safe, as I wasn't holding any shared resources at the time of the fork,
>>> and no shared resources in the program are used in the child.
>>> - With gdb, I've traced the hang to here in the run-time:
>>> forkProcess > discardTasksExcept > freeTask > closeMutex(&task->lock) > pthread_mutex_destroy
>>>
>>> The discussion in this thread leaves me with these questions:
>>>
>>> - Is there reason to think the situation has gotten better in 7.6
>>> and later?
>>> - Isn't the only reason *System.Process* is safer that it does an
>>> immediate exec in the child? Alas, sometimes I really do just want to fork().
>>> - Is it really true that, even if my program shares no resources
>>> with the child, the IO subsystem and FFI system do anyway? Surely the
>>> RTS would take care of doing the right thing with those, no?
>>> - There should be no concern with exec w.r.t. library invariants,
>>> since exec is a wholesale replacement: all the libraries will
>>> reinitialize. Is there a problem here I'm missing?
>>>
>>> Alas, I've stopped using the threaded RTS until I understand this better.
>>>
>>> - Mark
>>>
>>
>>
>
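The theory in the quoted messages is that some other thread (for example, one in foreign code) holds task->lock when fork() runs. The standalone C sketch below is not RTS code; `lock`, `gate`, `holder`, and `trylock_result_in_child` are hypothetical names. It shows the underlying POSIX behavior: fork() copies only the calling thread, so a mutex that any other thread holds at that instant arrives in the child locked, with no thread left to unlock it. Strictly, POSIX allows only async-signal-safe calls in the child of a multithreaded fork() until exec, and pthread_mutex_trylock is not one of them; the probe oversteps that rule purely for demonstration, and on common implementations such as glibc it reports EBUSY.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a lock like task->lock. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake so that fork() happens only while `holder` owns `lock`. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cv = PTHREAD_COND_INITIALIZER;
static int holder_ready = 0;
static int may_release = 0;

/* Plays a thread that is busy elsewhere (say, in foreign code) while it
   still owns `lock`. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    pthread_mutex_lock(&gate);
    holder_ready = 1;
    pthread_cond_broadcast(&gate_cv);
    while (!may_release)
        pthread_cond_wait(&gate_cv, &gate);
    pthread_mutex_unlock(&gate);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Forks while `holder` owns `lock`. The child probes `lock` with trylock
   and reports the result as its exit status; returns that status, or -1. */
int trylock_result_in_child(void)
{
    pthread_t t;
    holder_ready = 0;
    may_release = 0;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;

    pthread_mutex_lock(&gate);
    while (!holder_ready)
        pthread_cond_wait(&gate_cv, &gate);
    pthread_mutex_unlock(&gate);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread was copied, but `lock` arrived
           in its locked state and its owner does not exist here. */
        _exit(pthread_mutex_trylock(&lock));
    }

    /* Parent: release the holder thread, then reap the child. */
    pthread_mutex_lock(&gate);
    may_release = 1;
    pthread_cond_broadcast(&gate_cv);
    pthread_mutex_unlock(&gate);
    pthread_join(t, NULL);

    if (pid < 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Destroying such an inherited mutex in the child, as freeTask() does through closeMutex(&task->lock), would be undefined behavior, which is one plausible route to the hang in the gdb trace quoted above.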
More information about the Haskell-Cafe
mailing list