[Haskell-cafe] forkProcess, forkIO, and multithreaded runtime

Mon Jan 21 08:18:14 CET 2013

I just looked at this code, and since I don't know it well I can't give you
good solutions, but the links might prove interesting for others watching
this thread.

My main theory is that some other thread is in FFI-land (inside a foreign
call) while you are fork()ing.  task->cond and task->lock seem to be related
to this (see the quoted comments below).

Also, calling pthread_mutex_destroy on a locked mutex is undefined
behavior, so I am guessing that task->lock is somehow locked when it
shouldn't be.

It isn't clear from your description whether this happens consistently
on Linux or only sometimes.

The forkProcess() code seems to hold all Capabilities during the fork,
but AFAIU that does not cover threads in FFI-land: a thread making a safe
foreign call releases its Capability for the duration of the call.

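Assuming that this happens only rarely, I am trying to understand what
happens if the thread that is in FFI-land returns to the RTS (in the
parent) just as fork() is called.  Based on the descriptions I read, it
seems likely that this thread will try to inspect task->cap, which requires
holding task->lock.  If it holds task->lock at the moment fork() copies the
process, the child inherits the mutex in the locked state, and since only
the forking thread is duplicated, nothing in the child will ever unlock it.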

That would in turn make the pthread_mutex_destroy in the child invalid.
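To illustrate that point outside the RTS, here is a small C sketch using
plain pthreads (not GHC code; demo_lock, worker and
child_sees_inherited_lock_held are hypothetical names).  A second thread
takes a mutex, the main thread forks while it is held, and the child probes
its copy with pthread_mutex_trylock.  On Linux/glibc I would expect the
child to see EBUSY, i.e. exactly the state in which a later
pthread_mutex_destroy would be undefined.  It assumes Linux (unnamed POSIX
semaphores) and linking with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for task->lock; not the RTS's Task structure. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;   /* posted once the worker holds demo_lock */
static sem_t may_release;  /* posted by the parent after fork() */

/* Plays the part of a thread returning from foreign code and taking
   the lock while the main thread calls fork(). */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&lock_taken);
    sem_wait(&may_release);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child saw its inherited copy of demo_lock as held
   (EBUSY), 0 if the child could take it, -1 on error. */
int child_sees_inherited_lock_held(void)
{
    pthread_t t;
    if (sem_init(&lock_taken, 0, 0) != 0 || sem_init(&may_release, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&lock_taken);          /* fork only while the lock is held */

    pid_t pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists in the child, so the owner of
           demo_lock was not copied.  (Strictly, POSIX only allows
           async-signal-safe calls here; this is just a demonstration.) */
        int r = pthread_mutex_trylock(&demo_lock);
        _exit(r == EBUSY ? 1 : 0);
    }

    sem_post(&may_release);         /* let the parent's worker finish */
    pthread_join(t, NULL);
    sem_destroy(&lock_taken);
    sem_destroy(&may_release);
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the parent the lock is released normally; only the child's copy is
stuck, which matches a hang that shows up on the child side of
forkProcess.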

https://github.com/ghc/ghc/blob/master/rts/Task.h#L57

"""
 ...
 When a task is migrated from sleeping on one Capability to another,
   its task->cap field must be modified.  When the task wakes up, it
   will read the new value of task->cap to find out which Capability
   it belongs to.  Hence some synchronisation is required on
   task->cap, and this is why we have task->lock.

   If the Task is not currently owned by task->id, then the thread is
   either

     (a) waiting on the condition task->cond.  The Task is either
         (1) a bound Task, the TSO will be on a queue somewhere
         (2) a worker task, on the spare_workers queue of task->cap.
   ...
"""

freeTask:
https://github.com/ghc/ghc/blob/master/rts/Task.c#L142

The comment in freeTask refers to this test:
https://github.com/ghc/testsuite/blob/master/tests/concurrent/should_run/conc059.hs

That test calls into the RTS from C, and the Haskell side then forkIOs off
actions that are still outstanding when the RTS exits.

The child-side code in forkProcess:
https://github.com/ghc/ghc/blob/master/rts/Schedule.c#L1837

It looks like all this code supports the notion that some other thread can
be in foreign code during the fork call.
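The usual POSIX answer to a lock that may be held by another thread at
fork() time is pthread_atfork(): the prepare handler takes the lock before
fork(), and the parent and child handlers release their own copies
afterwards, so the child never inherits a lock owned by a thread it does
not have.  This is only a sketch on a hypothetical guarded_lock (with
hypothetical helpers install_fork_handlers and child_can_take_lock), not
something I have found in the RTS, and I don't know whether it is practical
for every task->lock there.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for something like task->lock. */
static pthread_mutex_t guarded_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork(): waits until no other
   thread holds the lock, so the copy the child gets is held by the
   forking thread itself. */
static void prepare(void) { pthread_mutex_lock(&guarded_lock); }

/* Runs after fork() in both parent and child, releasing each copy. */
static void release(void) { pthread_mutex_unlock(&guarded_lock); }

/* Register the handlers once, e.g. at startup. Returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(prepare, release, release);
}

/* Forks; the child reports whether it could take guarded_lock.
   Returns 1 if it could, 0 if not, -1 on error. */
int child_can_take_lock(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int ok = pthread_mutex_trylock(&guarded_lock) == 0;
        _exit(ok ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The trade-off is that fork() now waits for whichever thread currently holds
the lock, and every lock covered this way needs its handlers registered.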

discardTasksExcept:
https://github.com/ghc/ghc/blob/master/rts/Task.c#L305

Alexander

On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner
<mark.lentczner at gmail.com> wrote:

> Sorry to be reviving this thread so long after.... but I seem to be
> running into similar issues as Michael S. did at the start.
>
> In short, I'm using forkProcess with the threaded RTS, and see occasional
> hangs:
>
>    - I see these only on Linux. On Mac OS X, I never do.
>    - I'm using GHC 7.4.2
>    - I noticed the warning in the doc for forkProcess, but assumed I was
>    safe, as I wasn't holding any shared resources at the time of the fork, and
>    no shared resources in the program are used in the child.
>    - With gdb, I've traced the hang to here in the run-time: forkProcess
>    -> discardTasksExcept -> freeTask -> closeMutex(&task->lock)
>    -> pthread_mutex_destroy
>
> The discussion in this thread leaves me with these questions:
>
>    - Is there reason to think the situation has gotten better in 7.6 and
>    later?
>    - Isn't the only reason *System.Process* is safer that it does an
>    immediate exec in the child? Alas, I really want to just fork() sometimes.
>    - Is it really true that even if my program has no shared resources
>    with the child, that the IO subsystem and FFI system do anyway? Surely the
>    RTS would take care of doing the right thing with those, no?
>    - There should be no concern with exec w.r.t. library invariants, since
>    exec is a wholesale replacement: all the libraries will reinitialize.
>    Is there a problem here I'm missing?
>
> Alas, I've stopped using the threaded RTS until I understand this better.
>
> - Mark
>