Increasing number of worker tasks in RTS (GHC 7.4.1) - how to debug?

Tue Feb 28 11:33:11 CET 2012

On 26/02/2012 02:23, Sanket Agrawal wrote:
> I have to take back what I said about the increase in worker tasks being
> related to some Mac OS pthread bug. I can now reproduce the issue on
> Linux (Red Hat x86_64) too (and cause a segmentation fault once in a
> while). So it now seems the issue might be due to either some kind of
> interaction between the GHC RTS and C pthread mutexes, or a bug in my code.
>
> What I have done is to create a simple test case that reproduces the
> increase in the number of worker threads with each run of the Haskell
> timer thread (that syncs with C pthreads). I have put up the code on GitHub
> with documentation on how to reproduce the issue:
> https://github.com/sanketr/cffitest
>
> I will appreciate feedback on whether it is a bug in my code, or a GHC
> bug that needs to be reported.

What version of GHC is this?  I vaguely remember fixing something like this.

The rule of thumb is: if you think it is a bug then report it, and we'll 
investigate further.

Cheers,
	Simon

>
> On Sat, Feb 25, 2012 at 3:41 PM, Sanket Agrawal
> <sanket.agrawal at gmail.com> wrote:
>
>     On further investigation, it seems to be very specific to Mac OS
>     Lion (I am running 10.7.3) - all tests were with the -N3 option:
>
>     - I can reliably crash the code with a seg fault or bus error if I
>     create more than 8 threads in C FFI (each thread creates its own
>     mutex, for 1-1 coordination with the Haskell timer thread). My iMac
>     has 4 processors. In gdb, I can see that the crash happened
>     in __psynch_cvsignal () which seems to be related to pthread mutexes.
>
>     - If I increase the number of C FFI threads (and hence, pthread
>     mutexes) to >=7, the number of tasks starts increasing. 8 is the max
>     number of FFI threads in my testing where the code runs without
>     crashing. But it seems that there is some kind of pthread
>     mutex-related leak. What the timer thread does is fork 8 parallel
>     Haskell threads to acquire mutexes from each of the C FFI threads.
>     Though the function returns after acquiring the mutex, collecting
>     data, and releasing the mutex, some of the threads seem to be marked
>     as active by the GC because of the mutex memory leak. Exactly how, I
>     don't know.
>
>     - If I keep the number of C FFI threads to <=6, there is no memory
>     leak. The number of tasks stays steady.
>
>     So, it seems to be a pthread library issue (and not a GHC issue).
>     Something to keep in mind when developing code on Mac that involves
>     mutex coordination with C FFI.
>
>
>     On Sat, Feb 25, 2012 at 2:59 PM, Sanket Agrawal
>     <sanket.agrawal at gmail.com> wrote:
>
>         I wrote a program that uses a timed thread to collect data from
>         a C producer (using FFI). The number of threads in the C producer
>         is fixed (they are created at init). One Haskell timer thread uses
>         threadDelay to run itself at a timed interval. When I look at the
>         RTS output after killing the program after a couple of timer
>         iterations, I see the number of worker tasks increasing with time.
>
>         For example, below is the output after 20 iterations of the timer
>         event:
>
>                                MUT time (elapsed)       GC time  (elapsed)
>            Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
>            Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
>            ....... output until task 37 snipped as it is the same as task 1 .......
>            Task 38 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
>            Task 39 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
>            Task 40 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
>            Task 41 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
>            Task 42 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
>            Task 43 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
>            Task 44 (worker) :    0.52s    ( 10.74s)       0.00s    (  0.00s)
>            Task 45 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)
>            Task 46 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)
>            Task 47 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)
>
>
>         After two iterations of the timer event:
>
>                                 MUT time (elapsed)       GC time  (elapsed)
>            Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
>            Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
>            Task  2 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
>            Task  3 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
>            Task  4 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
>            Task  5 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
>            Task  6 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
>            Task  7 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
>            Task  8 (worker) :    0.48s    (  1.80s)       0.00s    (  0.00s)
>            Task  9 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)
>            Task 10 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)
>            Task 11 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)
>
>
>         The Haskell code has one forkIO call to kick off the C FFI - the C
>         FFI creates 8 threads. Runtime options are "-N3 +RTS -s". The timer
>         event is kicked off after forkIO. It is of the form (pseudo-code):
>
>         timerevent <other arguments> time = run
>           where
>             run = do threadDelay time >> <do some work> >> run
>               where <other variables defined for run function>
>
>         I also wrote simpler code using just the timer event (fork one
>         timer event, and run another timer event after that), but didn't
>         see any tasks in the RTS output.
>
>         I tried searching the GHC pages for documentation on the RTS
>         output, but didn't find anything that could help me debug the
>         above issue. I suspect that the timer event is the root cause of
>         the increasing number of tasks (with all but the last 9 tasks
>         idle - I guess 8 tasks belong to the C FFI, and one task to the
>         timerevent thread), and hence of a memory leak.
>
>         I would appreciate pointers on how to debug it. The timerevent
>         does forkIO a call to send the data collected from the C FFI to a
>         db server, but disabling that fork still results in an increasing
>         number of tasks. So it seems strongly correlated with the timer
>         event, though I am unable to reproduce it with a simpler version
>         of the timer event (which removes the MVar sync/callback from the
>         C FFI).
>
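
For readers following the thread, the timer loop described in the original
message (sleep with threadDelay, fork one Haskell thread per C producer,
wait for all of them, repeat) could be sketched roughly as below. This is
only an illustration under assumptions, not the code from the cffitest
repository: collectFrom is a hypothetical pure-Haskell stand-in for the FFI
call that locks a producer's mutex, copies its data and unlocks it, and the
names tick and timerEvent, the iteration bound and the delay are made up
for the example.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, replicateM_)

-- Hypothetical stand-in for the foreign call that would lock one
-- producer's mutex, copy its data out, and unlock it again.
collectFrom :: Int -> IO Int
collectFrom producer = return (producer * 10)

-- One timer tick: fork one Haskell thread per producer, each writing
-- its result into its own MVar, then wait for all of them in order.
tick :: Int -> IO [Int]
tick nProducers = do
  boxes <- forM [1 .. nProducers] $ \p -> do
    box <- newEmptyMVar
    _ <- forkIO (collectFrom p >>= putMVar box)
    return box
  mapM takeMVar boxes

-- The timer loop: sleep, run one tick, repeat. The original loops
-- forever; this sketch is bounded so that it terminates.
timerEvent :: Int -> Int -> Int -> IO ()
timerEvent iterations delayMicros nProducers =
  replicateM_ iterations $ do
    threadDelay delayMicros
    results <- tick nProducers
    print results

main :: IO ()
main = timerEvent 3 100000 8
```

In the real program collectFrom would be a foreign import. A safe foreign
call that blocks (for example on a pthread mutex) keeps its OS thread busy
for the duration of the call, and the threaded RTS may start another worker
so that Haskell code keeps running, so that is one place to look when the
task count in the "+RTS -s" output keeps growing.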