GHC 7.8.3 thread hang

John Lato jwlato at gmail.com
Wed Nov 12 03:37:47 UTC 2014


The blocked on black hole message is very suspicious.  It means that thread
7 is blocked waiting for another thread to evaluate a thunk.  But in this
case, it's thread 7 that created that thunk and is supposed to be doing the
evaluating.  This is some evidence that Gregory's theory is correct and
your encode function loops somewhere.

On Wed Nov 12 2014 at 11:25:30 AM Michael Jones <mike at proclivis.com> wrote:

> Gregory,
>
> The options in the 7.8.3 user guide says in the -Msize option that by
> default the heap is unlimited. I have several applications, and they all
> have messages like:
>
> 7fddc7bcd700: cap 2: waking up thread 7 on cap 2
> 7fddc7bcd700: cap 2: thread 4 stopped (yielding)
> 7fddcaad6740: cap 2: running thread 7 (ThreadRunGHC)
> 7fddcaad6740: cap 2: thread 7 stopped (heap overflow)
> 7fddcaad6740: cap 2: requesting parallel GC
> 7fddc5ffe700: cap 0: starting GC
> 7fddc57fd700: cap 1: starting GC
> 7fdda77fe700: cap 3: starting GC
> 7fddcaad6740: cap 2: starting GC
>
> I assumed that when the heap ran out of space, it caused a GC, or it
> enlarged the heap. The programs that have these messages run for very long
> periods of time, and when I heap profile them, they use about 500KM to 1MB
> over long periods of time, and are quite stable.
>
> As a test, I ran the hang application with profiling to see if memory
> jumps up before or after the hang.
>
> What I notice is the app moves along using about 800KB, then there is a
> spike to 2MB at the hang. So I believe you, but I am confused about the RTS
> behavior and how I can have all these overflow messages in a normal
> application and how to tell the difference between these routine messages
> vs a real heap problem.
>
> So, I dug deeper into the log. A normal execution for sending a command
> looks like:
>
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: sendCommand
> 7f99e6ffd700: cap 0: sendCommand: encoded
> 7f99e6ffd700: cap 0: sendCommand: size 4
> 7f99e6ffd700: cap 0: sendCommand: unpacked
> 7f99e6ffd700: cap 0: Sending command of size 4
> 7f99e6ffd700: cap 0: Sending command of size "\NUL\EOT"
> 7f99e6ffd700: cap 0: sendCommand: sent
> 7f99e6ffd700: cap 0: sendCommand: flushed
> 7f99e6ffd700: cap 0: thread 7 stopped (blocked on an MVar)
> 7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 2 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
>
> The thread is run, overflows, GC, runs, then blocks on an MVAr.
>
> For a the hang case:
>
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: sendCommand
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 2 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> ...
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> 7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> 7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: thread 45 stopped (yielding)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign
> call)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign
> call)
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (blocked on black hole owned by
> thread 7)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign
> call)
>
> run, overflow, GC, run, overflow, GC, run, yield, other stuff, run blocked
> on black hole
>
> And that is the last activity for thread 7.
>
> I found a few links about black holes and such, so I’ll go off and read
> those and try to learn what the RTS is doing and why it can hang on a black
> hole. But if you have some hits, let me know.
>
> Mike
>
>
>
> On Nov 11, 2014, at 4:01 PM, Gregory Collins <greg at gregorycollins.net>
> wrote:
>
>
> On Tue, Nov 11, 2014 at 2:06 PM, Michael Jones <mike at proclivis.com> wrote:
>
>> Those are all over the log even when it runs properly. So I assume the
>> runtime is resizing the heap or something.
>
>
> No, it means you're exhausting the heap (maybe the runtime stack for the
> thread running "encode"), probably because "encode" is infinite-looping. I
> think Occam's razor applies here, check that any recursion you're doing is
> actually reducing the recursive argument. Perhaps you could post the code
> (e.g. http://gist.github.com/)?
>
> G
>
>
> --
> Gregory Collins <greg at gregorycollins.net>
>
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20141112/56d49951/attachment.html>


More information about the Glasgow-haskell-users mailing list