GHC 7.8.3 thread hang

Michael Jones mike at proclivis.com
Wed Nov 12 03:24:57 UTC 2014


Gregory,

The options in the 7.8.3 user guide says in the -Msize option that by default the heap is unlimited. I have several applications, and they all have messages like:

7fddc7bcd700: cap 2: waking up thread 7 on cap 2
7fddc7bcd700: cap 2: thread 4 stopped (yielding)
7fddcaad6740: cap 2: running thread 7 (ThreadRunGHC)
7fddcaad6740: cap 2: thread 7 stopped (heap overflow)
7fddcaad6740: cap 2: requesting parallel GC
7fddc5ffe700: cap 0: starting GC
7fddc57fd700: cap 1: starting GC
7fdda77fe700: cap 3: starting GC
7fddcaad6740: cap 2: starting GC

I assumed that when the heap ran out of space, it caused a GC, or it enlarged the heap. The programs that have these messages run for very long periods of time, and when I heap profile them, they use about 500KM to 1MB over long periods of time, and are quite stable.

As a test, I ran the hang application with profiling to see if memory jumps up before or after the hang.

What I notice is the app moves along using about 800KB, then there is a spike to 2MB at the hang. So I believe you, but I am confused about the RTS behavior and how I can have all these overflow messages in a normal application and how to tell the difference between these routine messages vs a real heap problem.

So, I dug deeper into the log. A normal execution for sending a command looks like:

7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: sendCommand: encoded
7f99e6ffd700: cap 0: sendCommand: size 4
7f99e6ffd700: cap 0: sendCommand: unpacked
7f99e6ffd700: cap 0: Sending command of size 4
7f99e6ffd700: cap 0: Sending command of size "\NUL\EOT"
7f99e6ffd700: cap 0: sendCommand: sent
7f99e6ffd700: cap 0: sendCommand: flushed
7f99e6ffd700: cap 0: thread 7 stopped (blocked on an MVar)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock

The thread is run, overflows, GC, runs, then blocks on an MVAr.

For a the hang case:

7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: sendCommand
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (yielding)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
7f99e6ffd700: cap 0: requesting parallel GC
7f99e6ffd700: cap 0: starting GC
7f99e6ffd700: cap 0: GC working
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: GC idle
7f99e6ffd700: cap 0: GC done
7f99e6ffd700: cap 0: all caps stopped for GC
7f99e6ffd700: cap 0: finished GC
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (yielding)
7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 2 stopped (yielding)
7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
...
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
7f99e6ffd700: cap 0: fetchTelemetryServer
7f99e6ffd700: cap 0: thread 45 stopped (yielding)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
7f99e6ffd700: cap 0: thread 7 stopped (blocked on black hole owned by thread 7)
7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)

run, overflow, GC, run, overflow, GC, run, yield, other stuff, run blocked on black hole

And that is the last activity for thread 7.

I found a few links about black holes and such, so I’ll go off and read those and try to learn what the RTS is doing and why it can hang on a black hole. But if you have some hits, let me know. 

Mike



On Nov 11, 2014, at 4:01 PM, Gregory Collins <greg at gregorycollins.net> wrote:

> 
> On Tue, Nov 11, 2014 at 2:06 PM, Michael Jones <mike at proclivis.com> wrote:
> Those are all over the log even when it runs properly. So I assume the runtime is resizing the heap or something.
> 
> No, it means you're exhausting the heap (maybe the runtime stack for the thread running "encode"), probably because "encode" is infinite-looping. I think Occam's razor applies here, check that any recursion you're doing is actually reducing the recursive argument. Perhaps you could post the code (e.g. http://gist.github.com/)?
> 
> G
> 
> 
> -- 
> Gregory Collins <greg at gregorycollins.net>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20141111/89ec52a0/attachment-0001.html>


More information about the Glasgow-haskell-users mailing list