GHC 7.8.3 thread hang

Michael Jones mike at proclivis.com
Thu Nov 13 06:01:42 UTC 2014


I am having difficulty imagining how there could be a loop that evaluates a black hole a second time by the same thread. The paper Haskell on a Shared-Memory Processor mentions that the runtime must not do this. My take on this description is that it should never happen. Even if code was recursive, it should not happen. Code might recurse forever and chew up memory until failure, but a thread creating a black hold should not trip up on its own black hole.

So I am not sure what I am looking for. There is no recursion in the encode.

Can you give some examples of what I should be looking for?

Mike


On Nov 11, 2014, at 8:37 PM, John Lato <jwlato at gmail.com> wrote:

> The blocked on black hole message is very suspicious.  It means that thread 7 is blocked waiting for another thread to evaluate a thunk.  But in this case, it's thread 7 that created that thunk and is supposed to be doing the evaluating.  This is some evidence that Gregory's theory is correct and your encode function loops somewhere.
> 
> On Wed Nov 12 2014 at 11:25:30 AM Michael Jones <mike at proclivis.com> wrote:
> Gregory,
> 
> The options in the 7.8.3 user guide says in the -Msize option that by default the heap is unlimited. I have several applications, and they all have messages like:
> 
> 7fddc7bcd700: cap 2: waking up thread 7 on cap 2
> 7fddc7bcd700: cap 2: thread 4 stopped (yielding)
> 7fddcaad6740: cap 2: running thread 7 (ThreadRunGHC)
> 7fddcaad6740: cap 2: thread 7 stopped (heap overflow)
> 7fddcaad6740: cap 2: requesting parallel GC
> 7fddc5ffe700: cap 0: starting GC
> 7fddc57fd700: cap 1: starting GC
> 7fdda77fe700: cap 3: starting GC
> 7fddcaad6740: cap 2: starting GC
> 
> I assumed that when the heap ran out of space, it caused a GC, or it enlarged the heap. The programs that have these messages run for very long periods of time, and when I heap profile them, they use about 500KM to 1MB over long periods of time, and are quite stable.
> 
> As a test, I ran the hang application with profiling to see if memory jumps up before or after the hang.
> 
> What I notice is the app moves along using about 800KB, then there is a spike to 2MB at the hang. So I believe you, but I am confused about the RTS behavior and how I can have all these overflow messages in a normal application and how to tell the difference between these routine messages vs a real heap problem.
> 
> So, I dug deeper into the log. A normal execution for sending a command looks like:
> 
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: sendCommand
> 7f99e6ffd700: cap 0: sendCommand: encoded
> 7f99e6ffd700: cap 0: sendCommand: size 4
> 7f99e6ffd700: cap 0: sendCommand: unpacked
> 7f99e6ffd700: cap 0: Sending command of size 4
> 7f99e6ffd700: cap 0: Sending command of size "\NUL\EOT"
> 7f99e6ffd700: cap 0: sendCommand: sent
> 7f99e6ffd700: cap 0: sendCommand: flushed
> 7f99e6ffd700: cap 0: thread 7 stopped (blocked on an MVar)
> 7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 2 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> 
> The thread is run, overflows, GC, runs, then blocks on an MVAr.
> 
> For a the hang case:
> 
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: sendCommand
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 2 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> ...
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> 7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> 7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: thread 45 stopped (yielding)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (blocked on black hole owned by thread 7)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
> 
> run, overflow, GC, run, overflow, GC, run, yield, other stuff, run blocked on black hole
> 
> And that is the last activity for thread 7.
> 
> I found a few links about black holes and such, so I’ll go off and read those and try to learn what the RTS is doing and why it can hang on a black hole. But if you have some hits, let me know. 
> 
> Mike
> 
> 
> 
> On Nov 11, 2014, at 4:01 PM, Gregory Collins <greg at gregorycollins.net> wrote:
> 
>> 
>> On Tue, Nov 11, 2014 at 2:06 PM, Michael Jones <mike at proclivis.com> wrote:
>> Those are all over the log even when it runs properly. So I assume the runtime is resizing the heap or something.
>> 
>> No, it means you're exhausting the heap (maybe the runtime stack for the thread running "encode"), probably because "encode" is infinite-looping. I think Occam's razor applies here, check that any recursion you're doing is actually reducing the recursive argument. Perhaps you could post the code (e.g. http://gist.github.com/)?
>> 
>> G
>> 
>> 
>> -- 
>> Gregory Collins <greg at gregorycollins.net>
> 
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20141112/85a28b3b/attachment.html>


More information about the Glasgow-haskell-users mailing list