GHC 7.8.3 thread hang

Michael Jones mike at proclivis.com
Tue Nov 11 07:11:13 UTC 2014


I am trying to debug a lockup problem (and need help with debugging technique), where hang means a thread stops at a known place during evaluation, and other threads continue.

The code near the problem is like:

       ec <- return $ encode command
       l <- return $ BSL.length ec
       ss <- return $ BSC.unpack ec

It does not matter if I use let or return, or if the length is taken after unpack. I used return so I could use this code for tracing, with strictness to try to find the exact statement that is the problem:

       traceEventIO "sendCommand"
       ec <- return $ encode command
       traceEventIO $ "sendCommand: encoded"
       l <- ec `seq` return $ BSL.length ec
       traceEventIO $ "sendCommand: size " ++ (show l)
       ss <- ec `seq` return $ BSC.unpack ec

When this runs, the program executes this many times, but always hangs under a certain condition.

For good evaluations:

7f04173ff700: cap 0: sendCommand
7f04173ff700: cap 0: sendCommand: encoded
7f04173ff700: cap 0: sendCommand: size 4
7f04173ff700: cap 0: sendCommand: unpacked
7f04173ff700: cap 0: Sending command of size 4
7f04173ff700: cap 0: Sending command of size "\NUL\EOT"
7f04173ff700: cap 0: sendCommand: sent
7f04173ff700: cap 0: sendCommand: flushed

for bad evaluation:

7f04173ff700: cap 0: sendCommand
7f04173ff700: cap 0: sendCommand: encoded

The lockup occurs when length is taken.

The difference between the working and non-working case is as follows:

A wxHaskell callback stuffs some data in a TChan. A thread started at application startup is reading the TChan and calling the code that hangs. If it did not hang, it would send it by TCP to another process/computer.

In the working case the callback pops a dialog, and passes data from one TChan to another TChan.

In the failing case, the data is used to generate strings in a wxHaskell grid, then it is parsed, and a new data is made. The new data is a combination of old and new pieces of sub data. The shape of the date is identical, because I am not making any edits to the rows.

So when data that the callback sends to TChan is unmodified, no hang. But when the data is used to make text, put it in the gui, process it, and generate new data, it hangs.

As a test I modified the code so that the text is not put into the gui. The results are the same. This indicates it has something to do with creating strings and then data from strings and mixing old and new subdata. Strings are created with show. Data is created by pattern matching and generating numbers from strings. I should also point out that in the working case, the size of the resulting string is small, say 3. In the hang case, the resulting string would be long, say 5000-10000.

I assume there are no limits to the size of ByteStrings or fundemental issues with the RTS stack/heap that require special settings.

I am using the following revisions:

GHC 7.8.3
    base ==4.7.*, 
    mtl ==2.2.1,
    containers == 0.5.5.1, 
    transformers ==0.4.1.0, 
    random == 1.0.1.1, 
    wx == 0.91.0.0, 
    wxcore == 0.91.0.0, 
    wxdirect == 0.91.0.0, 
    colour == 2.3.3, 
    stm == 2.4.2, 
    monad-loops == 0.4.2.1, 
    time == 1.4.2,
    old-locale == 1.0.0.6,
    fast-logger == 2.2.3,
    network == 2.6.0.2,
    bytestring == 0.10.4.0,
    control-monad-loop == 0.1,
    binary == 0.7.2.2,

I know that nobody can have an answer based on this. But what I am hoping is either there is some known bug, or someone can guide me in narrowing it down. The event log does not have anything unusual in it. Other threads keep running, and I can exit the application normally. The thread does not throw an exception. It just hangs.

When I run the app, I just use +RTS -v 

Perhaps there are some other options that might give more info?

— SNIPPET of log —

7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (yielding)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (suspended while making a foreign call)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: waking up thread 5 on cap 0
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
7fe544cfea40: cap 0: sendCommand
7fe544cfea40: cap 0: sendCommand: encoded
7fe544cfea40: cap 0: sendCommand: size 3                               WORKS HERE
7fe544cfea40: cap 0: sendCommand: unpacked
7fe544cfea40: cap 0: Sending command of size 3
7fe544cfea40: cap 0: Sending command of size "\NUL\ETX"
7fe544cfea40: cap 0: sendCommand: sent
7fe544cfea40: cap 0: sendCommand: flushed
7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (suspended while making a foreign call)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: waking up thread 5 on cap 0

…

7fe537eff700: cap 0: fetchTelemetryServer: got lock
7fe537eff700: cap 0: thread 45 stopped (yielding)
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: thread 7 stopped (heap overflow)
7fe537eff700: cap 0: requesting parallel GC
7fe537eff700: cap 0: starting GC
7fe537eff700: cap 0: GC working
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: all caps stopped for GC
7fe537eff700: cap 0: finished GC
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: sendCommand
7fe537eff700: cap 0: sendCommand: encoded                       PROBLEM HERE
7fe537eff700: cap 0: thread 7 stopped (heap overflow)
7fe537eff700: cap 0: requesting parallel GC
7fe537eff700: cap 0: starting GC
7fe537eff700: cap 0: GC working
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: GC idle
7fe537eff700: cap 0: GC done
7fe537eff700: cap 0: all caps stopped for GC
7fe537eff700: cap 0: finished GC
7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
7fe537eff700: cap 0: thread 7 stopped (yielding)
7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
7fe537eff700: cap 0: thread 2 stopped (yielding)
7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
7fe537eff700: cap 0: running thread 45 (ThreadRunGHC)
7fe537eff700: cap 0: fetchTelemetryServer: unlock
7fe537eff700: cap 0: fetchTelemetryServer
7fe537eff700: cap 0: fetchTelemetryServer: got lock
7fe537eff700: cap 0: fetchTelemetryServer: pump seq





More information about the Glasgow-haskell-users mailing list