GHC 7.8.3 thread hang

Michael Jones mike at proclivis.com
Tue Nov 11 17:38:53 UTC 2014


OS: Ubuntu 14.X TLS
GHC-Options:   -rtsopts -threaded -debug -eventlog

Behaves the same with

GHC-Options:   -rtsopts -threaded -O2

Mike

On Nov 11, 2014, at 8:01 AM, Carter Schonwald <carter.schonwald at gmail.com> wrote:

> what OS are you on?
> what build options did you use?
> 
> On Tue, Nov 11, 2014 at 2:11 AM, Michael Jones <mike at proclivis.com> wrote:
> I am trying to debug a lockup problem (and need help with debugging technique), where hang means a thread stops at a known place during evaluation, and other threads continue.
> 
> The code near the problem is like:
> 
>        ec <- return $ encode command
>        l <- return $ BSL.length ec
>        ss <- return $ BSC.unpack ec
> 
> It does not matter if I use let or return, or if the length is taken after unpack. I used return so I could use this code for tracing, with strictness to try to find the exact statement that is the problem:
> 
>        traceEventIO "sendCommand"
>        ec <- return $ encode command
>        traceEventIO $ "sendCommand: encoded"
>        l <- ec `seq` return $ BSL.length ec
>        traceEventIO $ "sendCommand: size " ++ (show l)
>        ss <- ec `seq` return $ BSC.unpack ec
> 
> When this runs, the program executes this many times, but always hangs under a certain condition.
> 
> For good evaluations:
> 
> 7f04173ff700: cap 0: sendCommand
> 7f04173ff700: cap 0: sendCommand: encoded
> 7f04173ff700: cap 0: sendCommand: size 4
> 7f04173ff700: cap 0: sendCommand: unpacked
> 7f04173ff700: cap 0: Sending command of size 4
> 7f04173ff700: cap 0: Sending command of size "\NUL\EOT"
> 7f04173ff700: cap 0: sendCommand: sent
> 7f04173ff700: cap 0: sendCommand: flushed
> 
> for bad evaluation:
> 
> 7f04173ff700: cap 0: sendCommand
> 7f04173ff700: cap 0: sendCommand: encoded
> 
> The lockup occurs when length is taken.
> 
> The difference between the working and non-working case is as follows:
> 
> A wxHaskell callback stuffs some data in a TChan. A thread started at application startup is reading the TChan and calling the code that hangs. If it did not hang, it would send it by TCP to another process/computer.
> 
> In the working case the callback pops a dialog, and passes data from one TChan to another TChan.
> 
> In the failing case, the data is used to generate strings in a wxHaskell grid, then it is parsed, and a new data is made. The new data is a combination of old and new pieces of sub data. The shape of the date is identical, because I am not making any edits to the rows.
> 
> So when data that the callback sends to TChan is unmodified, no hang. But when the data is used to make text, put it in the gui, process it, and generate new data, it hangs.
> 
> As a test I modified the code so that the text is not put into the gui. The results are the same. This indicates it has something to do with creating strings and then data from strings and mixing old and new subdata. Strings are created with show. Data is created by pattern matching and generating numbers from strings. I should also point out that in the working case, the size of the resulting string is small, say 3. In the hang case, the resulting string would be long, say 5000-10000.
> 
> I assume there are no limits to the size of ByteStrings or fundemental issues with the RTS stack/heap that require special settings.
> 
> I am using the following revisions:
> 
> GHC 7.8.3
>     base ==4.7.*,
>     mtl ==2.2.1,
>     containers == 0.5.5.1,
>     transformers ==0.4.1.0,
>     random == 1.0.1.1,
>     wx == 0.91.0.0,
>     wxcore == 0.91.0.0,
>     wxdirect == 0.91.0.0,
>     colour == 2.3.3,
>     stm == 2.4.2,
>     monad-loops == 0.4.2.1,
>     time == 1.4.2,
>     old-locale == 1.0.0.6,
>     fast-logger == 2.2.3,
>     network == 2.6.0.2,
>     bytestring == 0.10.4.0,
>     control-monad-loop == 0.1,
>     binary == 0.7.2.2,
> 
> I know that nobody can have an answer based on this. But what I am hoping is either there is some known bug, or someone can guide me in narrowing it down. The event log does not have anything unusual in it. Other threads keep running, and I can exit the application normally. The thread does not throw an exception. It just hangs.
> 
> When I run the app, I just use +RTS -v
> 
> Perhaps there are some other options that might give more info?
> 
> — SNIPPET of log —
> 
> 7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
> 7fe544cfea40: cap 0: thread 5 stopped (yielding)
> 7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
> 7fe544cfea40: cap 0: thread 5 stopped (suspended while making a foreign call)
> 7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
> 7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
> 7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
> 7fe537eff700: cap 0: waking up thread 5 on cap 0
> 7fe537eff700: cap 0: thread 2 stopped (yielding)
> 7fe544cfea40: cap 0: running thread 5 (ThreadRunGHC)
> 7fe544cfea40: cap 0: sendCommand
> 7fe544cfea40: cap 0: sendCommand: encoded
> 7fe544cfea40: cap 0: sendCommand: size 3                               WORKS HERE
> 7fe544cfea40: cap 0: sendCommand: unpacked
> 7fe544cfea40: cap 0: Sending command of size 3
> 7fe544cfea40: cap 0: Sending command of size "\NUL\ETX"
> 7fe544cfea40: cap 0: sendCommand: sent
> 7fe544cfea40: cap 0: sendCommand: flushed
> 7fe544cfea40: cap 0: thread 5 stopped (blocked on an MVar)
> 7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
> 7fe537eff700: cap 0: thread 2 stopped (yielding)
> 7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
> 7fe537eff700: cap 0: thread 2 stopped (suspended while making a foreign call)
> 7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
> 7fe537eff700: cap 0: waking up thread 5 on cap 0
> 
>> 
> 7fe537eff700: cap 0: fetchTelemetryServer: got lock
> 7fe537eff700: cap 0: thread 45 stopped (yielding)
> 7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
> 7fe537eff700: cap 0: thread 7 stopped (heap overflow)
> 7fe537eff700: cap 0: requesting parallel GC
> 7fe537eff700: cap 0: starting GC
> 7fe537eff700: cap 0: GC working
> 7fe537eff700: cap 0: GC idle
> 7fe537eff700: cap 0: GC done
> 7fe537eff700: cap 0: GC idle
> 7fe537eff700: cap 0: GC done
> 7fe537eff700: cap 0: all caps stopped for GC
> 7fe537eff700: cap 0: finished GC
> 7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
> 7fe537eff700: cap 0: sendCommand
> 7fe537eff700: cap 0: sendCommand: encoded                       PROBLEM HERE
> 7fe537eff700: cap 0: thread 7 stopped (heap overflow)
> 7fe537eff700: cap 0: requesting parallel GC
> 7fe537eff700: cap 0: starting GC
> 7fe537eff700: cap 0: GC working
> 7fe537eff700: cap 0: GC idle
> 7fe537eff700: cap 0: GC done
> 7fe537eff700: cap 0: GC idle
> 7fe537eff700: cap 0: GC done
> 7fe537eff700: cap 0: all caps stopped for GC
> 7fe537eff700: cap 0: finished GC
> 7fe537eff700: cap 0: running thread 7 (ThreadRunGHC)
> 7fe537eff700: cap 0: thread 7 stopped (yielding)
> 7fe537eff700: cap 0: running thread 2 (ThreadRunGHC)
> 7fe537eff700: cap 0: thread 2 stopped (yielding)
> 7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
> 7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
> 7fe544cfea40: cap 0: running thread 442 (ThreadRunGHC)
> 7fe544cfea40: cap 0: thread 442 stopped (suspended while making a foreign call)
> 7fe537eff700: cap 0: running thread 45 (ThreadRunGHC)
> 7fe537eff700: cap 0: fetchTelemetryServer: unlock
> 7fe537eff700: cap 0: fetchTelemetryServer
> 7fe537eff700: cap 0: fetchTelemetryServer: got lock
> 7fe537eff700: cap 0: fetchTelemetryServer: pump seq
> 
> 
> 
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20141111/4d3f2634/attachment-0001.html>


More information about the Glasgow-haskell-users mailing list