Thread behavior in 7.8.3

Michael Jones mike at proclivis.com
Sun Jan 18 01:15:09 UTC 2015


I have narrowed down the problem a bit. It turns out that, many times, if I run the program and wait long enough, it will start. Judging by the event log, startup can sometimes take anywhere from 1,000 to 10,000 entries.

When I compare a good start with a slow start, I see that in both cases things start up and there is some thread activity for threads 2 and 3. Then the application starts creating other threads, which is when the wxHaskell GUI pops up and IO on my /dev/i2c begins. In the slow case, it just gets stuck on thread 2/3 activity for a very long time.

If I switch from -C0.001 to -C0.010, the startup is more reliable, in that most starts result in an immediate GUI and i2c IO.

The behavior suggests to me that some initial threads are starving other threads so that they cannot start, and that this is perhaps more of a problem on a dual core machine than on single or quad core machines. From some print statements, I know for certain that the main thread is starting, and that a print just before the first fork is not printing. The code between them evaluates wxHaskell functions, but the main frame is not yet asked to become visible. From last week, I know that a non-GUI version of the app also gets stuck, but I do not know whether it eventually runs as in this case.

Is there some convention by which, when looking at an event log, I can tell which threads are OS threads and which are threads created by fork?
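
One thing that may make the event log easier to read, as a minimal sketch only: forked threads can be given names with labelThread from GHC.Conc, so that they are easier to pick out than bare numbers like thread 2 or 3. The helper name forkLabeled and the label "usb-reader" are hypothetical and are not from my application. I am assuming the labels show up when the program is built with -eventlog and run with +RTS -l.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (labelThread)

-- | Fork a lightweight Haskell thread and attach a label to it.
-- (forkLabeled is a hypothetical helper, not a library function.)
forkLabeled :: String -> IO () -> IO ThreadId
forkLabeled name action = forkIO $ do
  tid <- myThreadId
  labelThread tid name   -- label the new thread before doing any work
  action

-- | Label the main thread, start one labeled worker, and wait for it.
runDemo :: IO String
runDemo = do
  mainTid <- myThreadId
  labelThread mainTid "main"
  done <- newEmptyMVar
  _ <- forkLabeled "usb-reader" (putMVar done "usb-reader finished")
  takeMVar done          -- block until the worker reports back

main :: IO ()
main = runDemo >>= putStrLn
```

This only names the Haskell threads. It does not by itself say which OS thread each one ran on.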

Perhaps someone who knows the scheduler might have some advice. It seems odd that a scheduler could behave this way. The scheduler should have some built-in notion of fairness.


On Jan 12, 2015, at 11:02 PM, Michael Jones <mike at proclivis.com> wrote:

> Sorry to revive an old problem, but it has resurfaced: one system now behaves differently from another.
> 
> Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax, something bad is going on. In summary, the same code that runs on two machines does not run on a third machine. So this indicates I have not made any breaking changes to the code or cabal files. Compiling with GHC 7.8.3.
> 
> This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1 kernel. It has a dual core 64-bit x86 Atom processor. The application hangs at startup. If I remove the -C0.00N option and instead use -V0, the application runs. It has bad timing properties, but it does at least run. Note that a hang stops both an IO thread talking USB and the GUI thread.
> 
> When testing with the -C0.00N option, it did run 2 times out of 20 tries, so "fail" means it fails most, but not all, of the time. When it did run, it continued to run properly. This perhaps indicates some kind of internal race condition.
> 
> In the fail to run case, it does some printing up to the point where it tries to create a wxHaskell frame. In another non-UI version of the program it also fails to run. Logging to a file gives a similar indication. It is clear that the program starts up, then fails during the run in some form of lockup, well after the initial startup code.
> 
> If I run with the strace command, it always runs with -C0.00N.
> 
> All the above was done with profiling enabled, so I removed that and instead enabled eventlog to look for clues.
> 
> In this case it lies between good and bad, in that IO to my USB is working, but the GUI comes up blank and never paints. Running this case without -v0 (event log) the gui partially paints and stops, but USB continues.
> 
> Questions:
> 
> 1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems?
> 2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have?
> 3) Any ideas how to track down the problem when changing conditions (compiler or runtime options) affects behavior?
> 4) Are there other options besides -V and -C for the runtime that might apply?
> 5) What does -V0 do that makes a problem program run?
> 
> Mike
> 
> 
> 
> 
> On Oct 29, 2014, at 6:02 PM, Michael Jones <mike at proclivis.com> wrote:
> 
>> John,
>> 
>> Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.
>> 
>> Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.
>> 
>> Mike
>> 
>> On Oct 29, 2014, at 5:12 PM, John Lato <jwlato at gmail.com> wrote:
>> 
>>> By any chance do the delays get shorter if you run your program with `+RTS -C0.005`? If so, I suspect you're having a problem very similar to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some reason), involving possible misbehavior of the thread scheduler.
>>> 
>>> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <mike at proclivis.com> wrote:
>>> I have a general question about thread behavior in 7.8.3 vs 7.6.X
>>> 
>>> I moved from 7.6 to 7.8 and my application behaves very differently. I have three threads: an application thread that plots data with wxHaskell or sends it over a network (depending on settings), a thread doing USB bulk writes, and a thread doing USB bulk reads. Data is moved around with TChan, and TVar is used for coordination.
>>> 
>>> When the application was compiled with 7.6, my stream of USB traffic was smooth. With 7.8, there are lots of delays where nothing seems to be running. These delays are up to 40 ms, whereas with 7.6 delays were 1 ms or so.
>>> 
>>> When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs fine without -N2/4.
>>> 
>>> The program is compiled with -O2 and profiling. The -N2/4 version uses more memory, but in both cases, with 7.8 and with 7.6, there is no space leak.
>>> 
>>> I tried to compile with event logging and use -ls so I could take a look with threadscope, but the application hangs and writes no data to the file. The CPU fans run wild, as if it is in an infinite loop. It at least pops up an unpainted wxHaskell window, so it got partially running.
>>> 
>>> One of my libraries uses option -fsimpl-tick-factor=200 to get around the compiler.
>>> 
>>> What do I need to know about changes to threading and event logging between 7.6 and 7.8? Is there some general documentation somewhere that might help?
>>> 
>>> I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar ball and installed myself, after removing 7.6 with apt-get.
>>> 
>>> Any hints appreciated.
>>> 
>>> Mike
>>> 
>>> 
>>> _______________________________________________
>>> Glasgow-haskell-users mailing list
>>> Glasgow-haskell-users at haskell.org
>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>> 
>> 
> 
