Thread behavior in 7.8.3

Carter Schonwald carter.schonwald at gmail.com
Wed Jan 21 05:34:44 UTC 2015


i think ben gamari hit similar/related issues with the lib usb bindings in
7.8, and i believe some / all of them are fixed in 7.10
(i could be mixing things up though)

On Tue, Jan 20, 2015 at 10:43 PM, Michael Jones <mike at proclivis.com> wrote:

> Simon,
>
> The code below hangs on the frameEx function.
>
> But, if I change it to:
>
>        f  <- frameCreate objectNull idAny "linti-scope PMBus Scope Tool"
> rectZero (frameDefaultStyle .|. wxMAXIMIZE)
>
> it will progress, but no frame pops up, except once in many tries. Still
> hangs, but progresses through all the setup code.
>
> However, I did make past statements that a non-GUI version was hanging. So
> I am not blaming wxHaskell. Just noting that in this case it is where
> things go wrong.
>
> Anyone,
>
> Are there any wxHaskell experts around that might have some insight?
>
> (Remember, works on single core 32 bit, works on quad core 64 bit, fails
> on 2 core 64 bit. Using GHC 7.8.3. Any recent updates to the code base to
> fix problems like this?)
>
> — CODE SAMPLE --------
>
> gui :: IO ()
> gui
>   = do
>        values <- varCreate []                            -- Values to be
> painted
>        timeLine <- varCreate 0                           -- Line time
>        sample <- varCreate 0                             -- Sample Number
>        running <- varCreate True                         -- True when
> telemetry is active
>
> <<HANG HERE>>
>
>        f <- frameEx frameDefaultStyle [ text := "linti-scope PMBus Scope
> Tool"] objectNull
>
> Setup GUI components code was here
>
>        return ()
>
> go :: IO ()
> go = do
>     putStrLn "Start GUI"
>     start $ gui
>
> exeMain :: IO ()
> exeMain = do
>   hSetBuffering stdout NoBuffering
>   getArgs >>= parse
>   where
>     parse ["-h"] = usage   >> exit
>     parse ["-v"] = version >> exit
>     parse []     = go
>     parse [url, port, session, target] = goServer url port (read session)
> (read target)
>
>     usage   = putStrLn "Usage: linti-scope [url, port, session, target]"
>     version = putStrLn "Haskell linti-scope 0.1.0.0"
>     exit    = System.Exit.exitWith System.Exit.ExitSuccess
>     die     = System.Exit.exitWith (System.Exit.ExitFailure 1)
>
> #ifndef MAIN_FUNCTION
> #define MAIN_FUNCTION exeMain
> #endif
> main = MAIN_FUNCTION
>
> On Jan 20, 2015, at 9:00 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>
> > My guess would be that either
> > - a thread is in a non-allocating loop
> > - a long-running foreign call is marked unsafe
> >
> > Either of these would block the other threads.  ThreadScope together
> with some traceEventIO calls might help you identify the culprit.
> >
> > Cheers,
> > Simon
> >
> > On 20/01/2015 15:49, Michael Jones wrote:
> >> Simon,
> >>
> >> This was fixed some time back. I combed the code base looking for other
> busy loops and there are no more. I commented out the code that runs the
> I2C + Machines + IO stuff, and only left the GUI code. It appears that just
> the wxhaskell part of the program fails to start. This matches a previous
> observation based on printing.
> >>
> >> I’ll see if I can hack up the code to a minimal set that I can publish.
> All the IP is in the I2C code, so I might be able to get it down to one
> file.
> >>
> >> Mike
> >>
> >> On Jan 19, 2015, at 3:37 AM, Simon Marlow <marlowsd at gmail.com> wrote:
> >>
> >>> Hi Michael,
> >>>
> >>> Previously in this thread it was pointed out that your code was doing
> busy waiting, and so the problem can be fixed by modifying your code to not
> do busy waiting.  Did you do this?  The -C flag is just a workaround which
> will make the RTS reschedule more often, it won't fix the underlying
> problem.
> >>>
> >>> The code you showed us was:
> >>>
> >>> sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool
> -> ProcessT m (Spec, String) ()
> >>> sendTransactions dev dts = repeatedly $ do
> >>>  dts' <- liftIO $ atomically $ readTVar dts
> >>>  when (dts' == True) (do
> >>>      (_, transactions) <- await
> >>>      liftIO $ sendOut dev transactions)
> >>>
> >>> This loops when the contents of the TVar is False.
> >>>
> >>> Cheers,
> >>> Simon
> >>>
> >>> On 18/01/2015 01:15, Michael Jones wrote:
> >>>> I have narrowed down the problem a bit. It turns out that many times
> if
> >>>> I run the program and wait long enough, it will start. Given an event
> >>>> log, it may take from 1000-10000 entries sometimes.
> >>>>
> >>>> When I look at a good start vs. slow start, I see that in both cases
> >>>> things startup and there is some thread activity for thread 2 and 3,
> >>>> then the application starts creating other threads, which is when the
> >>>> wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case,
> >>>> it just gets stuck on thread 2/3 activity for a very long time.
> >>>>
> >>>> If I switch from -C0.001 to -C0.010, the startup is more reliable, in
> >>>> that most starts result in an immediate GUI and i2c IO.
> >>>>
> >>>> The behavior suggests to me that some initial threads are starving the
> >>>> ability for other threads to start, and perhaps on a dual core machine
> >>>> it is more of a problem than single or quad core machines. For
> certain,
> >>>> due to some printing, I know that the main thread is starting, and
> that
> >>>> a print just before the first fork is not printing. Code between them
> is
> >>>> evaluating wxhaskell functions, but the main frame is not yet asked to
> >>>> become visible. From last week, I know that an non-gui version of the
> >>>> app is getting stuck, but I do not know if it eventually runs like
> this
> >>>> case.
> >>>>
> >>>> Is there some convention that when I look at an event log you can tell
> >>>> which threads are OS threads vs threads from fork?
> >>>>
> >>>> Perhaps someone that knows the scheduler might have some advice. It
> >>>> seems odd that a scheduler could behave this way. The scheduler should
> >>>> have some built in notion of fairness.
> >>>>
> >>>>
> >>>> On Jan 12, 2015, at 11:02 PM, Michael Jones <mike at proclivis.com
> >>>> <mailto:mike at proclivis.com>> wrote:
> >>>>
> >>>>> Sorry I am reviving an old problem, but it has resurfaced, such that
> >>>>> one system behaves different than another.
> >>>>>
> >>>>> Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on
> >>>>> a single core 32 bit Atom NUC. But on a dual core Atom
> MinnowBoardMax,
> >>>>> something bad is going on. In summary, the same code that runs on two
> >>>>> machines does not run on a third machine. So this indicates I have
> not
> >>>>> made any breaking changes to the code or cabal files. Compiling with
> >>>>> GHC 7.8.3.
> >>>>>
> >>>>> This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1
> >>>>> kernel. It is a dual core 64 bit I86 Atom processor. The application
> >>>>> hangs at startup. If I remove the -C0.00N option and instead use -V0,
> >>>>> the application runs. It has bad timing properties, but it does at
> >>>>> least run. Note that a hang hangs an IO thread talking USB, and the
> >>>>> GUI thread.
> >>>>>
> >>>>> When testing with the -C0.00N option, it did run 2 times out of 20
> >>>>> tries, so fail means fail most but not all of the time. When it did
> >>>>> run, it continued to run properly. This perhaps indicates some kind
> of
> >>>>> internal race condition.
> >>>>>
> >>>>> In the fail to run case, it does some printing up to the point where
> >>>>> it tries to create a wxHaskell frame. In another non-UI version of
> the
> >>>>> program it also fails to run. Logging to a file gives a similar
> >>>>> indication. It is clear that the program starts up, then fails during
> >>>>> the run in some form of lockup, well after the initial startup code.
> >>>>>
> >>>>> If I run with the strace command, it always runs with -C0.00N.
> >>>>>
> >>>>> All the above was done with profiling enabled, so I removed that and
> >>>>> instead enabled eventlog to look for clues.
> >>>>>
> >>>>> In this case it lies between good and bad, in that IO to my USB is
> >>>>> working, but the GUI comes up blank and never paints. Running this
> >>>>> case without -v0 (event log) the gui partially paints and stops, but
> >>>>> USB continues.
> >>>>>
> >>>>> Questions:
> >>>>>
> >>>>> 1) Does ghc 7.8.4 have any improvements that might pertain to these
> >>>>> kinds of scheduling/thread problems?
> >>>>> 2) Is there anything about the nature of a thread using USB, I2C, or
> >>>>> wxHaskell IO that leads to problems that a pure calculation app would
> >>>>> not have?
> >>>>> 3) Any ideas how to track down the problem when changing conditions
> >>>>> (compiler or runtime options) affects behavior?
> >>>>> 4) Are there other options besides -V and -C for the runtime that
> >>>>> might apply?
> >>>>> 5) What does -V0 do that makes a problem program run?
> >>>>>
> >>>>> Mike
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Oct 29, 2014, at 6:02 PM, Michael Jones <mike at proclivis.com
> >>>>> <mailto:mike at proclivis.com>> wrote:
> >>>>>
> >>>>>> John,
> >>>>>>
> >>>>>> Adding -C0.005 makes it much better. Using -C0.001 makes it behave
> >>>>>> more like -N4.
> >>>>>>
> >>>>>> Thanks. This saves my project, as I need to deploy on a single core
> >>>>>> Atom and was stuck.
> >>>>>>
> >>>>>> Mike
> >>>>>>
> >>>>>> On Oct 29, 2014, at 5:12 PM, John Lato <jwlato at gmail.com
> >>>>>> <mailto:jwlato at gmail.com>> wrote:
> >>>>>>
> >>>>>>> By any chance do the delays get shorter if you run your program
> with
> >>>>>>> `+RTS -C0.005` ?  If so, I suspect you're having a problem very
> >>>>>>> similar to one that we had with ghc-7.8 (7.6 too, but it's worse on
> >>>>>>> ghc-7.8 for some reason), involving possible misbehavior of the
> >>>>>>> thread scheduler.
> >>>>>>>
> >>>>>>> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <mike at proclivis.com
> >>>>>>> <mailto:mike at proclivis.com>> wrote:
> >>>>>>>
> >>>>>>>    I have a general question about thread behavior in 7.8.3 vs
> 7.6.X
> >>>>>>>
> >>>>>>>    I moved from 7.6 to 7.8 and my application behaves very
> >>>>>>>    differently. I have three threads, an application thread that
> >>>>>>>    plots data with wxhaskell or sends it over a network (depends on
> >>>>>>>    settings), a thread doing usb bulk writes, and a thread doing
> >>>>>>>    usb bulk reads. Data is moved around with TChan, and TVar is
> >>>>>>>    used for coordination.
> >>>>>>>
> >>>>>>>    When the application was compiled with 7.6, my stream of usb
> >>>>>>>    traffic was smooth. With 7.8, there are lots of delays where
> >>>>>>>    nothing seems to be running. These delays are up to 40ms,
> >>>>>>>    whereas with 7.6 delays were a 1ms or so.
> >>>>>>>
> >>>>>>>    When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it
> >>>>>>>    runs fine without with -N2/4.
> >>>>>>>
> >>>>>>>    The program is compiled -O2 with profiling. The -N2/4 version
> >>>>>>>    uses more memory,  but in both cases with 7.8 and with 7.6 there
> >>>>>>>    is no space leak.
> >>>>>>>
> >>>>>>>    I tired to compile and use -ls so I could take a look with
> >>>>>>>    threadscope, but the application hangs and writes no data to the
> >>>>>>>    file. The CPU fans run wild like it is in an infinite loop. It
> >>>>>>>    at least pops an unpainted wxhaskell window, so it got partially
> >>>>>>>    running.
> >>>>>>>
> >>>>>>>    One of my libraries uses option -fsimpl-tick-factor=200 to get
> >>>>>>>    around the compiler.
> >>>>>>>
> >>>>>>>    What do I need to know about changes to threading and event
> >>>>>>>    logging between 7.6 and 7.8? Is there some general documentation
> >>>>>>>    somewhere that might help?
> >>>>>>>
> >>>>>>>    I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar
> >>>>>>>    ball and installed myself, after removing 7.6 with apt-get.
> >>>>>>>
> >>>>>>>    Any hints appreciated.
> >>>>>>>
> >>>>>>>    Mike
> >>>>>>>
> >>>>>>>
> >>>>>>>    _______________________________________________
> >>>>>>>    Glasgow-haskell-users mailing list
> >>>>>>>    Glasgow-haskell-users at haskell.org
> >>>>>>>    <mailto:Glasgow-haskell-users at haskell.org>
> >>>>>>>    http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Glasgow-haskell-users mailing list
> >>>>> Glasgow-haskell-users at haskell.org
> >>>>> <mailto:Glasgow-haskell-users at haskell.org>
> >>>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Glasgow-haskell-users mailing list
> >>>> Glasgow-haskell-users at haskell.org
> >>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>>>
> >>> _______________________________________________
> >>> Glasgow-haskell-users mailing list
> >>> Glasgow-haskell-users at haskell.org
> >>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>
> >>
>
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20150121/7da058f4/attachment-0001.html>


More information about the Glasgow-haskell-users mailing list