Thread behavior in 7.8.3
Carter Schonwald
carter.schonwald at gmail.com
Wed Jan 21 05:34:44 UTC 2015
i think ben gamari hit similar/related issues with the lib usb bindings in
7.8, and i believe some / all of them are fixed in 7.10
(i could be mixing things up though)
On Tue, Jan 20, 2015 at 10:43 PM, Michael Jones <mike at proclivis.com> wrote:
> Simon,
>
> The code below hangs on the frameEx function.
>
> But, if I change it to:
>
> f <- frameCreate objectNull idAny "linti-scope PMBus Scope Tool"
> rectZero (frameDefaultStyle .|. wxMAXIMIZE)
>
> it will progress, but no frame pops up, except once in many tries. Still
> hangs, but progresses through all the setup code.
>
> However, I did make past statements that a non-GUI version was hanging. So
> I am not blaming wxHaskell. Just noting that in this case it is where
> things go wrong.
>
> Anyone,
>
> Are there any wxHaskell experts around that might have some insight?
>
> (Remember, works on single core 32 bit, works on quad core 64 bit, fails
> on 2 core 64 bit. Using GHC 7.8.3. Any recent updates to the code base to
> fix problems like this?)
>
> — CODE SAMPLE --------
>
> gui :: IO ()
> gui
> = do
> values <- varCreate [] -- Values to be
> painted
> timeLine <- varCreate 0 -- Line time
> sample <- varCreate 0 -- Sample Number
> running <- varCreate True -- True when
> telemetry is active
>
> <<HANG HERE>>
>
> f <- frameEx frameDefaultStyle [ text := "linti-scope PMBus Scope
> Tool"] objectNull
>
> Setup GUI components code was here
>
> return ()
>
> go :: IO ()
> go = do
> putStrLn "Start GUI"
> start $ gui
>
> exeMain :: IO ()
> exeMain = do
> hSetBuffering stdout NoBuffering
> getArgs >>= parse
> where
> parse ["-h"] = usage >> exit
> parse ["-v"] = version >> exit
> parse [] = go
> parse [url, port, session, target] = goServer url port (read session)
> (read target)
>
> usage = putStrLn "Usage: linti-scope [url, port, session, target]"
> version = putStrLn "Haskell linti-scope 0.1.0.0"
> exit = System.Exit.exitWith System.Exit.ExitSuccess
> die = System.Exit.exitWith (System.Exit.ExitFailure 1)
>
> #ifndef MAIN_FUNCTION
> #define MAIN_FUNCTION exeMain
> #endif
> main = MAIN_FUNCTION
>
> On Jan 20, 2015, at 9:00 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>
> > My guess would be that either
> > - a thread is in a non-allocating loop
> > - a long-running foreign call is marked unsafe
> >
> > Either of these would block the other threads. ThreadScope together
> with some traceEventIO calls might help you identify the culprit.
> >
> > Cheers,
> > Simon
> >
> > On 20/01/2015 15:49, Michael Jones wrote:
> >> Simon,
> >>
> >> This was fixed some time back. I combed the code base looking for other
> busy loops and there are no more. I commented out the code that runs the
> I2C + Machines + IO stuff, and only left the GUI code. It appears that just
> the wxhaskell part of the program fails to start. This matches a previous
> observation based on printing.
> >>
> >> I’ll see if I can hack up the code to a minimal set that I can publish.
> All the IP is in the I2C code, so I might be able to get it down to one
> file.
> >>
> >> Mike
> >>
> >> On Jan 19, 2015, at 3:37 AM, Simon Marlow <marlowsd at gmail.com> wrote:
> >>
> >>> Hi Michael,
> >>>
> >>> Previously in this thread it was pointed out that your code was doing
> busy waiting, and so the problem can be fixed by modifying your code to not
> do busy waiting. Did you do this? The -C flag is just a workaround which
> will make the RTS reschedule more often, it won't fix the underlying
> problem.
> >>>
> >>> The code you showed us was:
> >>>
> >>> sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool
> -> ProcessT m (Spec, String) ()
> >>> sendTransactions dev dts = repeatedly $ do
> >>> dts' <- liftIO $ atomically $ readTVar dts
> >>> when (dts' == True) (do
> >>> (_, transactions) <- await
> >>> liftIO $ sendOut dev transactions)
> >>>
> >>> This loops when the contents of the TVar is False.
> >>>
> >>> Cheers,
> >>> Simon
> >>>
> >>> On 18/01/2015 01:15, Michael Jones wrote:
> >>>> I have narrowed down the problem a bit. It turns out that many times
> if
> >>>> I run the program and wait long enough, it will start. Given an event
> >>>> log, it may take from 1000-10000 entries sometimes.
> >>>>
> >>>> When I look at a good start vs. slow start, I see that in both cases
> >>>> things startup and there is some thread activity for thread 2 and 3,
> >>>> then the application starts creating other threads, which is when the
> >>>> wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case,
> >>>> it just gets stuck on thread 2/3 activity for a very long time.
> >>>>
> >>>> If I switch from -C0.001 to -C0.010, the startup is more reliable, in
> >>>> that most starts result in an immediate GUI and i2c IO.
> >>>>
> >>>> The behavior suggests to me that some initial threads are starving the
> >>>> ability for other threads to start, and perhaps on a dual core machine
> >>>> it is more of a problem than single or quad core machines. For
> certain,
> >>>> due to some printing, I know that the main thread is starting, and
> that
> >>>> a print just before the first fork is not printing. Code between them
> is
> >>>> evaluating wxhaskell functions, but the main frame is not yet asked to
> >>>> become visible. From last week, I know that an non-gui version of the
> >>>> app is getting stuck, but I do not know if it eventually runs like
> this
> >>>> case.
> >>>>
> >>>> Is there some convention that when I look at an event log you can tell
> >>>> which threads are OS threads vs threads from fork?
> >>>>
> >>>> Perhaps someone that knows the scheduler might have some advice. It
> >>>> seems odd that a scheduler could behave this way. The scheduler should
> >>>> have some built in notion of fairness.
> >>>>
> >>>>
> >>>> On Jan 12, 2015, at 11:02 PM, Michael Jones <mike at proclivis.com
> >>>> <mailto:mike at proclivis.com>> wrote:
> >>>>
> >>>>> Sorry I am reviving an old problem, but it has resurfaced, such that
> >>>>> one system behaves different than another.
> >>>>>
> >>>>> Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on
> >>>>> a single core 32 bit Atom NUC. But on a dual core Atom
> MinnowBoardMax,
> >>>>> something bad is going on. In summary, the same code that runs on two
> >>>>> machines does not run on a third machine. So this indicates I have
> not
> >>>>> made any breaking changes to the code or cabal files. Compiling with
> >>>>> GHC 7.8.3.
> >>>>>
> >>>>> This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1
> >>>>> kernel. It is a dual core 64 bit I86 Atom processor. The application
> >>>>> hangs at startup. If I remove the -C0.00N option and instead use -V0,
> >>>>> the application runs. It has bad timing properties, but it does at
> >>>>> least run. Note that a hang hangs an IO thread talking USB, and the
> >>>>> GUI thread.
> >>>>>
> >>>>> When testing with the -C0.00N option, it did run 2 times out of 20
> >>>>> tries, so fail means fail most but not all of the time. When it did
> >>>>> run, it continued to run properly. This perhaps indicates some kind
> of
> >>>>> internal race condition.
> >>>>>
> >>>>> In the fail to run case, it does some printing up to the point where
> >>>>> it tries to create a wxHaskell frame. In another non-UI version of
> the
> >>>>> program it also fails to run. Logging to a file gives a similar
> >>>>> indication. It is clear that the program starts up, then fails during
> >>>>> the run in some form of lockup, well after the initial startup code.
> >>>>>
> >>>>> If I run with the strace command, it always runs with -C0.00N.
> >>>>>
> >>>>> All the above was done with profiling enabled, so I removed that and
> >>>>> instead enabled eventlog to look for clues.
> >>>>>
> >>>>> In this case it lies between good and bad, in that IO to my USB is
> >>>>> working, but the GUI comes up blank and never paints. Running this
> >>>>> case without -v0 (event log) the gui partially paints and stops, but
> >>>>> USB continues.
> >>>>>
> >>>>> Questions:
> >>>>>
> >>>>> 1) Does ghc 7.8.4 have any improvements that might pertain to these
> >>>>> kinds of scheduling/thread problems?
> >>>>> 2) Is there anything about the nature of a thread using USB, I2C, or
> >>>>> wxHaskell IO that leads to problems that a pure calculation app would
> >>>>> not have?
> >>>>> 3) Any ideas how to track down the problem when changing conditions
> >>>>> (compiler or runtime options) affects behavior?
> >>>>> 4) Are there other options besides -V and -C for the runtime that
> >>>>> might apply?
> >>>>> 5) What does -V0 do that makes a problem program run?
> >>>>>
> >>>>> Mike
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Oct 29, 2014, at 6:02 PM, Michael Jones <mike at proclivis.com
> >>>>> <mailto:mike at proclivis.com>> wrote:
> >>>>>
> >>>>>> John,
> >>>>>>
> >>>>>> Adding -C0.005 makes it much better. Using -C0.001 makes it behave
> >>>>>> more like -N4.
> >>>>>>
> >>>>>> Thanks. This saves my project, as I need to deploy on a single core
> >>>>>> Atom and was stuck.
> >>>>>>
> >>>>>> Mike
> >>>>>>
> >>>>>> On Oct 29, 2014, at 5:12 PM, John Lato <jwlato at gmail.com
> >>>>>> <mailto:jwlato at gmail.com>> wrote:
> >>>>>>
> >>>>>>> By any chance do the delays get shorter if you run your program
> with
> >>>>>>> `+RTS -C0.005` ? If so, I suspect you're having a problem very
> >>>>>>> similar to one that we had with ghc-7.8 (7.6 too, but it's worse on
> >>>>>>> ghc-7.8 for some reason), involving possible misbehavior of the
> >>>>>>> thread scheduler.
> >>>>>>>
> >>>>>>> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <mike at proclivis.com
> >>>>>>> <mailto:mike at proclivis.com>> wrote:
> >>>>>>>
> >>>>>>> I have a general question about thread behavior in 7.8.3 vs
> 7.6.X
> >>>>>>>
> >>>>>>> I moved from 7.6 to 7.8 and my application behaves very
> >>>>>>> differently. I have three threads, an application thread that
> >>>>>>> plots data with wxhaskell or sends it over a network (depends on
> >>>>>>> settings), a thread doing usb bulk writes, and a thread doing
> >>>>>>> usb bulk reads. Data is moved around with TChan, and TVar is
> >>>>>>> used for coordination.
> >>>>>>>
> >>>>>>> When the application was compiled with 7.6, my stream of usb
> >>>>>>> traffic was smooth. With 7.8, there are lots of delays where
> >>>>>>> nothing seems to be running. These delays are up to 40ms,
> >>>>>>> whereas with 7.6 delays were a 1ms or so.
> >>>>>>>
> >>>>>>> When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it
> >>>>>>> runs fine without with -N2/4.
> >>>>>>>
> >>>>>>> The program is compiled -O2 with profiling. The -N2/4 version
> >>>>>>> uses more memory, but in both cases with 7.8 and with 7.6 there
> >>>>>>> is no space leak.
> >>>>>>>
> >>>>>>> I tired to compile and use -ls so I could take a look with
> >>>>>>> threadscope, but the application hangs and writes no data to the
> >>>>>>> file. The CPU fans run wild like it is in an infinite loop. It
> >>>>>>> at least pops an unpainted wxhaskell window, so it got partially
> >>>>>>> running.
> >>>>>>>
> >>>>>>> One of my libraries uses option -fsimpl-tick-factor=200 to get
> >>>>>>> around the compiler.
> >>>>>>>
> >>>>>>> What do I need to know about changes to threading and event
> >>>>>>> logging between 7.6 and 7.8? Is there some general documentation
> >>>>>>> somewhere that might help?
> >>>>>>>
> >>>>>>> I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar
> >>>>>>> ball and installed myself, after removing 7.6 with apt-get.
> >>>>>>>
> >>>>>>> Any hints appreciated.
> >>>>>>>
> >>>>>>> Mike
> >>>>>>>
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Glasgow-haskell-users mailing list
> >>>>>>> Glasgow-haskell-users at haskell.org
> >>>>>>> <mailto:Glasgow-haskell-users at haskell.org>
> >>>>>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Glasgow-haskell-users mailing list
> >>>>> Glasgow-haskell-users at haskell.org
> >>>>> <mailto:Glasgow-haskell-users at haskell.org>
> >>>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Glasgow-haskell-users mailing list
> >>>> Glasgow-haskell-users at haskell.org
> >>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>>>
> >>> _______________________________________________
> >>> Glasgow-haskell-users mailing list
> >>> Glasgow-haskell-users at haskell.org
> >>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>
> >>
>
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20150121/7da058f4/attachment-0001.html>
More information about the Glasgow-haskell-users
mailing list