Thread behavior in 7.8.3

Carter Schonwald carter.schonwald at gmail.com
Wed Jan 21 12:57:41 UTC 2015


Whoops, forgot to attach the relevant links (I shouldn't email late at
night :) )

https://github.com/basvandijk/usb/issues/7 is the libusb matter
https://phabricator.haskell.org/D347

Point being: on GHC 7.8, certain hanging behavior from libusb (at least as
of a few months ago) was due to one-shotness.

On Wed, Jan 21, 2015 at 5:18 AM, Simon Marlow <marlowsd at gmail.com> wrote:

> On 21/01/2015 03:43, Michael Jones wrote:
>
>> Simon,
>>
>> The code below hangs on the frameEx function.
>>
>> But, if I change it to:
>>
>>         f  <- frameCreate objectNull idAny "linti-scope PMBus Scope Tool"
>>                 rectZero (frameDefaultStyle .|. wxMAXIMIZE)
>>
>> it will progress, but no frame pops up, except once in many tries. Still
>> hangs, but progresses through all the setup code.
>>
>> However, I did make past statements that a non-GUI version was hanging.
>> So I am not blaming wxHaskell. Just noting that in this case it is where
>> things go wrong.
>>
>> Anyone,
>>
>> Are there any wxHaskell experts around that might have some insight?
>>
>> (Remember, works on single core 32 bit, works on quad core 64 bit,
>> fails on 2 core 64 bit. Using GHC 7.8.3. Any recent updates to the
>> code base to fix problems like this?)
>>
>
> No, there are no recently fixed or outstanding bugs in this area that I'm
> aware of.
>
> From the symptoms I strongly suspect there's an unsafe foreign call
> somewhere causing problems, or another busy-wait loop.
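To make the first suspicion concrete: an `unsafe` foreign call does not release the capability it runs on, so Haskell threads scheduled on that capability cannot run until the call returns, whereas a `safe` call releases it and the threaded RTS keeps running other threads. The following is only a sketch, assuming a POSIX system and using `usleep` as a stand-in for a slow libusb or wx call; `c_usleep_safe`, `c_usleep_unsafe`, and `ticksDuring` are names made up for illustration.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad (forever)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import Foreign.C.Types (CInt (..), CUInt (..))

-- Same C function imported twice: once "safe", once "unsafe".
foreign import ccall safe "unistd.h usleep"
  c_usleep_safe :: CUInt -> IO CInt

foreign import ccall unsafe "unistd.h usleep"
  c_usleep_unsafe :: CUInt -> IO CInt

-- Count how often a background Haskell thread gets to run while the
-- given foreign call is in progress.
ticksDuring :: IO CInt -> IO Int
ticksDuring call = do
  ticks <- newIORef (0 :: Int)
  t <- forkIO $ forever $ do
    atomicModifyIORef' ticks (\n -> (n + 1, ()))
    threadDelay 10000            -- tick roughly every 10 ms
  _ <- call                      -- the (possibly blocking) foreign call
  killThread t
  readIORef ticks

main :: IO ()
main = do
  s <- ticksDuring (c_usleep_safe 500000)
  u <- ticksDuring (c_usleep_unsafe 500000)
  putStrLn ("ticks during safe call:   " ++ show s)
  putStrLn ("ticks during unsafe call: " ++ show u)
```

Built with `-threaded`, the first count would be expected to be clearly larger than the second, although the exact numbers depend on the RTS version, the `-N` setting, and the machine. Without `-threaded`, a blocking foreign call stalls all Haskell threads whether it is marked safe or unsafe.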
>
> Cheers,
> Simon
>
>
>
>
>> ---- CODE SAMPLE --------
>>
>> gui :: IO ()
>> gui
>>    = do
>>         values <- varCreate []        -- Values to be painted
>>         timeLine <- varCreate 0       -- Line time
>>         sample <- varCreate 0         -- Sample number
>>         running <- varCreate True     -- True when telemetry is active
>>
>>         -- <<HANG HERE>>
>>
>>         f <- frameEx frameDefaultStyle
>>                [ text := "linti-scope PMBus Scope Tool" ] objectNull
>>
>>         -- Setup GUI components code was here
>>
>>         return ()
>>
>> go :: IO ()
>> go = do
>>      putStrLn "Start GUI"
>>      start $ gui
>>
>> exeMain :: IO ()
>> exeMain = do
>>    hSetBuffering stdout NoBuffering
>>    getArgs >>= parse
>>    where
>>      parse ["-h"] = usage   >> exit
>>      parse ["-v"] = version >> exit
>>      parse []     = go
>>      parse [url, port, session, target] =
>>        goServer url port (read session) (read target)
>>
>>      usage   = putStrLn "Usage: linti-scope [url, port, session, target]"
>>      version = putStrLn "Haskell linti-scope 0.1.0.0"
>>      exit    = System.Exit.exitWith System.Exit.ExitSuccess
>>      die     = System.Exit.exitWith (System.Exit.ExitFailure 1)
>>
>> #ifndef MAIN_FUNCTION
>> #define MAIN_FUNCTION exeMain
>> #endif
>> main = MAIN_FUNCTION
>>
>> On Jan 20, 2015, at 9:00 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>>
>>> My guess would be that either
>>> - a thread is in a non-allocating loop
>>> - a long-running foreign call is marked unsafe
>>>
>>> Either of these would block the other threads.  ThreadScope together
>>> with some traceEventIO calls might help you identify the culprit.
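The `traceEventIO` suggestion can be sketched roughly as follows. `traceEventIO` from `Debug.Trace` writes a user message into the eventlog, which ThreadScope can display next to the thread activity. The `traced` helper and the labels are hypothetical, and the `threadDelay` calls merely stand in for the real GUI setup and USB work.

```haskell
module Main where

import Control.Concurrent (threadDelay)
import Control.Exception (finally)
import Debug.Trace (traceEventIO)

-- Bracket an action with START/STOP markers in the eventlog.
-- The STOP marker is emitted even if the action throws.
traced :: String -> IO a -> IO a
traced label act = do
  traceEventIO ("START " ++ label)
  act `finally` traceEventIO ("STOP " ++ label)

main :: IO ()
main = do
  traced "gui-setup" (threadDelay 100000)  -- stand-in for GUI setup
  traced "usb-io"    (threadDelay 100000)  -- stand-in for a USB transfer
  putStrLn "done"
```

Linking with `-eventlog` and running with `+RTS -l` should produce a `.eventlog` file for ThreadScope. A section whose START marker never gets a matching STOP is a candidate for the blocking call or loop.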
>>>
>>> Cheers,
>>> Simon
>>>
>>> On 20/01/2015 15:49, Michael Jones wrote:
>>>
>>>> Simon,
>>>>
>>>> This was fixed some time back. I combed the code base looking for other
>>>> busy loops and there are no more. I commented out the code that runs the
>>>> I2C + Machines + IO stuff, and only left the GUI code. It appears that just
>>>> the wxhaskell part of the program fails to start. This matches a previous
>>>> observation based on printing.
>>>>
>>>> I’ll see if I can hack up the code to a minimal set that I can publish.
>>>> All the IP is in the I2C code, so I might be able to get it down to one
>>>> file.
>>>>
>>>> Mike
>>>>
>>>> On Jan 19, 2015, at 3:37 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Previously in this thread it was pointed out that your code was doing
>>>>> busy waiting, and so the problem can be fixed by modifying your code to not
>>>>> do busy waiting.  Did you do this?  The -C flag is just a workaround which
>>>>> will make the RTS reschedule more often; it won't fix the underlying
>>>>> problem.
>>>>>
>>>>> The code you showed us was:
>>>>>
>>>>> sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool
>>>>> -> ProcessT m (Spec, String) ()
>>>>> sendTransactions dev dts = repeatedly $ do
>>>>>   dts' <- liftIO $ atomically $ readTVar dts
>>>>>   when (dts' == True) (do
>>>>>       (_, transactions) <- await
>>>>>       liftIO $ sendOut dev transactions)
>>>>>
>>>>> This loops when the contents of the TVar is False.
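One way to avoid the spin, sketched here in plain `IO` rather than inside the `ProcessT` pipeline: let the STM transaction itself block until the flag is `True`, using `check` from the `stm` package (it calls `retry` when given `False`). The RTS then parks the thread until another thread writes the `TVar`. `waitEnabled` is a hypothetical helper and not part of the original code.

```haskell
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (TVar, atomically, check, newTVarIO, readTVar, writeTVar)

-- Block, without spinning, until the flag becomes True.
waitEnabled :: TVar Bool -> IO ()
waitEnabled flag = atomically (readTVar flag >>= check)

main :: IO ()
main = do
  flag <- newTVarIO False
  done <- newEmptyMVar
  _ <- forkIO $ do
    waitEnabled flag                     -- sleeps here while flag is False
    putStrLn "flag is True, sending transactions"
    putMVar done ()
  threadDelay 200000                     -- worker is parked during this time
  atomically (writeTVar flag True)       -- wakes the worker
  takeMVar done
```

In `sendTransactions`, the equivalent change would presumably be to replace the `readTVar`/`when` pair with a single blocking transaction such as `liftIO $ atomically $ readTVar dts >>= check`, followed by the `await` and `sendOut` steps.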
>>>>>
>>>>> Cheers,
>>>>> Simon
>>>>>
>>>>> On 18/01/2015 01:15, Michael Jones wrote:
>>>>>
>>>>>> I have narrowed down the problem a bit. It turns out that many times,
>>>>>> if I run the program and wait long enough, it will start. In the event
>>>>>> log, this sometimes takes 1000 to 10000 entries.
>>>>>>
>>>>>> When I look at a good start vs. a slow start, I see that in both cases
>>>>>> things start up and there is some thread activity for threads 2 and 3,
>>>>>> then the application starts creating other threads, which is when the
>>>>>> wxHaskell GUI pops up and IO out of my /dev/i2c begins. In the slow case,
>>>>>> it just gets stuck on thread 2/3 activity for a very long time.
>>>>>>
>>>>>> If I switch from -C0.001 to -C0.010, the startup is more reliable, in
>>>>>> that most starts result in an immediate GUI and i2c IO.
>>>>>>
>>>>>> The behavior suggests to me that some initial threads are starving
>>>>>> other threads of the chance to start, and that this is perhaps more of a
>>>>>> problem on a dual core machine than on single or quad core machines. For
>>>>>> certain, due to some printing, I know that the main thread is starting,
>>>>>> and that a print just before the first fork is not printing. Code between
>>>>>> them is evaluating wxHaskell functions, but the main frame is not yet
>>>>>> asked to become visible. From last week, I know that a non-GUI version of
>>>>>> the app is getting stuck, but I do not know if it eventually runs like
>>>>>> this case.
>>>>>>
>>>>>> Is there some convention in the event log that lets you tell which
>>>>>> threads are OS threads vs. threads from fork?
>>>>>>
>>>>>> Perhaps someone that knows the scheduler might have some advice. It
>>>>>> seems odd that a scheduler could behave this way. The scheduler should
>>>>>> have some built in notion of fairness.
>>>>>>
>>>>>>
>>>>>> On Jan 12, 2015, at 11:02 PM, Michael Jones <mike at proclivis.com
>>>>>> <mailto:mike at proclivis.com>> wrote:
>>>>>>
>>>>>>> Sorry to revive an old problem, but it has resurfaced, such that
>>>>>>> one system behaves differently from another.
>>>>>>>
>>>>>>> Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on
>>>>>>> a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax,
>>>>>>> something bad is going on. In summary, the same code that runs on two
>>>>>>> machines does not run on a third machine. So this indicates I have not
>>>>>>> made any breaking changes to the code or cabal files. Compiling with
>>>>>>> GHC 7.8.3.
>>>>>>>
>>>>>>> This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1
>>>>>>> kernel. It is a dual core 64 bit x86 Atom processor. The application
>>>>>>> hangs at startup. If I remove the -C0.00N option and instead use -V0,
>>>>>>> the application runs. It has bad timing properties, but it does at
>>>>>>> least run. Note that a hang stalls both an IO thread talking USB and
>>>>>>> the GUI thread.
>>>>>>>
>>>>>>> When testing with the -C0.00N option, it did run 2 times out of 20
>>>>>>> tries, so fail means fail most but not all of the time. When it did
>>>>>>> run, it continued to run properly. This perhaps indicates some kind of
>>>>>>> internal race condition.
>>>>>>>
>>>>>>> In the fail to run case, it does some printing up to the point where
>>>>>>> it tries to create a wxHaskell frame. In another non-UI version of the
>>>>>>> program it also fails to run. Logging to a file gives a similar
>>>>>>> indication. It is clear that the program starts up, then fails during
>>>>>>> the run in some form of lockup, well after the initial startup code.
>>>>>>>
>>>>>>> If I run with the strace command, it always runs with -C0.00N.
>>>>>>>
>>>>>>> All the above was done with profiling enabled, so I removed that and
>>>>>>> instead enabled eventlog to look for clues.
>>>>>>>
>>>>>>> In this case it lies between good and bad, in that IO to my USB is
>>>>>>> working, but the GUI comes up blank and never paints. Running this
>>>>>>> case without -v0 (event log) the gui partially paints and stops, but
>>>>>>> USB continues.
>>>>>>>
>>>>>>> Questions:
>>>>>>>
>>>>>>> 1) Does ghc 7.8.4 have any improvements that might pertain to these
>>>>>>> kinds of scheduling/thread problems?
>>>>>>> 2) Is there anything about the nature of a thread using USB, I2C, or
>>>>>>> wxHaskell IO that leads to problems that a pure calculation app would
>>>>>>> not have?
>>>>>>> 3) Any ideas how to track down the problem when changing conditions
>>>>>>> (compiler or runtime options) affects behavior?
>>>>>>> 4) Are there other options besides -V and -C for the runtime that
>>>>>>> might apply?
>>>>>>> 5) What does -V0 do that makes a problem program run?
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 29, 2014, at 6:02 PM, Michael Jones <mike at proclivis.com
>>>>>>> <mailto:mike at proclivis.com>> wrote:
>>>>>>>
>>>>>>>> John,
>>>>>>>>
>>>>>>>> Adding -C0.005 makes it much better. Using -C0.001 makes it behave
>>>>>>>> more like -N4.
>>>>>>>>
>>>>>>>> Thanks. This saves my project, as I need to deploy on a single core
>>>>>>>> Atom and was stuck.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> On Oct 29, 2014, at 5:12 PM, John Lato <jwlato at gmail.com
>>>>>>>> <mailto:jwlato at gmail.com>> wrote:
>>>>>>>>
>>>>>>>>> By any chance do the delays get shorter if you run your program
>>>>>>>>> with `+RTS -C0.005`? If so, I suspect you're having a problem very
>>>>>>>>> similar to one that we had with ghc-7.8 (7.6 too, but it's worse on
>>>>>>>>> ghc-7.8 for some reason), involving possible misbehavior of the
>>>>>>>>> thread scheduler.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <mike at proclivis.com
>>>>>>>>> <mailto:mike at proclivis.com>> wrote:
>>>>>>>>>
>>>>>>>>>     I have a general question about thread behavior in 7.8.3 vs 7.6.X
>>>>>>>>>
>>>>>>>>>     I moved from 7.6 to 7.8 and my application behaves very
>>>>>>>>>     differently. I have three threads, an application thread that
>>>>>>>>>     plots data with wxhaskell or sends it over a network (depends on
>>>>>>>>>     settings), a thread doing usb bulk writes, and a thread doing
>>>>>>>>>     usb bulk reads. Data is moved around with TChan, and TVar is
>>>>>>>>>     used for coordination.
>>>>>>>>>
>>>>>>>>>     When the application was compiled with 7.6, my stream of usb
>>>>>>>>>     traffic was smooth. With 7.8, there are lots of delays where
>>>>>>>>>     nothing seems to be running. These delays are up to 40 ms,
>>>>>>>>>     whereas with 7.6 delays were 1 ms or so.
>>>>>>>>>
>>>>>>>>>     When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it
>>>>>>>>>     runs fine without -N2/4.
>>>>>>>>>
>>>>>>>>>     The program is compiled -O2 with profiling. The -N2/4 version
>>>>>>>>>     uses more memory, but in both cases, with 7.8 and with 7.6,
>>>>>>>>>     there is no space leak.
>>>>>>>>>
>>>>>>>>>     I tried to compile and use -ls so I could take a look with
>>>>>>>>>     threadscope, but the application hangs and writes no data to the
>>>>>>>>>     file. The CPU fans run wild like it is in an infinite loop. It
>>>>>>>>>     at least pops an unpainted wxhaskell window, so it got partially
>>>>>>>>>     running.
>>>>>>>>>
>>>>>>>>>     One of my libraries uses option -fsimpl-tick-factor=200 to get
>>>>>>>>>     around the compiler.
>>>>>>>>>
>>>>>>>>>     What do I need to know about changes to threading and event
>>>>>>>>>     logging between 7.6 and 7.8? Is there some general documentation
>>>>>>>>>     somewhere that might help?
>>>>>>>>>
>>>>>>>>>     I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar
>>>>>>>>>     ball and installed myself, after removing 7.6 with apt-get.
>>>>>>>>>
>>>>>>>>>     Any hints appreciated.
>>>>>>>>>
>>>>>>>>>     Mike
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     _______________________________________________
>>>>>>>>>     Glasgow-haskell-users mailing list
>>>>>>>>>     Glasgow-haskell-users at haskell.org
>>>>>>>>>     <mailto:Glasgow-haskell-users at haskell.org>
>>>>>>>>>     http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
>