Thread behavior in 7.8.3

Thu Oct 30 16:36:21 UTC 2014

Hmm, I think maybe part of the problem is in your STM blocks.

On Thu, Oct 30, 2014 at 8:50 AM, Michael Jones <mike at proclivis.com> wrote:

> My hope is that if my threads are doing IO, the scheduler acts when there
> is an IO action with delay, or when STM blocks, etc.
>
> So at the end of my pipe out, I have:
>
> sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool ->
> ProcessT m (Spec, String) ()
> sendTransactions dev dts = repeatedly $ do
>   dts' <- liftIO $ atomically $ readTVar dts
>   when (dts' == True) (do
>       (_, transactions) <- await
>       liftIO $ sendOut dev transactions)
>

When the dts TVar is false, this will just spin, causing the thread to keep
running.  Can you change it to something like:

  _okToProceed <- liftIO $ atomically $ do
      dts' <- readTVar dts
      when (not dts') retry
  (_,transactions) <- await
  liftIO $ sendOut dev transactions

I think this should be better, because now, if dts is False, the STM
transaction will block, allowing the thread to be descheduled.  Furthermore
it won't be rescheduled until another thread has updated dts.

It looks like returnTransactionResults may suffer from the same issue, you
should be able to fix it in a similar manner.

John L.

>
> And my pipe in:
>
> returnTransactionResults :: MonadIO m => SMBusDevice DeviceDC590 -> TVar
> Bool -> SourceT m (Spec, Char)
> returnTransactionResults dev dts = repeatedly $ do
>   (status, spec) <- liftIO $ readIn2 dev
>   oldDts <- liftIO $ atomically $ readTVar dts
>   let dts' = (ord $ status!!1) .&. 0x20
>   let newDts = dts' /= 0
>   when (oldDts /= newDts) (
>     liftIO $ atomically $ writeTVar dts newDts)
>   when (length spec /= 0) (mapM_ (\ch -> yield (executeSpec, ch)) spec)
>
> sendOut will do a usb bulk write, and readIn2 will do a use bulk read.
> Hopefully, somewhere in the usb code IO blocks for an interrupt (probably
> in libusb), and that allows the scheduler to switch threads. Given the
> behavior, I assume this is not the case, and it requires time slicing to
> switch threads.
>
> I also send data between the in/out pipes via TChan. Remembering that each
> pipe is in a thread, hopefully if a readTChan blocks, the scheduler
> reschedules and the other thread runs.
>
> For context, I do a lot of RTOS work, so my worldview of the expected
> behavior comes from that perspective.
>
> Mike
>
>
> On Oct 29, 2014, at 6:41 PM, Edward Z. Yang <ezyang at mit.edu> wrote:
>
> > Yes, that's right.
> >
> > I brought it up because you mentioned that there might still be
> > occasional delays, and those might be caused by a thread not being
> > preemptible for a while.
> >
> > Edward
> >
> > Excerpts from John Lato's message of 2014-10-29 17:31:45 -0700:
> >> My understanding is that -fno-omit-yields is subtly different.  I think
> >> that's for the case when a function loops without performing any heap
> >> allocations, and thus would never yield even after the context switch
> >> timeout.  In my case the looping function does perform heap allocations
> and
> >> does eventually yield, just not until after the timeout.
> >>
> >> Is that understanding correct?
> >>
> >> (technically, doesn't it change to yielding after stack checks or
> something
> >> like that?)
> >>
> >> On Thu, Oct 30, 2014 at 8:24 AM, Edward Z. Yang <ezyang at mit.edu> wrote:
> >>
> >>> I don't think this is directly related to the problem, but if you have
> a
> >>> thread that isn't yielding, you can force it to yield by using
> >>> -fno-omit-yields on your code.  It won't help if the non-yielding code
> >>> is in a library, and it won't help if the problem was that you just
> >>> weren't setting timeouts finely enough (which sounds like what was
> >>> happening). FYI.
> >>>
> >>> Edward
> >>>
> >>> Excerpts from John Lato's message of 2014-10-29 17:19:46 -0700:
> >>>> I guess I should explain what that flag does...
> >>>>
> >>>> The GHC RTS maintains capabilities, the number of capabilities is
> >>> specified
> >>>> by the `+RTS -N` option.  Each capability is a virtual machine that
> >>>> executes Haskell code, and maintains its own runqueue of threads to
> >>> process.
> >>>>
> >>>> A capability will perform a context switch at the next heap block
> >>>> allocation (every 4k of allocation) after the timer expires.  The
> timer
> >>>> defaults to 20ms, and can be set by the -C flag.  Capabilities perform
> >>>> context switches in other circumstances as well, such as when a thread
> >>>> yields or blocks.
> >>>>
> >>>> My guess is that either the context switching logic changed in
> ghc-7.8,
> >>> or
> >>>> possibly your code used to trigger a switch via some other mechanism
> >>> (stack
> >>>> overflow or something maybe?), but is optimized differently now so
> >>> instead
> >>>> it needs to wait for the timer to expire.
> >>>>
> >>>> The problem we had was that a time-sensitive thread was getting
> scheduled
> >>>> on the same capability as a long-running non-yielding thread, so the
> >>>> time-sensitive thread had to wait for a context switch timeout (even
> >>> though
> >>>> there were free cores available!).  I expect even with -N4 you'll
> still
> >>> see
> >>>> occasional delays (perhaps <5% of calls).
> >>>>
> >>>> We've solved our problem with judicious use of `forkOn`, but that
> won't
> >>>> help at N1.
> >>>>
> >>>> We did see this behavior in 7.6, but it's definitely worse in 7.8.
> >>>>
> >>>> Incidentally, has there been any interest in a work-stealing
> scheduler?
> >>>> There was a discussion from about 2 years ago, in which Simon Marlow
> >>> noted
> >>>> it might be tricky, but it would definitely help in situations like
> this.
> >>>>
> >>>> John L.
> >>>>
> >>>> On Thu, Oct 30, 2014 at 8:02 AM, Michael Jones <mike at proclivis.com>
> >>> wrote:
> >>>>
> >>>>> John,
> >>>>>
> >>>>> Adding -C0.005 makes it much better. Using -C0.001 makes it behave
> more
> >>>>> like -N4.
> >>>>>
> >>>>> Thanks. This saves my project, as I need to deploy on a single core
> >>> Atom
> >>>>> and was stuck.
> >>>>>
> >>>>> Mike
> >>>>>
> >>>>> On Oct 29, 2014, at 5:12 PM, John Lato <jwlato at gmail.com> wrote:
> >>>>>
> >>>>> By any chance do the delays get shorter if you run your program with
> >>> `+RTS
> >>>>> -C0.005` ?  If so, I suspect you're having a problem very similar to
> >>> one
> >>>>> that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some
> >>>>> reason), involving possible misbehavior of the thread scheduler.
> >>>>>
> >>>>> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <mike at proclivis.com>
> >>> wrote:
> >>>>>
> >>>>>> I have a general question about thread behavior in 7.8.3 vs 7.6.X
> >>>>>>
> >>>>>> I moved from 7.6 to 7.8 and my application behaves very
> differently. I
> >>>>>> have three threads, an application thread that plots data with
> >>> wxhaskell or
> >>>>>> sends it over a network (depends on settings), a thread doing usb
> bulk
> >>>>>> writes, and a thread doing usb bulk reads. Data is moved around with
> >>> TChan,
> >>>>>> and TVar is used for coordination.
> >>>>>>
> >>>>>> When the application was compiled with 7.6, my stream of usb traffic
> >>> was
> >>>>>> smooth. With 7.8, there are lots of delays where nothing seems to be
> >>>>>> running. These delays are up to 40ms, whereas with 7.6 delays were a
> >>> 1ms or
> >>>>>> so.
> >>>>>>
> >>>>>> When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs
> >>> fine
> >>>>>> without with -N2/4.
> >>>>>>
> >>>>>> The program is compiled -O2 with profiling. The -N2/4 version uses
> >>> more
> >>>>>> memory,  but in both cases with 7.8 and with 7.6 there is no space
> >>> leak.
> >>>>>>
> >>>>>> I tired to compile and use -ls so I could take a look with
> >>> threadscope,
> >>>>>> but the application hangs and writes no data to the file. The CPU
> >>> fans run
> >>>>>> wild like it is in an infinite loop. It at least pops an unpainted
> >>>>>> wxhaskell window, so it got partially running.
> >>>>>>
> >>>>>> One of my libraries uses option -fsimpl-tick-factor=200 to get
> around
> >>> the
> >>>>>> compiler.
> >>>>>>
> >>>>>> What do I need to know about changes to threading and event logging
> >>>>>> between 7.6 and 7.8? Is there some general documentation somewhere
> >>> that
> >>>>>> might help?
> >>>>>>
> >>>>>> I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar ball
> and
> >>>>>> installed myself, after removing 7.6 with apt-get.
> >>>>>>
> >>>>>> Any hints appreciated.
> >>>>>>
> >>>>>> Mike
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Glasgow-haskell-users mailing list
> >>>>>> Glasgow-haskell-users at haskell.org
> >>>>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20141030/59c435d7/attachment-0001.html>