Thread behavior in 7.8.3

Edward Z. Yang ezyang at mit.edu
Thu Oct 30 00:41:16 UTC 2014


Yes, that's right.

I brought it up because you mentioned that there might still be
occasional delays, and those might be caused by a thread not being
preemptible for a while.
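The loop shape in question can be sketched with a hypothetical `sumTo`: a strict accumulator loop over `Int` that, with `-O`, GHC typically compiles to a tight loop on unboxed values with no heap allocation. By default (`-fomit-yields`) GHC then omits the heap check in such a loop, so the scheduler has no point at which to preempt it. The `OPTIONS_GHC -fno-omit-yields` pragma keeps those checks. The flag only affects modules compiled with it.

```haskell
{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BangPatterns #-}

-- | Hypothetical example of a loop that may not allocate once optimized:
-- with -O the accumulator and counter are usually unboxed, so the loop body
-- needs no heap check. -fno-omit-yields tells GHC to keep heap checks even
-- where no allocation happens, giving the scheduler a preemption point.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)
```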

Edward

Excerpts from John Lato's message of 2014-10-29 17:31:45 -0700:
> My understanding is that -fno-omit-yields is subtly different.  I think
> that's for the case when a function loops without performing any heap
> allocations, and thus would never yield even after the context switch
> timeout.  In my case the looping function does perform heap allocations and
> does eventually yield, just not until after the timeout.
> 
> Is that understanding correct?
> 
> (technically, doesn't it change to yielding after stack checks or something
> like that?)
> 
> On Thu, Oct 30, 2014 at 8:24 AM, Edward Z. Yang <ezyang at mit.edu> wrote:
> 
> > I don't think this is directly related to the problem, but if you have a
> > thread that isn't yielding, you can force it to yield by using
> > -fno-omit-yields on your code.  It won't help if the non-yielding code
> > is in a library, and it won't help if the problem was that you just
> > weren't setting timeouts finely enough (which sounds like what was
> > happening). FYI.
> >
> > Edward
> >
> > Excerpts from John Lato's message of 2014-10-29 17:19:46 -0700:
> > > I guess I should explain what that flag does...
> > >
> > > The GHC RTS maintains capabilities; the number of capabilities is
> > > specified by the `+RTS -N` option.  Each capability is a virtual machine
> > > that executes Haskell code and maintains its own run queue of threads to
> > > process.
> > >
> > > A capability will perform a context switch at the next heap block
> > > allocation (every 4k of allocation) after the timer expires.  The timer
> > > defaults to 20ms, and can be set by the -C flag.  Capabilities perform
> > > context switches in other circumstances as well, such as when a thread
> > > yields or blocks.
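The effect described above can be observed roughly as follows. This is a sketch with hypothetical helper names. One thread allocates in a loop and never blocks. Another repeatedly sleeps for a short time and records how late it wakes up. On a single capability, the sleeper can only run again once the busy thread is descheduled, so the worst-case lateness should track the `-C` interval. `GHC.Clock` assumes base 4.11 (GHC 8.4) or later, which is newer than the compilers in this thread.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad (forever, replicateM)
import Data.IORef (IORef, modifyIORef', newIORef)
import GHC.Clock (getMonotonicTimeNSec)

-- | A thread that keeps allocating (each write boxes a new Int) but never
-- blocks or yields explicitly, so it is only descheduled when the
-- context-switch timer has fired and it next allocates a heap block.
busyWork :: IORef Int -> IO ()
busyWork ref = forever (modifyIORef' ref (+ 1))

-- | Sleep for @us@ microseconds and return how many microseconds later
-- than requested the thread actually resumed.
overshootMicros :: Int -> IO Double
overshootMicros us = do
  t0 <- getMonotonicTimeNSec
  threadDelay us
  t1 <- getMonotonicTimeNSec
  let elapsed = fromIntegral (t1 - t0) / 1000 :: Double
  pure (elapsed - fromIntegral us)

-- | Run the busy thread alongside @n@ sleeps of @us@ microseconds each
-- and return the observed overshoots.
measureLatencies :: Int -> Int -> IO [Double]
measureLatencies n us = do
  ref <- newIORef 0
  worker <- forkIO (busyWork ref)
  samples <- replicateM n (overshootMicros us)
  killThread worker
  pure samples
```

To compare settings, run the same program with different `+RTS -C` values and look at the maximum of the returned list. This sketch makes no claim about what numbers to expect.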
> > >
> > > My guess is that either the context switching logic changed in ghc-7.8,
> > > or possibly your code used to trigger a switch via some other mechanism
> > > (stack overflow or something maybe?), but is optimized differently now so
> > > instead it needs to wait for the timer to expire.
> > >
> > > The problem we had was that a time-sensitive thread was getting scheduled
> > > on the same capability as a long-running non-yielding thread, so the
> > > time-sensitive thread had to wait for a context switch timeout (even
> > > though there were free cores available!).  I expect even with -N4 you'll
> > > still see occasional delays (perhaps <5% of calls).
> > >
> > > We've solved our problem with judicious use of `forkOn`, but that won't
> > > help at N1.
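Here is a minimal sketch of the `forkOn` approach. It assumes a program built with `-threaded` and run with at least `+RTS -N2`. The long-running thread is pinned to capability 0 and the time-sensitive one to capability 1, so the time-sensitive thread is never queued behind the busy one. `forkOn` interprets its argument modulo the number of capabilities, which is why this cannot help at `-N1`: both threads land on capability 0. `forkSeparated` is a hypothetical helper.

```haskell
import Control.Concurrent

-- | Hypothetical helper: pin a long-running worker to capability 0 and a
-- latency-sensitive thread to capability 1. forkOn takes the capability
-- number modulo getNumCapabilities, so with -N1 both end up on capability 0.
forkSeparated :: IO () -> IO () -> IO (ThreadId, ThreadId)
forkSeparated busy sensitive = do
  busyTid <- forkOn 0 busy
  sensitiveTid <- forkOn 1 sensitive
  pure (busyTid, sensitiveTid)
```

Pinning by number is a blunt tool. Threads created with `forkOn` are not migrated by the RTS, so a pinned thread stays on its capability even when another one is idle.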
> > >
> > > We did see this behavior in 7.6, but it's definitely worse in 7.8.
> > >
> > > Incidentally, has there been any interest in a work-stealing scheduler?
> > > There was a discussion from about 2 years ago, in which Simon Marlow
> > > noted it might be tricky, but it would definitely help in situations like
> > > this.
> > >
> > > John L.
> > >
> > > On Thu, Oct 30, 2014 at 8:02 AM, Michael Jones <mike at proclivis.com> wrote:
> > >
> > > > John,
> > > >
> > > > Adding -C0.005 makes it much better. Using -C0.001 makes it behave more
> > > > like -N4.
> > > >
> > > > Thanks. This saves my project, as I need to deploy on a single-core Atom
> > > > and was stuck.
> > > >
> > > > Mike
> > > >
> > > > On Oct 29, 2014, at 5:12 PM, John Lato <jwlato at gmail.com> wrote:
> > > >
> > > > By any chance do the delays get shorter if you run your program with
> > > > `+RTS -C0.005`?  If so, I suspect you're having a problem very similar
> > > > to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for
> > > > some reason), involving possible misbehavior of the thread scheduler.
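Because `-C` is an RTS option, you can check that a setting such as `+RTS -C0.005` took effect by reading the scheduler's flags back from inside the program. This is a minimal sketch. It assumes base 4.8 (GHC 7.10) or later, where `GHC.RTS.Flags` was added, so it will not work on the 7.8.3 compiler discussed here. `reportContextSwitch` is a hypothetical name. The interval is printed in the RTS's internal time unit rather than converted to seconds.

```haskell
import GHC.RTS.Flags (ConcFlags (..), getConcFlags)

-- | Report the context-switch settings the RTS is using (as set by
-- +RTS -C). ctxtSwitchTime is an RtsTime, printed here unconverted.
reportContextSwitch :: IO String
reportContextSwitch = do
  flags <- getConcFlags
  pure ("ctxtSwitchTime = " ++ show (ctxtSwitchTime flags)
        ++ ", ctxtSwitchTicks = " ++ show (ctxtSwitchTicks flags))
```

To pass `-C` on the command line, the program generally has to be linked with `-rtsopts`. Alternatively, linking with `-with-rtsopts=-C0.005` bakes the setting into the executable.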
> > > >
> > > > On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <mike at proclivis.com> wrote:
> > > >
> > > >> I have a general question about thread behavior in 7.8.3 vs 7.6.X
> > > >>
> > > >> I moved from 7.6 to 7.8 and my application behaves very differently.
> > > >> I have three threads: an application thread that plots data with
> > > >> wxhaskell or sends it over a network (depending on settings), a thread
> > > >> doing USB bulk writes, and a thread doing USB bulk reads. Data is moved
> > > >> around with TChan, and TVar is used for coordination.
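The three-thread design described above can be sketched under heavy assumptions. The USB transfers are replaced by a simulated `device` channel, the "read" simply doubles each value, and `runPipeline` plays the application thread. As in the description, it uses `TChan` for data and a `TVar` counter for coordination, via the `stm` package that ships with GHC. All names are hypothetical; this is not the original application's code.

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.STM
import Control.Monad (forever, replicateM)

-- | Hypothetical stand-in for the three-thread design: a "writer" thread
-- forwards requests to a simulated device channel, a "reader" thread drains
-- the device and hands results to the application, and a TVar counts items
-- in flight so the application can wait until the pipeline is idle.
runPipeline :: [Int] -> IO [Int]
runPipeline requests = do
  toWriter <- newTChanIO
  device   <- newTChanIO          -- stands in for the USB link
  toApp    <- newTChanIO
  inFlight <- newTVarIO (0 :: Int)

  -- Writer thread: take a request and "write" it to the device.
  writer <- forkIO $ forever $ atomically $
    readTChan toWriter >>= writeTChan device

  -- Reader thread: "read" a reply, pass it on, and mark the item done.
  reader <- forkIO $ forever $ atomically $ do
    x <- readTChan device
    writeTChan toApp (x * 2)
    modifyTVar' inFlight (subtract 1)

  -- Application thread (the caller): submit all requests atomically
  -- together with the in-flight count, then wait for it to reach zero.
  atomically $ do
    mapM_ (writeTChan toWriter) requests
    writeTVar inFlight (length requests)
  atomically $ readTVar inFlight >>= check . (== 0)

  results <- replicateM (length requests) (atomically (readTChan toApp))
  mapM_ killThread [writer, reader]
  pure results
```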
> > > >>
> > > >> When the application was compiled with 7.6, my stream of USB traffic
> > > >> was smooth. With 7.8, there are lots of delays where nothing seems to be
> > > >> running. These delays are up to 40ms, whereas with 7.6 they were about
> > > >> 1ms.
> > > >>
> > > >> When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs
> > > >> fine without -N2/4.
> > > >>
> > > >> The program is compiled -O2 with profiling. The -N2/4 version uses more
> > > >> memory, but in both cases, with 7.8 and with 7.6, there is no space
> > > >> leak.
> > > >>
> > > >> I tried to compile and use -ls so I could take a look with ThreadScope,
> > > >> but the application hangs and writes no data to the file. The CPU fans
> > > >> run wild as if it were in an infinite loop. It at least pops up an
> > > >> unpainted wxhaskell window, so it got partially running.
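Once event logging works, user markers make the ThreadScope timeline easier to read. This minimal sketch assumes a GHC of this era, where the program must be linked with `-eventlog` for `+RTS -ls` to produce a log. `withMarkers` and the START/STOP labels are hypothetical conventions. `traceEventIO` is the `Debug.Trace` function that emits a user event; it does nothing when the event log is disabled.

```haskell
import Control.Exception (bracket_)
import Debug.Trace (traceEventIO)

-- | Hypothetical helper: bracket an action with START/STOP user events so
-- that, when the eventlog is enabled, ThreadScope can show where the
-- action begins and ends. The STOP event is emitted even if the action
-- throws. Without the eventlog the events are simply dropped.
withMarkers :: String -> IO a -> IO a
withMarkers label =
  bracket_ (traceEventIO ("START " ++ label))
           (traceEventIO ("STOP " ++ label))
```

For example, wrapping each bulk read as `withMarkers "usb-read" doRead`, where `doRead` is a hypothetical action, would make the gaps between reads visible on the timeline.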
> > > >>
> > > >> One of my libraries uses the option -fsimpl-tick-factor=200 to get
> > > >> around a compiler limit.
> > > >>
> > > >> What do I need to know about changes to threading and event logging
> > > >> between 7.6 and 7.8? Is there some general documentation somewhere that
> > > >> might help?
> > > >>
> > > >> I am on Ubuntu 14.04 LTS. I downloaded the 7.8 toolchain tarball and
> > > >> installed it myself, after removing 7.6 with apt-get.
> > > >>
> > > >> Any hints appreciated.
> > > >>
> > > >> Mike
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> Glasgow-haskell-users mailing list
> > > >> Glasgow-haskell-users at haskell.org
> > > >> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> > > >>
> > > >
> > > >
> > > >
> >

