[GHC] #7606: Stride scheduling for Haskell threads with priorities
GHC
cvs-ghc at haskell.org
Fri Jan 18 20:56:25 CET 2013
#7606: Stride scheduling for Haskell threads with priorities
---------------------------------+------------------------------------------
Reporter: ezyang | Owner: ezyang
Type: feature request | Status: new
Priority: normal | Milestone: 7.8.1
Component: Runtime System | Version: 7.7
Keywords: | Os: Unknown/Multiple
Architecture: Unknown/Multiple | Failure: None/Unknown
Difficulty: Unknown | Testcase:
Blockedby: | Blocking:
Related: |
---------------------------------+------------------------------------------
Comment(by ezyang):
Summary: the SMP benchmarks show a massive performance degradation, so
one of the performance problems Simon Marlow pointed out is probably
really hurting us and needs to be fixed.
Replying to [comment:1 simonmar]:
> * First, nofib has zero concurrency benchmarks in it, so the results
aren't very meaningful. Still, you seem to have some outliers in there -
any idea why? nofib/smp has some concurrency benchmarks that I normally
use when making changes to the scheduler.
Oops, I forgot that normal nofib doesn't do SMP. SMP doesn't look so good,
and it seems likely that some of the performance problems you've noted
below might be contributing. I'll investigate more thoroughly...
{{{
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
callback001 +0.1% +0.0% +7.7% +7.6% +0.0%
callback002 +0.1% +0.0% +10.7% +10.7% +0.0%
chan +0.1% +0.0% -29.0% -29.1% -71.0%
sieve +0.1% +0.0% +141.6% +142.0% +89.3%
threads001 +0.1% +0.0% +12.7% +12.6% +0.0%
threads003 +0.1% +9.1% +44.7% +44.6% +25.8%
threads006 +0.1% +0.0% +30.3% +30.7% +46.6%
threads007 +0.1% -0.2% +17.0% +17.0% -6.8%
--------------------------------------------------------------------------------
Min +0.1% -0.2% -29.0% -29.1% -71.0%
Max +0.1% +9.1% +141.6% +142.0% +89.3%
Geometric Mean +0.1% +1.1% +22.5% +22.5% -0.7%
}}}
> * Run all the tests in `testsuite/concurrent`. They aren't run by
default in a validate.
Yup, I've been running a full testsuite run and will be checking those
results today.
> * The patch adds no less than 4 fields to the TSO struct, which is
quite a lot. We should benchmark carefully to make sure this isn't a
problem.
I can reduce this by one field by paying the cost of a division every time
a thread is scheduled (since `stride = STRIDE1/tickets`, always); we can
reduce it by one more by observing that `pass` and `remain` are never both
set at the same time, so they can share a union. I'm not sure which
benchmarks in particular would regress due to the increase in TSO size,
other than just general badness.
> * Why stride scheduling rather than something else, e.g. Linux's CFS?
Wouldn't CFS address the issue of giving more time to blocked threads?
What about inaccuracies due to the time it takes for a thread to respond
to the timer signal?
Here's the secret: Linux's CFS is stride scheduling! The primary
differences here are the backing data structure (they use a red-black tree
while we're using a simple heap) and how timeslices are measured (Linux
counts nanoseconds; we stupidly just consider a jump to the scheduler one
quantum.)
It is possible to implement CFS-style sleeper fairness by modifying what
happens to a thread's pass when it sleeps. I'll post a version with
those changes.
> * Currently, if there's a very long run queue, a minor GC won't have to
traverse a lot of it because it will be in the old generation, but here
the GC has to look at every element of the priority queue. We should make
a benchmark with a long run queue and see if that's an issue.
Could be a problem. If we switched to a tree data structure, we could
probably recover this behavior (the GC would only have to traverse
O(log n) of the queue).
> * `promoteInRunQueue`: it would be a shame to lose this, but as long as
we're sure that #3838 hasn't regressed it should be ok.
There is no particular reason why we can't implement this, and if we
switch to a red-black tree it gets easier.
> * could we have some more docs on the scheduler functions,
`capPassUpdate` etc. please?
OK.
> * Looks like we should do `annulTSO` in a better way. These fields
should be empty normally, set when a thread is blocked, and unset again
when the thread is unblocked.
OK.
> * I don't see why `sched_mutex` is needed in `setTickets` and co.
Could you explain?
`setTickets` can be called concurrently with the scheduler, and since it
performs both a read and a write, the read-modify-write must be
protected by a lock.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/7606#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets mailing list