Cloud Haskell and network latency issues with -threaded
Andreas Voellmy
andreas.voellmy at gmail.com
Fri Feb 8 06:40:18 CET 2013
On Fri, Feb 8, 2013 at 12:30 AM, Edward Z. Yang <ezyang at mit.edu> wrote:
> OK. I think it is high priority for us to get some latency benchmarks
> into nofib so that GHC devs (including me) can start measuring changes
> off them. I know Edsko has some benchmarks here:
> http://www.edsko.net/2013/02/06/performance-problems-with-threaded/
> but they depend on network which makes it a little difficult to move into
> nofib.
> I'm working on other scheduler changes that may help you guys out; we
> should keep each other updated.
>
That would be great :)
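Edward's point about moving latency benchmarks into nofib without a network dependency can be illustrated with a small sketch. The program below is hypothetical and is not taken from nofib or from Edsko's post. Two Haskell threads bounce a token through a pair of MVars, so the measured round trip reflects scheduler and wakeup latency rather than socket I/O. It assumes a modern base that provides GHC.Clock.getMonotonicTimeNSec (GHC 8.4 or later), which postdates this thread.

```haskell
-- Hypothetical network-free latency micro-benchmark (not part of nofib).
-- Build with and without -threaded (and try +RTS -N) to compare.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import GHC.Clock (getMonotonicTimeNSec)

-- | Perform n ping-pong round trips between two threads and return the
-- mean round-trip time in nanoseconds.
pingPong :: Int -> IO Double
pingPong n = do
  ping <- newEmptyMVar
  pong <- newEmptyMVar
  -- Echo thread: answer every ping with a pong carrying the same value.
  _ <- forkIO $ replicateM_ n (takeMVar ping >>= putMVar pong)
  start <- getMonotonicTimeNSec
  forM_ [1 .. n] $ \i -> do
    putMVar ping i      -- wake the echo thread
    takeMVar pong       -- block until it replies
  end <- getMonotonicTimeNSec
  pure (fromIntegral (end - start) / fromIntegral n)

main :: IO ()
main = do
  meanNs <- pingPong 100000
  putStrLn ("mean round trip: " ++ show meanNs ++ " ns")
```

Comparing the printed mean under the threaded and non-threaded RTS gives a rough, network-independent signal; the absolute numbers will depend heavily on the machine.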
>
> I noticed your patch also incorporates the "make yield actually work"
> patch; do you think the improvement in 7.4.1 was due to that specific
> change?
>
Actually, I believe that patch is irrelevant to the scheduler change and,
strictly speaking, probably should not be in there. I needed that patch for
the IO manager revisions to work properly.
> (Have you instrumented the run queues and checked how your patch changes
> the distribution of jobs over your runtime?)
>
I didn't do this very rigorously, but I think I added some print
statements in the scheduler and looked at some eventlogs in ThreadScope
to see that thread work-pushing slows down after a while. I had planned to
write a script to analyze an eventlog file to extract these stats, but I
never got around to it.
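The kind of statistic Andi mentions could be computed from eventlog data along the following lines. This is a hypothetical sketch: it assumes the eventlog has already been reduced to (thread, capability) pairs recording where each thread started running (for example from run-thread events, via a parser such as the ghc-events package, which is not shown), and it simply counts how often each thread moved to a different capability.

```haskell
-- Hypothetical analysis helper; eventlog parsing is assumed, not shown.
import Data.List (foldl')
import qualified Data.Map.Strict as Map

type ThreadNo = Int
type CapNo = Int

-- | Given (thread, capability) pairs in time order, each meaning "this
-- thread started running on this capability", count per thread how many
-- times it ran on a different capability than the previous time.
countMigrations :: [(ThreadNo, CapNo)] -> Map.Map ThreadNo Int
countMigrations = snd . foldl' step (Map.empty, Map.empty)
  where
    step (lastCap, moves) (t, c) =
      let moved = case Map.lookup t lastCap of
            Just prev | prev /= c -> 1
            _                     -> 0
      in (Map.insert t c lastCap, Map.insertWith (+) t moved moves)

main :: IO ()
main = print (countMigrations [(1, 0), (2, 1), (1, 1), (1, 0), (2, 1)])
-- prints: fromList [(1,2),(2,0)]
```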
-Andi
> Somewhat unrelatedly, if you have some good latency tests already,
> it may be worth trying to compile your copy of GHC with -fno-omit-yields,
> so that forced context switches get serviced more predictably.
>
> Cheers,
> Edward
>
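To illustrate what the -fno-omit-yields suggestion is about: by default GHC omits heap checks in code that does not allocate, and those checks are where a running Haskell thread notices a pending context switch. The hypothetical program below applies the flag to a single module through an OPTIONS_GHC pragma; Edward's suggestion was to build GHC's own libraries this way, which is not shown. Whether the ticker thread gets to run during the loop without the flag depends on the optimisation level and RTS settings, so treat this as an experiment rather than a guaranteed demonstration.

```haskell
{-# OPTIONS_GHC -fno-omit-yields #-}
-- Hypothetical experiment: compile with -O -threaded, run with and without
-- the pragma above, and watch whether "tick" lines appear during the loop.
import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (evaluate)

-- | A counting loop that, after optimisation, does not allocate, so without
-- -fno-omit-yields it may contain no point at which it can be preempted.
spin :: Int -> Int
spin n
  | n <= 0    = 0
  | otherwise = spin (n - 1)

main :: IO ()
main = do
  -- Ticker thread: wants to print roughly every millisecond.
  _ <- forkIO $ mapM_ (\i -> threadDelay 1000 >> putStrLn ("tick " ++ show i)) [1 :: Int .. 3]
  r <- evaluate (spin 100000000)
  print r
```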
> Excerpts from Andreas Voellmy's message of Thu Feb 07 21:20:25 -0800 2013:
> > Hi Edward,
> >
> > I did two things to improve latency for my application: (1) rework the IO
> > manager and (2) stabilize the work pushing. (1) seems like a big win and we
> > are almost done with the work on that part. It is less clear whether (2)
> > will generally help much. It helped me when I developed it against 7.4.1,
> > but it doesn't seem to have much impact on HEAD on the few measurements I
> > did. The idea of (2) was to keep running averages of the run queue length
> > of each capability, then push work when these running averages get too
> > out-of-balance. The desired effect (which seems to work on my particular
> > application) is to avoid cases in which threads are pushed back and forth
> > among cores, which may make cache usage worse. You can see my patch here:
> >
> > https://github.com/AndreasVoellmy/ghc-arv/commits/push-work-exchange-squashed
> >
> > -Andi
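A rough sketch of the running-average idea Andi describes in (2), written as pure Haskell rather than RTS C code and not taken from his patch: each capability's run-queue length is smoothed with an exponential moving average, and a push is only suggested when the smoothed lengths of the busiest and idlest capabilities differ by more than a threshold. The choice of an exponential moving average, the Balancer type, and the alpha and threshold values are all illustrative assumptions.

```haskell
-- Illustrative sketch only; the real scheduler change lives in the RTS (C).
import Data.List (maximumBy, minimumBy)
import qualified Data.Map.Strict as Map
import Data.Ord (comparing)

type CapNo = Int

data Balancer = Balancer
  { alpha     :: Double               -- weight of the newest sample (0 < alpha <= 1)
  , threshold :: Double               -- smoothed imbalance required before pushing
  , averages  :: Map.Map CapNo Double -- running average run-queue length per capability
  }

-- | Fold a new run-queue length sample for one capability into its average.
observe :: CapNo -> Int -> Balancer -> Balancer
observe cap len b = b { averages = Map.insert cap new (averages b) }
  where
    sample = fromIntegral len
    new = case Map.lookup cap (averages b) of
      Nothing  -> sample
      Just old -> alpha b * sample + (1 - alpha b) * old

-- | Suggest pushing work (from, to) only when the averages are far apart.
pushDecision :: Balancer -> Maybe (CapNo, CapNo)
pushDecision b
  | Map.size (averages b) < 2       = Nothing
  | busyLen - idleLen > threshold b = Just (busyCap, idleCap)
  | otherwise                       = Nothing
  where
    entries = Map.toList (averages b)
    (busyCap, busyLen) = maximumBy (comparing snd) entries
    (idleCap, idleLen) = minimumBy (comparing snd) entries

main :: IO ()
main = do
  let b0 = Balancer { alpha = 0.5, threshold = 4, averages = Map.empty }
      -- Capability 0 briefly spikes to 6 runnable threads: smoothed to 3.
      spike = observe 0 6 (observe 0 0 (observe 1 0 b0))
      -- Capability 0 stays busy for several more samples.
      sustained = foldl (\acc _ -> observe 0 6 acc) spike [1 .. 4 :: Int]
  print (pushDecision spike)      -- Nothing
  print (pushDecision sustained)  -- Just (0,1)
```

The smoothing is what keeps a momentary spike from moving threads, which is the intended way to avoid bouncing threads between cores.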
> >
> > On Fri, Feb 8, 2013 at 12:10 AM, Edward Z. Yang <ezyang at mit.edu> wrote:
> >
> > > Hey folks,
> > >
> > > The latency changes sound relevant to some work on the scheduler I'm
> > > doing; is there a place I can see the changes?
> > >
> > > Thanks,
> > > Edward
> > >
> > > Excerpts from Simon Peyton-Jones's message of Wed Feb 06 10:10:10 -0800 2013:
> > > > I (with help from Kazu and helpful comments from Bryan and Johan) have
> > > > nearly completed an overhaul to the IO manager based on my observations
> > > > and we are in the final stages of getting it into GHC
> > > >
> > > > This is really helpful. Thank you very much Andreas, Kazu, Bryan, Johan.
> > > >
> > > > Simon
> > > >
> > > > From: parallel-haskell at googlegroups.com [mailto:parallel-haskell at googlegroups.com] On Behalf Of Andreas Voellmy
> > > > Sent: 06 February 2013 14:28
> > > > To: watson.timothy at gmail.com
> > > > Cc: kostirya at gmail.com; parallel-haskell; glasgow-haskell-users at haskell.org
> > > > Subject: Re: Cloud Haskell and network latency issues with -threaded
> > > >
> > > > Hi all,
> > > >
> > > > I haven't followed the conversations around CloudHaskell closely, but I
> > > > noticed the discussion around latency using the threaded runtime system,
> > > > and I thought I'd jump in here.
> > > >
> > > > I've been developing a server in Haskell that serves hundreds to
> > > > thousands of clients over very long-lived TCP sockets. I also had latency
> > > > problems with GHC. For example, with 100 clients I had a 10 ms
> > > > (millisecond) latency and with 500 clients I had a 29 ms latency. I looked
> > > > into the problem and found that some bottlenecks in the threaded IO manager
> > > > were the cause. I made some hacks there and got the latency for 100 and 500
> > > > clients down to under 0.2 ms. I (with help from Kazu and helpful comments
> > > > from Bryan and Johan) have nearly completed an overhaul to the IO manager
> > > > based on my observations and we are in the final stages of getting it into
> > > > GHC. Hopefully our work will also fix the latency issues in CloudHaskell
> > > > programs :)
> > > >
> > > > It would be very helpful if someone had some benchmark CloudHaskell
> > > > applications and workloads to test with. Does anyone have these handy?
> > > >
> > > > Cheers,
> > > > Andi
> > > >
> > > > On Wed, Feb 6, 2013 at 9:09 AM, Tim Watson <watson.timothy at gmail.com> wrote:
> > > > Hi Kostirya,
> > > >
> > > > I'm putting the parallel-haskell and ghc-users lists on cc, just in case
> > > > other (better informed) folks want to chip in here.
> > > >
> > > > ----
> > > >
> > > > First of all, I'm assuming you're talking about network latency when
> > > > compiling with -threaded - if not, I apologise for misunderstanding!
> > > >
> > > > There is apparently an outstanding network latency issue when compiling
> > > > with -threaded, but according to a conversation I had with the other
> > > > developers on #haskell-distributed, this is not something that's specific
> > > > to Cloud Haskell. It is something to do with the threaded runtime system,
> > > > so it would need to be solved in GHC in general (or is it just the Network
> > > > package!?). Writing a simple C program and the equivalent socket code in
> > > > Haskell, then comparing their latency under -threaded, will show this up.
> > > >
> > > > See the latency section in
> > > > http://haskell-distributed.github.com/wiki/networktransport.html for some
> > > > more details. According to that, there *are* some things we might be able
> > > > to do, but on the face of things the 20% latency overhead isn't going to
> > > > change significantly.
> > > >
> > > > We have an open ticket to look into this
> > > > (https://cloud-haskell.atlassian.net/browse/NTTCP-4) and at some point
> > > > we'll try to put together the sample programs in a GitHub repository (if
> > > > that's not already done - I might've missed previous spikes done by Edsko
> > > > or others) and investigate further.
> > > >
> > > > One of the other (more experienced!) devs might be able to chip in and
> > > > proffer a better explanation.
> > > >
> > > > Cheers,
> > > > Tim
> > > >
> > > > On 6 Feb 2013, at 13:27, kostirya at gmail.com wrote:
> > > >
> > > > > Have you ever needed to run Haskell in non-threaded mode during
> > > > > intensive network data exchange? I am seeing a twofold performance
> > > > > penalty in threaded mode, but I must use threaded mode because epoll
> > > > > and kevent are only available in threaded mode.
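Since the trade-off described here hinges on which runtime a binary was linked against, a small check like the following (a sketch, not from the thread) can confirm it at run time. rtsSupportsBoundThreads and getNumCapabilities are real exports of Control.Concurrent; rtsSupportsBoundThreads is True only when the program was linked with -threaded, which, per the message above, is also when the epoll/kevent-based IO manager is in use.

```haskell
-- Report which RTS flavour this binary was linked against.
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn $
    if rtsSupportsBoundThreads
      then "threaded RTS (-threaded), capabilities: " ++ show caps
      else "non-threaded RTS"
```

Building the same program with and without -threaded (and running the threaded build with +RTS -N) is a quick way to make sure a benchmark is measuring the RTS you think it is.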
> > > > >
> > > >
> > > > [snip]
> > > >
> > > > >
> > > > >
> > > > > On Wednesday, 6 February 2013 12:33:36 UTC+2, Tim Watson wrote:
> > > > > Hello all,
> > > > >
> > > > > It's been a busy week for Cloud Haskell and I wanted to share a few of
> > > > > our news items with you all.
> > > > >
> > > > > Firstly, we have a new home page at http://haskell-distributed.github.com,
> > > > > into which most of the documentation and wiki pages have been merged.
> > > > > Making sassy-looking websites is not really my bag, so I'm very grateful
> > > > > to the various authors whose Creative Commons licensed designs and layouts
> > > > > made it easy to put together. We've already had some pull requests to fix
> > > > > minor problems on the site, so thanks very much to those who've
> > > > > contributed already!
> > > > >
> > > > > As well as the new site, you will find a few of us hanging out on the
> > > > > #haskell-distributed channel on freenode. Please do come along and join
> > > > > in the conversation.
> > > > >
> > > > > We also recently split up the distributed-process project into separate
> > > > > git repositories, one for each component that makes up Cloud Haskell.
> > > > > This was done partly for administrative purposes and partly because
> > > > > we're in the process of setting up CI builds for all the projects.
> > > > >
> > > > > Finally, we've moved from GitHub's issue tracker to a hosted Jira/Bamboo
> > > > > setup at https://cloud-haskell.atlassian.net - pull requests are
> > > > > naturally still welcome via GitHub! Although you can browse issues freely
> > > > > without logging in, you will need to provide an email address and get an
> > > > > account in order to submit new ones. If you have any difficulties logging
> > > > > in, please don't hesitate to contact me directly, via this forum or the
> > > > > cloud-haskell-developers mailing list (on Google Groups).
> > > > >
> > > > > As always, we'd be delighted to hear any feedback!
> > > > >
> > > > > Cheers,
> > > > > Tim
> > > >
> > >
>
More information about the Glasgow-haskell-users mailing list