[Haskell-cafe] Re: Joels Time Leak
Simon Marlow
simonmar at microsoft.com
Tue Jan 3 11:43:21 EST 2006
On 03 January 2006 15:37, Sebastian Sylvan wrote:
> On 1/3/06, Simon Marlow <simonmar at haskell.org> wrote:
>> Tomasz Zielonka wrote:
>>> On Thu, Dec 29, 2005 at 01:20:41PM +0000, Joel Reymont wrote:
>>>
>>>> Why does it take a fraction of a second for 1 thread to unpickle
>>>> and several seconds per thread for several threads to do it at the
>>>> same time? I think this is where the mystery lies.
>>>
>>>
>>> Have you considered any of this:
>>>
>>> - too much memory pressure: more memory means more frequent and more
>>>   expensive GCs, and 1000 threads using that much memory means bad
>>>   cache performance
>>> - a deficiency in GHC's thread scheduler: giving too much time to one
>>>   thread steals it from the others (Simons, don't get angry at me -
>>>   I am probably wrong here ;-)
>>
>> I don't think there's anything really strange going on here.
>>
>> The default context switch interval in GHC is 0.02 seconds, measured
>> in CPU time. GHC's scheduler is strictly round-robin, so with 100
>> threads in the system it can be 2 seconds between a thread being
>> descheduled and scheduled again.
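The worst-case figure above follows directly from strict round-robin: a descheduled thread must wait while every other runnable thread consumes a full time slice. A minimal sketch of that arithmetic (an illustrative helper, not part of GHC's API):

```haskell
-- Worst-case rescheduling delay under strict round-robin scheduling:
-- a descheduled thread waits while each of the other (n - 1) runnable
-- threads consumes a full time slice.
worstCaseLatency :: Double  -- time slice in seconds (GHC default: 0.02)
                 -> Int     -- number of runnable threads
                 -> Double  -- worst-case delay in seconds
worstCaseLatency slice n = slice * fromIntegral (n - 1)
```

With the default 0.02 s slice and 100 runnable threads this gives 0.02 * 99 = 1.98 s, i.e. roughly the 2 seconds mentioned above.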
>
> According to this:
> http://www.haskell.org/ghc/docs/latest/html/users_guide/sec-using-parallel.html#parallel-rts-opts
>
> The minimum time between context switches is 20 milliseconds.
>
> Is there any good reason why 0.02 seconds is the best that you can get
> here? Couldn't GHC's internal timer tick at a _much_ faster rate (like
> 50-100µs or so)?
Sure, there's no reason why we couldn't do this. Of course, even idle Haskell processes will be ticking away in the background, so there's a reason not to make the interval too short. What do you think is reasonable?
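For reference, the interval is already tunable at run time via the RTS's -C flag, which takes the context-switch interval in seconds. A sketch (assumes GHC is installed; with newer GHC releases the binary must also be built with -rtsopts before it will accept RTS flags):

```shell
# Build a threaded binary (with newer GHCs, also pass -rtsopts so the
# program accepts RTS flags on its command line).
ghc -O2 -threaded Main.hs -o main

# Run with a 1 ms context-switch interval instead of the 20 ms default.
./main +RTS -C0.001 -RTS
```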
> Apart from meaning big trouble for applications with a large number of
> threads (such as Joel's), it'll also make life difficult for any sort
> of real-time application. For instance, if you want to use HOpenGL to
> render a simulation engine and you split it into tons of concurrent
> processes (say, one for each dynamic entity in the engine), the 20ms
> granularity would make it quite hard to achieve 60 frames per second.
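The 60 fps point can be made concrete: a single 20 ms slice handed to another thread already exceeds the whole frame budget. A small sketch (illustrative names, not part of any GHC or HOpenGL API):

```haskell
-- Frame budget at a given refresh rate versus GHC's default time slice.
framePeriodMs :: Double -> Double
framePeriodMs fps = 1000 / fps

defaultSliceMs :: Double
defaultSliceMs = 20  -- GHC's default context-switch interval, in ms

-- True when one uninterrupted slice given to some other thread is
-- already longer than the entire frame budget.
sliceBlowsBudget :: Double -> Bool
sliceBlowsBudget fps = defaultSliceMs > framePeriodMs fps
```

Here framePeriodMs 60 is about 16.7 ms, so sliceBlowsBudget 60 holds: one full slice spent elsewhere guarantees a missed frame.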
The reason things are the way they are is that a large number of *running* threads is not a workload we've optimised for. In fact, Joel's program is the first one I've seen with a lot of running threads, apart from our testsuite. And I suspect that when Joel uses a better binary I/O implementation a lot of that CPU usage will disappear.
Cheers,
Simon