[Haskell-cafe] Re: Joels Time Leak
Simon Marlow
simonmar at microsoft.com
Tue Jan 3 11:43:21 EST 2006
On 03 January 2006 15:37, Sebastian Sylvan wrote:
> On 1/3/06, Simon Marlow <simonmar at haskell.org> wrote:
>> Tomasz Zielonka wrote:
>>> On Thu, Dec 29, 2005 at 01:20:41PM +0000, Joel Reymont wrote:
>>>
>>>> Why does it take a fraction of a second for 1 thread to unpickle
>>>> and several seconds per thread for several threads to do it at the
>>>> same time? I think this is where the mystery lies.
>>>
>>>
>>> Have you considered any of this:
>>>
>>> - too much memory pressure: more memory means more frequent and more
>>>   expensive GCs, and 1000 threads using that much memory means bad
>>>   cache performance
>>> - a deficiency in GHC's thread scheduler: giving too much time to one
>>>   thread steals it from the others (Simons, don't get angry at me -
>>>   I am probably wrong here ;-)
>>
>> I don't think there's anything really strange going on here.
>>
>> The default context switch interval in GHC is 0.02 seconds, measured
>> in CPU time. GHC's scheduler is strictly round-robin, so with 100
>> threads in the system it can be 2 seconds between a thread being
>> descheduled and scheduled again.
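The worst-case figure above follows directly from strict round-robin: a descheduled thread must wait while every other runnable thread consumes a full time slice. A minimal sketch of that arithmetic (an illustrative helper, not part of GHC's API):

```haskell
-- Worst-case rescheduling delay under strict round-robin scheduling:
-- a descheduled thread waits while each of the other (n - 1) runnable
-- threads consumes a full time slice.
worstCaseLatency :: Double  -- time slice in seconds (GHC default: 0.02)
                 -> Int     -- number of runnable threads
                 -> Double  -- worst-case delay in seconds
worstCaseLatency slice n = slice * fromIntegral (n - 1)
```

With the default 0.02 s slice and 100 runnable threads this gives 0.02 * 99 = 1.98 s, i.e. roughly the 2 seconds mentioned above.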
>
> According to this:
> http://www.haskell.org/ghc/docs/latest/html/users_guide/sec-using-parallel.html#parallel-rts-opts
>
> The minimum time between context switches is 20 milliseconds.
>
> Is there any good reason why 0.02 seconds is the best that you can get
> here? Couldn't GHC's internal timer tick at a _much_ faster rate (like
> 50-100µs or so)?
Sure, there's no reason why we couldn't do this. Of course, even idle Haskell processes will be ticking away in the background, so there's a reason not to make the interval too short. What do you think is reasonable?
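For reference, the interval is already tunable at run time via the RTS's -C flag, which takes the context-switch interval in seconds. A sketch (assumes GHC is installed; with newer GHC releases the binary must also be built with -rtsopts before it will accept RTS flags):

```shell
# Build a threaded binary (with newer GHCs, also pass -rtsopts so the
# program accepts RTS flags on its command line).
ghc -O2 -threaded Main.hs -o main

# Run with a 1 ms context-switch interval instead of the 20 ms default.
./main +RTS -C0.001 -RTS
```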
> Apart from meaning big trouble for applications with a large number of
> threads (such as Joel's), it'll also make life difficult for any sort
> of real-time application. For instance, if you want to use HOpenGL to
> render a simulation engine and you split it into tons of concurrent
> processes (say, one for each dynamic entity in the engine), the 20ms
> granularity would make it quite hard to achieve 60 frames per second.
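The 60 fps point can be made concrete: a single 20 ms slice handed to another thread already exceeds the whole frame budget. A small sketch (illustrative names, not part of any GHC or HOpenGL API):

```haskell
-- Frame budget at a given refresh rate versus GHC's default time slice.
framePeriodMs :: Double -> Double
framePeriodMs fps = 1000 / fps

defaultSliceMs :: Double
defaultSliceMs = 20  -- GHC's default context-switch interval, in ms

-- True when one uninterrupted slice given to some other thread is
-- already longer than the entire frame budget.
sliceBlowsBudget :: Double -> Bool
sliceBlowsBudget fps = defaultSliceMs > framePeriodMs fps
```

Here framePeriodMs 60 is about 16.7 ms, so sliceBlowsBudget 60 holds: one full slice spent elsewhere guarantees a missed frame.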
The reason things are the way they are is that a large number of *running* threads is not a workload we've optimised for. In fact, Joel's program is the first one I've seen with a lot of running threads, apart from our testsuite. And I suspect that when Joel uses a better binary I/O implementation a lot of that CPU usage will disappear.
Cheers,
Simon