Runtime performance degradation for multi-threaded C FFI callback

Sanket Agrawal sanket.agrawal at
Sat Jan 21 16:35:39 CET 2012

Hi Edward,

I was just going to get back to you about it. I did find out that the issue
was indeed a single GHC thread servicing callbacks from 5 C threads (a 1:5
mapping), so the C threads were blocking in the callback, waiting for the
only GHC thread to become available. I updated the code to use a 1:1 mapping:
5 GHC threads for 5 C threads. That scaled almost linearly.

John Latos suggested the above approach two days ago, but I didn't get to
test the idea until now.

Increasing the number of GHC threads doesn't seem to matter if the mapping
between GHC threads and C threads is not 1:1. I got the 1:1 mapping by doing
a forkIO for each C thread. Is it really possible to do a 7:5 mapping (that
is, 7 GHC threads to choose from for 5 C threads during callback)? I can't
think of a way to do it. Not that I need it; I am just curious whether it is
possible.
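
For anyone following along, here is a minimal sketch of the forkIO-per-C-thread
arrangement. It is not the actual benchmark code: serveWorker is a hypothetical
stand-in for the blocking foreign call that would drive one C thread, and
runWorkers is a name made up for this illustration. It only shows the shape of
forking one Haskell thread per worker and waiting for all of them.

```haskell
-- Build (assumed file name): ghc -threaded -rtsopts Sketch.hs
-- Run with five capabilities: ./Sketch +RTS -N5
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import Data.IORef (newIORef, modifyIORef', readIORef)

-- Hypothetical stand-in for the foreign call that drives one C thread.
-- It only counts the callbacks that thread would have delivered.
serveWorker :: Int -> IO Int
serveWorker calls = do
  counter <- newIORef (0 :: Int)
  forM_ [1 .. calls] $ \_ -> modifyIORef' counter (+ 1)
  readIORef counter

-- Fork one Haskell thread per worker (the 1:1 mapping) and wait for all
-- of them; each thread reports its count through its own MVar.
runWorkers :: Int -> Int -> IO Int
runWorkers nWorkers calls = do
  dones <- forM [1 .. nWorkers] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO (serveWorker calls >>= putMVar done)
    return done
  results <- mapM takeMVar dones
  return (sum results)

main :: IO ()
main = do
  total <- runWorkers 5 1000000
  putStrLn ("callbacks handled: " ++ show total)
```

In the real program each forked thread would make the foreign call instead of
running the counting loop; the MVar per thread is just one simple way to wait
for all of them to finish.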


On Fri, Jan 20, 2012 at 11:16 PM, Edward Z. Yang <ezyang at> wrote:

> Hello Sanket,
> What happens if you run this experiment with 5 threads in the C function,
> and have GHC run RTS with -N7? (e.g. five C threads + seven GHC threads =
> 12 threads on your 12-core box.)
> Edward
> Excerpts from Sanket Agrawal's message of Tue Jan 17 23:31:38 -0500 2012:
> > I posted this issue on StackOverflow today. A brief recap:
> >
> > In the case when C FFI calls back a Haskell function, I have observed
> > sharp increase in total time when multi-threading is enabled in C code
> > (even when total number of function calls to Haskell remain same). In my
> > test, I called a Haskell function 5M times using two scenarios (GHC 7.0.4,
> > RHEL5, 12-core box):
> >
> >
> >    - Single-threaded C function: call back Haskell function 5M times -
> >    Total time 1.32s
> >    - 5 threads in C function: each thread calls back the Haskell function
> >    1M times - so, total is still 5M - Total time 7.79s - Verified that
> >    pthread didn't contribute much to the overhead by having the same code
> >    call a C function instead, and compared with single-threaded version.
> >    So, almost all of the increase in overhead seems to come from GHC
> >    runtime.
> >
> > What I want to ask is if this is a known issue for GHC runtime? If not, I
> > will file a bug report for GHC team with code to reproduce it. I don't
> > want to file a duplicate bug report if this is already known issue. I
> > searched through GHC trac using some keywords but didn't see any bugs
> > related to it.
> >
> > StackOverflow post link (has code and details on how to reproduce the
> > issue):
> >

More information about the Glasgow-haskell-users mailing list