FFI calls: is it possible to allocate a small memory block on a stack?

Tue Apr 27 06:59:44 EDT 2010

On 23/04/2010 19:03, Denys Rtveliashvili wrote:

>> Tue Dec  1 16:03:21 GMT 2009  Simon Marlow <marlowsd at gmail.com>
>>       * Make allocatePinned use local storage, and other refactorings
>
> The version I have checked out is 6.12 and that's why I haven't seen
> this patch.
> Are there any plans for including this patch in the next GHC release?

It'll be in the next major release (6.14.1).

>> Right, but these are not common cases that need to be optimised.  newCAF
>> is only called once per CAF, thereafter it is accessed without locks.
> Can't recall off the top of my head, but I think I had a case where
> newCAF was used very actively in a simple piece of code. The code looked
> like this:
>
> sequence_ $ replicate N $ doSmth
>
> The Cmm code showed that it produced calls to newCAF and something
> related to black holes.

Right, but newCAF should only be called once for any given CAF, 
thereafter the CAF will have been updated.
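
A minimal sketch of the "evaluated once" behaviour, assuming a hypothetical top-level value named `expensive` (not from the original discussion). It only shows that the CAF's body runs a single time and that later uses see the updated value; it does not observe newCAF itself, which is internal to the RTS.

```haskell
import Debug.Trace (trace)

-- A top-level binding with no arguments is a CAF. The trace message
-- fires when the body is actually evaluated.
-- NOTE: 'expensive' is a hypothetical name used only for illustration.
expensive :: Integer
expensive = trace "evaluating CAF" (sum [1 .. 1000000])
{-# NOINLINE expensive #-}

main :: IO ()
main = do
  -- First use forces the CAF and updates it with its value.
  print expensive
  -- Second use should find the already-updated CAF, so the trace
  -- message is not expected to appear again.
  print expensive
```

If the trace message showed up on every use, that would be the kind of repeated evaluation worth reporting.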

> And when I added "return ()" after that line, the black holes and
> the calls to "newCAF" disappeared. It was on
> 6.12.1, I believe. I still have no idea why it happened or why these
> black holes were necessary, but I'll try to reproduce it one more time
> and show you an example if it is of any interest to you.

If you find a case where newCAF is being called repeatedly, that would 
be interesting, yes.

>> It may be that we could find benchmarks where access to the block
>> allocator is the performance bottleneck, indeed in the parallel GC we
>> sometimes see contention for it.  If that turns out to be a problem then
>> we may need to think about per-CPU free lists in the block allocator,
>> but I think it would entail a fair bit of complexity and if we're not
>> careful extra memory overhead, e.g. where one CPU has all the free
>> blocks in its local free list and the others have none.  So I'd like to
>> avoid going down that route unless we absolutely have to.  The block
>> allocator is nice and simple right now.
>
> I suppose I should check out the HEAD then and give it a try, because
> earlier I had performance issues in the threaded runtime (~20% of
> overhead and far more noise) in an application which was doing some
> slicing, reshuffling and composing text via ByteStrings with a modest
> amount of passing data around via "Chan"s.

I'd be interested in seeing a program that has 20% overhead with 
-threaded.  You should watch out for bound threads though: with 
-threaded the main thread is a bound thread, and communication with the 
main thread is much slower than between unbound threads. See

http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.1/Control-Concurrent.html#8
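
A rough sketch of one way to keep the main thread's bound status out of a Chan-heavy loop, assuming a hypothetical `pingPong` helper (not from the original program). It uses `runInUnboundThread` from Control.Concurrent so that the Chan traffic happens between two unbound threads; whether this recovers the reported overhead would need measuring.

```haskell
import Control.Concurrent (forkIO, runInUnboundThread)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_)

-- Send n messages to a worker over one Chan and wait for each echo
-- on a second Chan. Returns the number of replies received.
-- NOTE: 'pingPong' is a hypothetical helper used only for illustration.
pingPong :: Int -> IO Int
pingPong n = do
  requests <- newChan
  replies  <- newChan
  -- Worker thread: echo exactly n messages back, then exit.
  _ <- forkIO $ forM_ [1 .. n] $ \_ ->
         readChan requests >>= writeChan replies
  -- Caller: one round trip per message.
  rs <- mapM (\i -> writeChan requests i >> readChan replies) [1 .. n]
  return (length rs)

main :: IO ()
main = do
  -- With -threaded, main is a bound thread. runInUnboundThread runs
  -- the action in an unbound thread and waits for its result.
  n <- runInUnboundThread (pingPong 100000)
  print n
```

Timing `pingPong` with and without the `runInUnboundThread` wrapper, compiled with -threaded, is one way to check whether bound-thread communication is the source of the slowdown.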

> On a slightly different topic: please could you point me to a place
> where stg_upd_frame_info is generated? I can't find it in *.c, *.cmm or
> *.hs and guess it is something very special.

rts/Updates.cmm:

INFO_TABLE_RET( stg_upd_frame, UPDATE_FRAME, UPD_FRAME_PARAMS)
{
...
}

Cheers,
	Simon