FFI calls: is it possible to allocate a small memory block on
a stack?
Simon Marlow
marlowsd at gmail.com
Tue Apr 27 06:59:44 EDT 2010
On 23/04/2010 19:03, Denys Rtveliashvili wrote:
>> Tue Dec 1 16:03:21 GMT 2009 Simon Marlow <marlowsd at gmail.com>
>> * Make allocatePinned use local storage, and other refactorings
>
> The version I have checked out is 6.12 and that's why I haven't seen
> this patch.
> Are there any plans for including this patch in the next GHC release?
It'll be in the next major release (6.14.1).
>> Right, but these are not common cases that need to be optimised. newCAF
>> is only called once per CAF, thereafter it is accessed without locks.
> Can't recall off the top of my head, but I think I had a case when
> newCAF was used very actively in a simple piece of code. The code looked
> like this:
>
> sequence_ $ replicate N $ doSmth
>
> The Cmm code showed that it produced calls to newCAF and something
> related to black holes.
Right, but newCAF should only be called once for any given CAF,
thereafter the CAF will have been updated.
> And when I added "return ()" after that line,
> the black holes and the calls to "newCAF" disappeared. It was on
> 6.12.1, I believe. I still have no idea why it happened and why these
> black holes were necessary, but I'll try to reproduce it one more time
> and show you an example if it is of interest to you.
If you find a case where newCAF is being called repeatedly, that would
be interesting, yes.
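As a rough illustration of the once-per-CAF behaviour described above, here is a minimal sketch. The name `expensive` and the trace message are made up for this example; it only shows that a top-level value is evaluated on first use and then shared, and it does not inspect the RTS or newCAF directly.

```haskell
import Debug.Trace (trace)

-- A top-level binding with no arguments is a CAF (constant applicative form).
-- The name `expensive` is hypothetical, chosen for this sketch only.
{-# NOINLINE expensive #-}
expensive :: Integer
expensive = trace "evaluating CAF" (sum [1 .. 1000000])

main :: IO ()
main = do
  -- The first use forces the CAF; the trace message goes to stderr.
  print expensive
  -- The second use reads the already-updated value, so the trace
  -- message should not be emitted again.
  print expensive
```

If the trace message showed up more than once for a single CAF, that would be the kind of case worth reporting.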
>> It may be that we could find benchmarks where access to the block
>> allocator is the performance bottleneck, indeed in the parallel GC we
>> sometimes see contention for it. If that turns out to be a problem then
>> we may need to think about per-CPU free lists in the block allocator,
>> but I think it would entail a fair bit of complexity and if we're not
>> careful extra memory overhead, e.g. where one CPU has all the free
>> blocks in its local free list and the others have none. So I'd like to
>> avoid going down that route unless we absolutely have to. The block
>> allocator is nice and simple right now.
>
> I suppose I should check out the HEAD then and give it a try, because
> earlier I had performance issues in the threaded runtime (~20% of
> overhead and far more noise) in an application which was doing some
> slicing, reshuffling and composing text via ByteStrings with a modest
> amount of passing data around via "Chan"s.
I'd be interested in seeing a program that has 20% overhead with
-threaded. You should watch out for bound threads though: with
-threaded the main thread is a bound thread, and communication with the
main thread is much slower than between unbound threads. See
http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.1/Control-Concurrent.html#8
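To check whether the bound main thread is involved, a small sketch along these lines may help. It uses only functions exported by Control.Concurrent; the structure of the program is an assumption made for illustration, not taken from the application discussed in this thread. Build with -threaded to see the difference.

```haskell
import Control.Concurrent

main :: IO ()
main = do
  putStrLn ("RTS supports bound threads: " ++ show rtsSupportsBoundThreads)

  -- With -threaded, the main thread is expected to be bound.
  mainBound <- isCurrentThreadBound
  putStrLn ("main thread bound: " ++ show mainBound)

  -- A thread created with forkIO is unbound; report its status via an MVar.
  result <- newEmptyMVar
  _ <- forkIO (isCurrentThreadBound >>= putMVar result)
  workerBound <- takeMVar result
  putStrLn ("forkIO thread bound: " ++ show workerBound)

  -- runInUnboundThread runs an action in an unbound thread when the caller
  -- is bound. Wrapping the Chan-heavy part of a program this way is one
  -- possible workaround for slow communication with the main thread.
  innerBound <- runInUnboundThread isCurrentThreadBound
  putStrLn ("inside runInUnboundThread, bound: " ++ show innerBound)
```

An alternative with the same intent is to do the real work in a thread started with forkIO and have the main thread simply wait on an MVar for it to finish.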
> On a slightly different topic: please could you point me to a place
> where stg_upd_frame_info is generated? I can't find it in *.c, *.cmm or
> *.hs and guess it is something very special.
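In rts/Updates.cmm. The _info suffix is added when the INFO_TABLE_RET
declaration is compiled, so the full symbol name does not appear in the
source: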
INFO_TABLE_RET( stg_upd_frame, UPDATE_FRAME, UPD_FRAME_PARAMS)
{
...
}
Cheers,
Simon