FFI calls: is it possible to allocate a small memory block on a stack?

Thu Apr 22 17:14:24 EDT 2010

On 22/04/10 21:25, Denys Rtveliashvili wrote:
> Thank you, Simon
>
> I have identified a number of problems and have created patches for a
> couple of them. I raised ticket #4004 in Trac, and I hope that someone
> will take a look and commit the patches to the repository if they look
> good.
>
> Things I did:
> * Inlining for a few functions

Thanks - I already did this for alloca/malloc; I'll add the others from 
your patch.

> * changed multiplication and division in include/Cmm.h to bit shifts

This really shouldn't be required; I'll look into why the optimisation 
isn't working.
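For reference, here is a small Haskell sketch of the arithmetic identities behind this kind of strength reduction (mulPow2, divPow2 and quotPow2 are hypothetical names). Multiplying by 2^k is a left shift, and `div` by 2^k, which rounds toward negative infinity, is an arithmetic right shift. Truncating division (`quot`, the C-style semantics) of a negative number needs a bias added before the shift. The sketch only illustrates the arithmetic; it says nothing about why the existing Cmm optimisation did not fire.

```haskell
import Data.Bits (finiteBitSize, shiftL, shiftR, (.&.))

-- x * 2^k as a left shift (both wrap around identically on overflow).
mulPow2 :: Int -> Int -> Int
mulPow2 x k = x `shiftL` k

-- x `div` 2^k as an arithmetic right shift: both round toward negative
-- infinity, and shiftR on a signed Int preserves the sign bit.
divPow2 :: Int -> Int -> Int
divPow2 x k = x `shiftR` k

-- x `quot` 2^k (rounds toward zero, like C division): add 2^k - 1 to
-- negative dividends before shifting. Assumes 0 <= k < finiteBitSize x - 1.
quotPow2 :: Int -> Int -> Int
quotPow2 x k = (x + bias) `shiftR` k
  where
    signMask = x `shiftR` (finiteBitSize x - 1)  -- -1 if x < 0, else 0
    bias     = signMask .&. ((1 `shiftL` k) - 1) -- 2^k - 1 if x < 0, else 0
```

In GHCi, `divPow2 (-7) 3` evaluates to -1 while `(-7) `quot` 8` evaluates to 0. A plain shift is therefore only a valid replacement for signed division when the rounding rules match.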

> Things that can be done:
> * optimizations in the threaded RTS. Locking is used frequently, and
> every lock acquisition on a normal POSIX threads mutex costs about 20
> nanoseconds on my computer.

We go to quite a lot of trouble to avoid locking in the common cases and 
fast paths - most of our data structures are CPU-local.  Where in 
particular have you encountered locking that could be reduced?
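The general idea of keeping hot data per CPU, so that the common path does not contend on a shared mutex, can be illustrated at user level in Haskell. The sketch below is a hypothetical striped counter (StripedCounter, increment and total are illustrative names, not RTS code). Each capability gets its own cell, chosen with threadCapability from Control.Concurrent, and only reading the total touches every cell. It still uses an atomic update because a thread can migrate between capabilities, so it reduces contention rather than eliminating synchronisation.

```haskell
import Control.Concurrent (getNumCapabilities, myThreadId, threadCapability)
import Control.Monad (replicateM)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical striped counter: one cell per capability, so threads running
-- on different capabilities usually update different IORefs.
newtype StripedCounter = StripedCounter [IORef Int]

newStripedCounter :: IO StripedCounter
newStripedCounter = do
  n <- getNumCapabilities
  StripedCounter <$> replicateM n (newIORef 0)

-- Fast path: touch only the cell belonging to the current capability.
increment :: StripedCounter -> IO ()
increment (StripedCounter cells) = do
  (cap, _pinned) <- threadCapability =<< myThreadId
  let cell = cells !! (cap `mod` length cells)
  -- The thread may migrate after threadCapability returns, so the update is
  -- still atomic; it is simply uncontended in the common case.
  atomicModifyIORef' cell (\x -> (x + 1, ()))

-- Slow path: reading the total visits every cell.
total :: StripedCounter -> IO Int
total (StripedCounter cells) = sum <$> mapM readIORef cells
```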

> * moving some computations from Cmm code to Haskell. This requires
> passing information such as the word size to the Haskell code, but the
> benefit is that some computations can be performed statically, as they
> depend primarily on the data type we allocate space for.
> * a fix/improvement for the Cmm compiler. There is already some code in
> it which replaces multiplications and divisions by 2^n with bit shifts,
> but for some reason it does not work. Also, in the general case,
> division by a constant can be replaced by a multiplication and a bit shift.
>
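To illustrate the suggestion about moving computations from Cmm to Haskell, here is a minimal sketch. It assumes the computation in question is a byte-to-word rounding of the kind done when allocating byte arrays; the names wordBytes, bytesToWords and payloadWords are hypothetical. When the element type is known at the call site, sizeOf and the word size are constants, so GHC can in principle fold the whole expression at compile time instead of computing it at run time.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import Foreign.Storable (Storable (..))

-- Bytes per machine word (GHC's Int is one machine word wide).
wordBytes :: Int
wordBytes = sizeOf (undefined :: Int)

-- Round a byte count up to whole words.
bytesToWords :: Int -> Int
bytesToWords n = (n + wordBytes - 1) `quot` wordBytes

-- Payload size in words for one value of type a. For a concrete a, every
-- input here is a compile-time constant once this is inlined.
{-# INLINE payloadWords #-}
payloadWords :: forall a. Storable a => Proxy a -> Int
payloadWords _ = bytesToWords (sizeOf (undefined :: a))
```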
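The general-case replacement of division by multiplication and a shift is the reciprocal-multiplication technique (Granlund and Montgomery, 1994) used by C compilers. Here is a minimal sketch for one fixed divisor, unsigned 32-bit division by 10 (divBy10 is a hypothetical name). It multiplies by the constant ceil(2^35 / 10) = 0xCCCCCCCD in 64-bit arithmetic and shifts right by 35. The error term is below 0.025 for every 32-bit input, so the floor never crosses an integer boundary. Other divisors need their own constant and shift, and signed division needs extra correction steps not shown here.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word32, Word64)

-- Unsigned 32-bit division by 10 without a divide instruction:
-- n / 10 == (n * ceil(2^35 / 10)) >> 35 for every 32-bit n.
-- The product is below 2^64, so it fits in a Word64.
divBy10 :: Word32 -> Word32
divBy10 n = fromIntegral ((fromIntegral n * 0xCCCCCCCD :: Word64) `shiftR` 35)
```

For example, divBy10 4294967295 gives 429496729, the same as 4294967295 `div` 10.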
> ---
>
> Also, while looking into this, I came up with a number of questions.
> One of them is this:
>
> What is "pinned_object_block" in rts/sm/Storage.h for, and why is it
> shared between TSOs? It looks like "allocatePinned" has to take
> SM_MUTEX every time it is called (in the threaded RTS) because other
> threads can be accessing it. Moreover, this block of memory is assigned
> to the nursery of one of the TSOs. Why, then, is it shared with the rest
> of the world instead of being local to that TSO?

The pinned_object_block is CPU-local, so usually no locking is required. 
Only when the block is full do we have to get a new block from the block 
allocator, and that requires a lock, but it's a rare case.
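A toy model may make the fast-path/slow-path split easier to see. This is a hypothetical sketch, not the RTS code. LocalBlock stands in for a CPU-local pinned block that only its owner touches, so plain IORef reads and writes suffice. BlockAllocator stands in for the shared block allocator, with an MVar playing the role of the lock. The 4096-byte block size and the assumption that every request fits in one block are simplifications for illustration.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Assumed block size for this model (bytes).
blockSize :: Int
blockSize = 4096

-- Stand-in for the shared block allocator: getting a block takes the lock.
data BlockAllocator = BlockAllocator
  { baLock      :: MVar ()    -- plays the role of the storage-manager lock
  , baNextBlock :: IORef Int  -- address of the next unused block
  , baLockCount :: IORef Int  -- how many times the lock has been taken
  }

-- Stand-in for a CPU-local pinned block: only its owner touches it,
-- so no lock is needed to read or update it.
data LocalBlock = LocalBlock
  { lbFree  :: IORef Int  -- next free byte in the current block
  , lbLimit :: IORef Int  -- end of the current block
  }

newAllocator :: IO BlockAllocator
newAllocator = BlockAllocator <$> newMVar () <*> newIORef 0 <*> newIORef 0

-- Start with an empty block (free == limit), so the first request refills.
newLocalBlock :: IO LocalBlock
newLocalBlock = LocalBlock <$> newIORef 0 <*> newIORef 0

-- Slow path: take the shared lock and carve out a fresh block.
getBlock :: BlockAllocator -> IO Int
getBlock ba = withMVar (baLock ba) $ \_ -> do
  modifyIORef' (baLockCount ba) (+ 1)
  start <- readIORef (baNextBlock ba)
  writeIORef (baNextBlock ba) (start + blockSize)
  pure start

-- Fast path: bump the local free pointer; lock only when the block is full.
-- Assumes 0 < n <= blockSize.
allocPinned :: BlockAllocator -> LocalBlock -> Int -> IO Int
allocPinned ba lb n = do
  free  <- readIORef (lbFree lb)
  limit <- readIORef (lbLimit lb)
  if free + n <= limit
    then do
      writeIORef (lbFree lb) (free + n)
      pure free
    else do
      start <- getBlock ba
      writeIORef (lbFree lb) (start + n)
      writeIORef (lbLimit lb) (start + blockSize)
      pure start
```

In this model, allocating 100 objects of 100 bytes each from an empty local block takes the lock only 3 times. It is taken once for the first block and once each time a block fills up, since 40 objects fit in 4096 bytes.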

Cheers,
	Simon

> On a side note, is the London HUG still active? The website seems to be
> down...
>
>
> With kind regards,
> Denys Rtveliashvili
>
>> Adding an INLINE pragma is the right thing for alloca and similar functions.
>>
>> alloca is a small overloaded wrapper around allocaBytesAligned, and
>> without the INLINE pragma the body of allocaBytesAligned gets inlined
>> into alloca itself, making it too big to be inlined at the call site
>> (you can work around it with e.g. -funfolding-use-threshold=100).  This
>> is really a case of manual worker/wrapper: we want to tell GHC that
>> alloca is a wrapper, and the way to do that is with INLINE.  Ideally GHC
>> would manage this itself - there's a lot of scope for doing some general
>> code splitting, I don't think anyone has explored that yet.
>>
>> Cheers,
>> 	Simon
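Here is a minimal sketch of the manual worker/wrapper pattern described above. It assumes a base version that exports allocaBytesAligned from Foreign.Marshal.Alloc; myAlloca and roundTrip are hypothetical names, and this mirrors the shape described for alloca rather than quoting base's source. The wrapper only computes size and alignment from the Storable instance and calls the worker. The INLINE pragma lets GHC copy that small body into each call site, where the type is known and sizeOf/alignment can become constants.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Wrapper: small and overloaded; all real work happens in the worker
-- (allocaBytesAligned). INLINE asks GHC to inline it at every call site.
{-# INLINE myAlloca #-}
myAlloca :: forall a b. Storable a => (Ptr a -> IO b) -> IO b
myAlloca = allocaBytesAligned (sizeOf (undefined :: a)) (alignment (undefined :: a))

-- Example call site: scratch space for one Int, written and read back.
roundTrip :: Int -> IO Int
roundTrip x = myAlloca $ \p -> do
  poke p x
  peek p
```

Compiling with -O and inspecting the Core output (for example with -ddump-simpl) is one way to check whether the wrapper was actually inlined at a given call site.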