FFI calls: is it possible to allocate a small memory block on a stack?
Simon Marlow
marlowsd at gmail.com
Thu Apr 22 17:14:24 EDT 2010
On 22/04/10 21:25, Denys Rtveliashvili wrote:
> Thank you, Simon
>
> I have identified a number of problems and have created patches for a
> couple of them. Ticket #4004 has been raised in Trac, and I hope that
> someone will take a look and commit the patches to the repository if
> they look good.
>
> Things I did:
> * Added inlining for a few functions
Thanks. I already did this for alloca/malloc; I'll add the others from
your patch.
> * Changed multiplication and division in include/Cmm.h to bit shifts
This really shouldn't be required; I'll look into why the optimisation
isn't working.
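The optimisation under discussion is strength reduction by powers of two. Below is a minimal Haskell sketch of the identities it relies on, for unsigned words only. `mulPow2`, `divPow2` and `agrees` are hypothetical helpers written for illustration. They are not part of GHC or its Cmm compiler.

```haskell
import Data.Bits (shiftL, shiftR)

-- Multiplying an unsigned word by 2^k is a left shift by k
-- (both wrap around modulo 2^(word size)).
mulPow2 :: Word -> Int -> Word
mulPow2 x k = x `shiftL` k

-- Dividing an unsigned word by 2^k is a logical right shift by k.
divPow2 :: Word -> Int -> Word
divPow2 x k = x `shiftR` k

-- True when the shifted forms agree with ordinary multiplication and division.
agrees :: Word -> Int -> Bool
agrees x k = mulPow2 x k == x * 2 ^ k && divPow2 x k == x `div` 2 ^ k
```

For signed values the story is less simple. An arithmetic right shift matches Haskell's `div`, which rounds toward negative infinity. It does not match C-style truncating division (`quot`) for negative operands, so a compiler must add a small correction in that case.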
> Things that can be done:
> * Optimizations in the threaded RTS. Locking is used frequently, and
> every lock acquisition on a normal POSIX threads mutex costs about 20
> nanoseconds on my computer.
We go to quite a lot of trouble to avoid locking in the common cases and
fast paths - most of our data structures are CPU-local. Where in
particular have you encountered locking that could be reduced?
> * Moving some computations from Cmm code to Haskell. This requires
> passing information such as the word size to Haskell code, but the
> benefit is that some computations can be performed statically, since
> they depend primarily on the data type we allocate space for.
> * Fixing or improving the Cmm compiler. It already contains code that
> replaces multiplications and divisions by 2^n with bit shifts, but for
> some reason it does not work. Also, in the general case, division by a
> constant can be replaced by a multiplication followed by a bit shift.
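The general-case replacement is the classic "magic number" technique. Below is a minimal sketch for one case only: unsigned 32-bit division by 10. `div10` is a hypothetical name, and the constant is specific to this divisor and word width. This is not the code GHC's Cmm compiler emits.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word32, Word64)

-- Unsigned 32-bit division by 10 without a divide instruction:
-- multiply by ceil(2^35 / 10) = 0xCCCCCCCD using 64-bit arithmetic
-- (the product of two 32-bit values cannot overflow 64 bits),
-- then shift right by 35.
div10 :: Word32 -> Word32
div10 x = fromIntegral ((fromIntegral x * 0xCCCCCCCD :: Word64) `shiftR` 35)
```

Other divisors need their own multiplier and shift amount. Some also need an extra add-and-shift correction step.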
>
> ---
>
> Also, while looking at this, I came up with a number of questions. One
> of them is this:
>
> What is the meaning of "pinned_object_block" in rts/sm/Storage.h, and why
> is it shared between TSOs? It looks like "allocatePinned" has to lock
> SM_MUTEX every time it is called (in the threaded RTS) because other
> threads could be accessing it. Moreover, this block of memory is assigned
> to the nursery of one of the TSOs. Why, then, should it be shared with the
> rest of the world instead of being local to the TSO?
The pinned_object_block is CPU-local, so usually no locking is required.
Only when the block is full do we have to get a new block from the block
allocator, and that requires a lock, but it's a rare case.
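The fast-path/slow-path split described above can be modelled in Haskell. This is a hypothetical, simplified model, not RTS code. `BlockAllocator`, `LocalBlock`, `allocPinned` and the 4096-byte block size are all assumptions. An `MVar` stands in for the shared block allocator's lock, and an `IORef` stands in for the per-CPU block, which only one thread ever touches.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Hypothetical block size in bytes for this model.
blockSize :: Int
blockSize = 4096

-- Shared block allocator: hands out fresh block numbers under a lock (the MVar).
newtype BlockAllocator = BlockAllocator (MVar Int)

-- Per-CPU state: current block number, bytes used in it, and a refill counter.
data LocalBlock = LocalBlock
  { lbShared  :: BlockAllocator
  , lbCurrent :: IORef (Int, Int)
  , lbRefills :: IORef Int
  }

newAllocator :: IO BlockAllocator
newAllocator = BlockAllocator <$> newMVar 0

-- Slow path: take the shared lock to obtain a new block number.
getBlock :: BlockAllocator -> IO Int
getBlock (BlockAllocator lock) = modifyMVar lock (\n -> return (n + 1, n))

newLocalBlock :: BlockAllocator -> IO LocalBlock
newLocalBlock shared = do
  b <- getBlock shared
  cur <- newIORef (b, 0)
  refills <- newIORef 0
  return (LocalBlock shared cur refills)

-- Allocate n bytes (assumes 0 < n <= blockSize); returns (block, offset).
-- The fast path only touches CPU-local state, so it takes no lock.
allocPinned :: LocalBlock -> Int -> IO (Int, Int)
allocPinned lb n = do
  (b, used) <- readIORef (lbCurrent lb)
  if used + n <= blockSize
    then do
      writeIORef (lbCurrent lb) (b, used + n)
      return (b, used)
    else do
      b' <- getBlock (lbShared lb)  -- rare: block is full, lock once
      modifyIORef' (lbRefills lb) (+ 1)
      writeIORef (lbCurrent lb) (b', n)
      return (b', 0)
```

Consider 1000 allocations of 100 bytes. Each 4096-byte block holds 40 of them, so the shared lock is taken only 24 times after the initial block.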
Cheers,
Simon
> On a side note, is the London HUG still active? The website seems to be
> down...
>
>
> With kind regards,
> Denys Rtveliashvili
>
>> Adding an INLINE pragma is the right thing for alloca and similar functions.
>>
>> alloca is a small overloaded wrapper around allocaBytesAligned, and
>> without the INLINE pragma the body of allocaBytesAligned gets inlined
>> into alloca itself, making it too big to be inlined at the call site
>> (you can work around it with e.g. -funfolding-use-threshold=100). This
>> is really a case of manual worker/wrapper: we want to tell GHC that
>> alloca is a wrapper, and the way to do that is with INLINE. Ideally GHC
>> would manage this itself. There's a lot of scope for doing some general
>> code splitting, but I don't think anyone has explored that yet.
>>
>> Cheers,
>> Simon
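The manual worker/wrapper split described above can be sketched as follows. `myAlloca` and `roundTrip` are hypothetical names. The definition mirrors the shape of `alloca` as "a small overloaded wrapper around allocaBytesAligned", using only `allocaBytesAligned`, `sizeOf` and `alignment` from base. Because the wrapper is marked INLINE, each call site sees `sizeOf` and `alignment` for a concrete type. Those values depend only on the type, so they can then be reduced to constants.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical wrapper: the size and alignment come from the type alone,
-- and the real work is delegated to the allocaBytesAligned worker.
myAlloca :: forall a b. Storable a => (Ptr a -> IO b) -> IO b
myAlloca = allocaBytesAligned (sizeOf (undefined :: a)) (alignment (undefined :: a))
{-# INLINE myAlloca #-}

-- Example use: allocate a temporary Int, write it, and read it back.
roundTrip :: Int -> IO Int
roundTrip x = myAlloca $ \p -> do
  poke p x
  peek p
```

Without the pragma, GHC may inline the worker into the wrapper instead. The wrapper then becomes too large to inline at call sites, which is the problem described above.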
More information about the Glasgow-haskell-users mailing list