FFI calls: is it possible to allocate a small memory block on a stack?

Mon Apr 19 09:51:33 EDT 2010

On 18/04/2010 10:28, Denys Rtveliashvili wrote:
>
>> While alloca is not as cheap as, say, C's alloca, you should find that
>> it is much quicker than C's malloc.  I'm sure there's room for
>> optimisation if it's critical for you.  There may well be low-hanging
>> fruit: take a look at the Core for alloca.
> Thank you, Simon.
>
> Indeed, there is low-hanging fruit.
>
> "alloca"'s type is "Storable a => (Ptr a -> IO b) -> IO b" and it is not
> inlined even though the function is small. And calls to functions of
> such signature are expensive (I suppose that's because of look-up into
> typeclass dictionary). However, when I added an "INLINE" pragma for the
> function into Foreign.Marshal.Alloc the time of execution dropped from
> 40 to 20 nanoseconds. I guess the same effect will take place if other
> similar functions get marked with "INLINE".
>
> Is there a reason why we do not want small FFI-related functions with
> typeclass arguments to be marked with an "INLINE" pragma, so that we
> gain a performance improvement?
> The only reason that comes to my mind is the size of code, but actually
> the resulting code looks very small and neat.

Adding an INLINE pragma is the right thing for alloca and similar functions.

alloca is a small overloaded wrapper around allocaBytesAligned.  Without
the INLINE pragma, GHC inlines the body of allocaBytesAligned into alloca
itself, which makes alloca too big to be inlined at the call site (you
can work around this with e.g. -funfolding-use-threshold=100).  This is
really a case of manual worker/wrapper: we want to tell GHC that alloca
is a wrapper, and the way to do that is with INLINE.  Ideally GHC would
manage this itself.  There's a lot of scope for doing some general code
splitting, but I don't think anyone has explored that yet.
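
To make the wrapper idea concrete, here is a minimal sketch of the
pattern.  It is not the actual source of Foreign.Marshal.Alloc: the names
allocaWrapper and roundTrip are made up for illustration, and the only
library function assumed is allocaBytesAligned from base.  The overloaded
function does nothing except pick a size and alignment from the Storable
dictionary and hand off to the worker, and the INLINE pragma asks GHC to
copy that thin layer into each call site, where the dictionary is
usually known statically.

```haskell
module Main (main) where

import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical stand-in for alloca (illustration only, not the base
-- source): a thin overloaded wrapper around allocaBytesAligned.
-- The INLINE pragma marks it as a wrapper to be inlined at call sites.
{-# INLINE allocaWrapper #-}
allocaWrapper :: Storable a => (Ptr a -> IO b) -> IO b
allocaWrapper = doAlloca undefined
  where
    -- The dummy argument is never evaluated; it only fixes the type
    -- at which sizeOf and alignment are looked up.
    doAlloca :: Storable a' => a' -> (Ptr a' -> IO b') -> IO b'
    doAlloca dummy = allocaBytesAligned (sizeOf dummy) (alignment dummy)

-- Hypothetical caller: the element type is known to be Int here, so
-- once the wrapper is inlined the size and alignment can be resolved
-- at compile time.
roundTrip :: Int -> IO Int
roundTrip x = allocaWrapper $ \p -> do
  poke p x          -- write the value into the temporary buffer
  y <- peek p       -- read it back
  return (y + 1)

main :: IO ()
main = roundTrip 41 >>= print   -- prints 42
```

Whether the pragma helps in a given program is best checked by looking
at the Core (for example with -ddump-simpl) and measuring, since the
gain depends on the call site being specialised to a concrete type.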

Cheers,
	Simon