FFI calls: is it possible to allocate a small memory block on a stack?
Simon Marlow
marlowsd at gmail.com
Mon Apr 19 09:51:33 EDT 2010
On 18/04/2010 10:28, Denys Rtveliashvili wrote:
>
>> While alloca is not as cheap as, say, C's alloca, you should find that
>> it is much quicker than C's malloc. I'm sure there's room for
>> optimisation if it's critical for you. There may well be low-hanging
>> fruit: take a look at the Core for alloca.
> Thank you, Simon.
>
> Indeed, there is low-hanging fruit.
>
> "alloca"'s type is "Storable a => (Ptr a -> IO b) -> IO b" and it is not
> inlined even though the function is small. And calls to functions of
> such signature are expensive (I suppose that's because of look-up into
> typeclass dictionary). However, when I added an "INLINE" pragma for the
> function into Foreign.Marshal.Alloc the time of execution dropped from
> 40 to 20 nanoseconds. I guess the same effect will take place if other
> similar functions get marked with "INLINE".
>
> Is there a reason why we do not want small FFI-related functions with
> typeclass arguments to be marked with the "INLINE" pragma and gain a
> performance improvement?
> The only reason that comes to my mind is the size of code, but actually
> the resulting code looks very small and neat.
Adding an INLINE pragma is the right thing for alloca and similar functions.
alloca is a small overloaded wrapper around allocaBytesAligned, and
without the INLINE pragma the body of allocaBytesAligned gets inlined
into alloca itself, making it too big to be inlined at the call site
(you can work around it with e.g. -funfolding-use-threshold=100). This
is really a case of manual worker/wrapper: we want to tell GHC that
alloca is a wrapper, and the way to do that is with INLINE. Ideally GHC
would manage this itself. There's a lot of scope for doing some general
code splitting, but I don't think anyone has explored that yet.
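
To make the wrapper/worker split concrete, here is a minimal sketch of
what such a wrapper might look like. It is not the actual source of
Foreign.Marshal.Alloc: myAlloca and roundTrip are made-up names for
illustration, and the sketch assumes a base version that exports
allocaBytesAligned. The INLINE pragma marks the small overloaded
function as the wrapper, so a call site at a known type can be
specialised there while the allocation work stays in allocaBytesAligned.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main (main) where

import Data.Int (Int32)
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical wrapper (illustrative name, not the base library source).
-- INLINE tells GHC to unfold this overloaded wrapper at the call site,
-- where the Storable dictionary is known; the real work is done by
-- allocaBytesAligned.
{-# INLINE myAlloca #-}
myAlloca :: forall a b. Storable a => (Ptr a -> IO b) -> IO b
myAlloca =
  allocaBytesAligned (sizeOf (undefined :: a)) (alignment (undefined :: a))

-- Hypothetical call site at a concrete type: write a value into the
-- temporary block and read it back.
roundTrip :: Int32 -> IO Int32
roundTrip x = myAlloca $ \p -> poke p x >> peek p

main :: IO ()
main = roundTrip 42 >>= print
```

Compiling with -O -ddump-simpl should show whether the wrapper was
inlined at the roundTrip call site. Without the pragma, the workaround
mentioned above could be applied per module with
{-# OPTIONS_GHC -funfolding-use-threshold=100 #-}.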
Cheers,
Simon