Performance of small allocations via prim ops

Carter Schonwald carter.schonwald at gmail.com
Thu Apr 6 20:47:51 UTC 2023


That sounds like a worthy experiment!

I guess that would look like having an inline, macro'd-up fast path that
checks whether it can get the job done, falling back to the general code
otherwise?

Last I checked, the overhead for this sort of C call was on the order of
10 nanoseconds or less, which seems very unlikely to be a bottleneck. But do
you have any natural or artificial benchmark programs that would showcase
this?

For this sort of code, the extra branching for that optimization could easily
have a larger performance impact than the known function call on modern
hardware. (Though take my intuitions about these things with a grain of
salt.)

On Tue, Apr 4, 2023 at 9:50 PM Harendra Kumar <harendra.kumar at gmail.com>
wrote:

> I was looking at the RTS code for allocating small objects via prim ops
> e.g. newByteArray# . The code looks like:
>
> stg_newByteArrayzh ( W_ n )
> {
>     MAYBE_GC_N(stg_newByteArrayzh, n);
>
>     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>
> We are making a foreign call here (ccall). I am wondering how much
> overhead a ccall adds; I guess it may have to save and restore registers.
> Would it be better to do the fast-path case of allocating small objects
> from the nursery in Cmm code, like in stg_gc_noregs?
>
> -harendra