Performance of small allocations via prim ops

Fri Apr 7 11:34:51 UTC 2023

On Fri, 7 Apr 2023 at 02:18, Carter Schonwald <carter.schonwald at gmail.com>
wrote:

> That sounds like a worthy experiment!
>
> I  guess that would look like having an inline macro’d up path that checks
> if it can get the job done that falls back to the general code?
>
> Last I checked, the overhead for this sort of c call was on the order of
> 10nanoseconds or less which seems like it’d be very unlikely to be a
> bottleneck, but do you have any natural or artificial benchmark programs
> that would show case this?
>

I converted my example code into a loop and ran it a million times with a 1
byte array size (would be 8 bytes after alignment). So roughly 3 words
would be allocated per array, including the header and length. It took 5 ms
using the statically known size optimization which inlines the alloc
completely, and 10 ms using an unknown size (from program arg) which makes
a call to newByteArray# . That turns out to be of the order of 5ns more per
allocation. It does not sound like a big deal.

-harendra
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20230407/db80c3e0/attachment.html>