FFI calls: is it possible to allocate a small memory block on a stack?

Thu Apr 22 23:39:21 EDT 2010

Hi Simon,

> Thanks - I already did this for alloca/malloc, I'll add the others from 
> your patch.

Thank you.

> We go to quite a lot of trouble to avoid locking in the common cases and 
> fast paths - most of our data structures are CPU-local.  Where in 
> particular have you encountered locking that could be reduced?

> The pinned_object_block is CPU-local, usually no locking is required. 
> Only when the block is full do we have to get a new block from the block 
> allocator, and that requires a lock, but it's a rare case.

OK, the code I have checked out from the repository contains this in
"rts/sm/Storage.h":

        extern bdescr * pinned_object_block;

And in "rts/sm/Storage.c":

        bdescr *pinned_object_block;

My C might be rusty, but I see no way for pinned_object_block to be
CPU-local: it is declared as a single global variable. If it really is
CPU-local, what makes it so?

As for locking, here is one example:

        StgPtr
        allocatePinned( lnat n )
        {
            StgPtr p;
            bdescr *bd = pinned_object_block;

            // If the request is for a large object, then allocate()
            // will give us a pinned object anyway.
            if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
                p = allocate(n);
                Bdescr(p)->flags |= BF_PINNED;
                return p;
            }

            ACQUIRE_SM_LOCK; // [RTVD: here we acquire the lock]

            TICK_ALLOC_HEAP_NOCTR(n);
            CCS_ALLOC(CCCS,n);

            // If we don't have a block of pinned objects yet, or the
            // current one isn't large enough to hold the new object,
            // allocate a new one.
            if (bd == NULL || (bd->free + n) > (bd->start + BLOCK_SIZE_W)) {
                pinned_object_block = bd = allocBlock();
                dbl_link_onto(bd, &g0s0->large_objects);
                g0s0->n_large_blocks++;
                bd->gen_no = 0;
                bd->step   = g0s0;
                bd->flags  = BF_PINNED | BF_LARGE;
                bd->free   = bd->start;
                alloc_blocks++;
            }

            p = bd->free;
            bd->free += n;
            RELEASE_SM_LOCK; // [RTVD: here we release the lock]
            return p;
        }

Of course, TICK_ALLOC_HEAP_NOCTR and CCS_ALLOC may require
synchronization if they use shared state (which is, again, probably
unnecessary). However, if no profiling is going on and
"pinned_object_block" is TSO-local, isn't it possible to remove locking
from this code completely? The only case where locking would still be
necessary is when a fresh block has to be allocated, and that can be
done within the "allocBlock" function (or, more precisely, by using
"allocBlock_lock").
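
To make the idea concrete, here is a minimal sketch in plain C. It is
not RTS code: "local_block", "alloc_block_locked", "allocate_local" and
"BLOCK_WORDS" are names I made up for illustration, malloc stands in
for the block allocator, and a pthread mutex stands in for the storage
manager lock. The assumption is that each thread owns its own
"local_block", so the bump allocation needs no lock and only the refill
touches shared state.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_WORDS 512   /* hypothetical block size, in words */

typedef size_t word;

/* Hypothetical per-thread allocation state. It is assumed that
   exactly one thread ever touches a given local_block. */
typedef struct {
    word *start;   /* first word of the current block, or NULL */
    word *free;    /* next unused word in the current block */
} local_block;

/* Shared state: guarded by block_lock. */
static pthread_mutex_t block_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long blocks_handed_out = 0;

/* Slow path: the only place where the lock is taken. */
static word *alloc_block_locked(void)
{
    pthread_mutex_lock(&block_lock);
    word *b = malloc(BLOCK_WORDS * sizeof(word));
    if (b != NULL)
        blocks_handed_out++;
    pthread_mutex_unlock(&block_lock);
    return b;
}

/* Fast path: bump allocation inside a block owned by the caller.
   Returns NULL for oversized requests or when out of memory.
   Exhausted blocks are simply abandoned in this sketch; a real
   collector would keep them on a list for later reclamation. */
static word *allocate_local(local_block *lb, size_t n)
{
    if (n > BLOCK_WORDS)
        return NULL;   /* large objects are out of scope here */

    if (lb->start == NULL ||
        n > (size_t)(lb->start + BLOCK_WORDS - lb->free)) {
        word *b = alloc_block_locked();
        if (b == NULL)
            return NULL;
        lb->start = b;
        lb->free  = b;
    }

    word *p = lb->free;
    lb->free += n;
    return p;
}
```

With this layout two consecutive small requests from the same thread
are served from one block without taking the lock at all; the lock is
taken only when a block is first obtained or has run out of room.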

The ACQUIRE_SM_LOCK/RELEASE_SM_LOCK pair is present in other places
too, but I have not yet analysed whether it is really necessary there.
For example, things like newCAF and newDynCAF are wrapped in it.

With kind regards,
Denys Rtveliashvili