FW: optimizing StgPtr allocate (Capability *cap, W_ n)

Tue Oct 14 20:16:30 UTC 2014

Simon, did you see this?

Simon

-----Original Message-----
From: Glasgow-haskell-users [mailto:glasgow-haskell-users-bounces at haskell.org] On Behalf Of Bulat Ziganshin
Sent: 14 October 2014 18:09
To: glasgow-haskell-users at haskell.org
Subject: optimizing StgPtr allocate (Capability *cap, W_ n)

Hello Glasgow-haskell-users,

I'm looking at https://github.com/ghc/ghc/blob/23bb90460d7c963ee617d250fa0a33c6ac7bbc53/rts/sm/Storage.c#L680

If I understand correctly, it's a speed-critical routine?

I think that it may be improved in this way:

StgPtr allocate (Capability *cap, W_ n)
{
    bdescr *bd;
    StgPtr p;

    TICK_ALLOC_HEAP_NOCTR(WDS(n));
    CCS_ALLOC(cap->r.rCCCS,n);

/// here starts new improved code:

    bd = cap->r.rCurrentAlloc;
    if (bd == NULL || bd->free + n > bd->end) {
        if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
            ....
        }
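        /* bd may still be NULL here, so test it before dereferencing */
        if (bd != NULL && bd->free + n <= bd->start + BLOCK_SIZE_W) {
            /* the block still has room: refresh the soft limit (in words) */
            bd->end = min (bd->start + BLOCK_SIZE_W, bd->free + LARGE_OBJECT_THRESHOLD/sizeof(W_));
            goto usual_alloc;
        }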
        ....
    }

/// and here it stops

usual_alloc:
    p = bd->free;
    bd->free += n;

    IF_DEBUG(sanity, ASSERT(*((StgWord8*)p) == 0xaa));
    return p;
}

I think it's obvious: we consolidate the two ifs on the critical path
into a single one, and avoid one ADD by keeping a highly useful bd->end pointer.
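
To make the single-compare idea concrete outside the RTS, here is a minimal
standalone sketch. It is not GHC code: toy_block, toy_allocate, toy_reset_end,
BLOCK_WORDS, LARGE_WORDS and the TOY_* return codes are all made up for
illustration, and it uses word offsets instead of pointers so the arithmetic
stays well defined. I also assume the soft limit should sit one word below
free + threshold, so that a request of exactly the threshold size fails the
fast check too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, in words; not the real RTS constants. */
#define BLOCK_WORDS ((size_t)512)  /* capacity of one toy block            */
#define LARGE_WORDS ((size_t)100)  /* requests >= this count as "large"    */

#define TOY_LARGE ((size_t)-1)     /* caller must use a large-object path  */
#define TOY_FULL  ((size_t)-2)     /* block exhausted, a new one is needed */

typedef struct {
    size_t free;  /* offset of the next unallocated word                   */
    size_t end;   /* soft limit: the fast path needs free + n <= end       */
} toy_block;

unsigned slow_hits = 0;  /* how often the single test sent us the slow way */

/* Soft limit = min(block end, free + LARGE_WORDS - 1). Any request of
   LARGE_WORDS or more therefore fails the single fast-path test. */
void toy_reset_end(toy_block *bd)
{
    size_t soft = bd->free + (LARGE_WORDS - 1);
    bd->end = soft < BLOCK_WORDS ? soft : BLOCK_WORDS;
}

/* Returns the offset of the allocated words, or TOY_LARGE / TOY_FULL.
   Assumes n is small enough that free + n cannot wrap around. */
size_t toy_allocate(toy_block *bd, size_t n)
{
    if (bd->free + n > bd->end) {       /* the only test on the fast path  */
        slow_hits++;
        if (n >= LARGE_WORDS)
            return TOY_LARGE;
        if (bd->free + n > BLOCK_WORDS)
            return TOY_FULL;
        toy_reset_end(bd);              /* still room: refresh soft limit  */
    }
    size_t p = bd->free;                /* plain bump allocation           */
    bd->free += n;
    return p;
}
```

Starting from a block with free = end = 0, the first small request takes the
slow branch once to set the soft limit, and later small requests pass the
single test until free gets close to it. A large request always falls into the
slow branch and leaves the block untouched.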

Further improvements may include removing the bd==NULL check by
initializing bd->free=bd->end=NULL, and moving the entire "if" body
into a separate slow_allocate() procedure marked "noinline", with
allocate() probably marked as forceinline:

StgPtr allocate (Capability *cap, W_ n)
{
    bdescr *bd;
    StgPtr p;

    TICK_ALLOC_HEAP_NOCTR(WDS(n));
    CCS_ALLOC(cap->r.rCCCS,n);

    bd = cap->r.rCurrentAlloc;
    if (bd->free + n > bd->end)
        return slow_allocate(cap,n);

    p = bd->free;
    bd->free += n;

    IF_DEBUG(sanity, ASSERT(*((StgWord8*)p) == 0xaa));
    return p;
}

This change will greatly simplify the optimizer's work. In my
experience, current C++ compilers are weak at optimizing large
functions with complex execution paths, and such transformations really
improve the generated code.
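
As a rough illustration of the fast/slow split, here is a small standalone
sketch, again not RTS code. The arena and toy_cap types, fast_allocate,
ARENA_WORDS and ALLOC_FAILED are hypothetical names; only slow_allocate is the
name used above. The noinline and always_inline attributes are the GCC/Clang
spellings, and other compilers fall back to plain inline here. The NULL check
is replaced by a sentinel arena with free == end, which plays the role of the
free=end=NULL initialization.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
#define NOINLINE     __attribute__((noinline))
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define NOINLINE
#define FORCE_INLINE static inline
#endif

#define ARENA_WORDS  ((size_t)64)   /* hypothetical arena capacity, words  */
#define ALLOC_FAILED ((size_t)-1)   /* request can never fit in an arena   */

typedef struct {
    size_t free;  /* offset of the next unallocated word */
    size_t end;   /* one past the last usable word       */
} arena;

/* Sentinel with free == end: every request of n > 0 words fails the fast
   test, so the fast path never has to check for a missing arena. */
static arena empty_arena = {0, 0};

typedef struct {
    arena   *current;  /* never NULL; starts out as &empty_arena           */
    arena    real;     /* the single real arena this toy recycles          */
    unsigned refills;  /* number of slow-path refills, for observation     */
} toy_cap;

/* Rare path, kept out of line so the hot function stays tiny. */
NOINLINE size_t slow_allocate(toy_cap *cap, size_t n)
{
    if (n > ARENA_WORDS)
        return ALLOC_FAILED;
    cap->real.free = 0;             /* toy refill: just reset the arena    */
    cap->real.end  = ARENA_WORDS;
    cap->current   = &cap->real;
    cap->refills++;
    cap->real.free += n;
    return 0;                       /* allocation starts at offset 0       */
}

/* Hot path: one load, one compare, one bump. Assumes n > 0 and that
   free + n cannot wrap around. */
FORCE_INLINE size_t fast_allocate(toy_cap *cap, size_t n)
{
    arena *a = cap->current;
    if (a->free + n > a->end)
        return slow_allocate(cap, n);
    size_t p = a->free;
    a->free += n;
    return p;
}
```

A capability initialized as { &empty_arena, {0, 0}, 0 } goes through
slow_allocate on its first request and then stays on the inlined fast path
until the arena fills up. Whether this actually helps in the RTS would have to
be measured.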

-- 
Best regards,
 Bulat                          mailto:Bulat.Ziganshin at gmail.com
