[commit: ghc] ghc-8.6: Fix a race between GC threads in concurrent scavenging (d46dd45)

git at git.haskell.org git at git.haskell.org
Fri Sep 7 13:36:12 UTC 2018


Repository : ssh://git@git.haskell.org/ghc

On branch  : ghc-8.6
Link       : http://ghc.haskell.org/trac/ghc/changeset/d46dd4528753f5b1e13540691f936cf45c127621/ghc

>---------------------------------------------------------------

commit d46dd4528753f5b1e13540691f936cf45c127621
Author: Ömer Sinan Ağacan <omeragacan at gmail.com>
Date:   Thu Sep 6 15:52:53 2018 +0300

    Fix a race between GC threads in concurrent scavenging
    
    While debugging #15285 I realized that free block lists (free_list in
    BlockAlloc.c) get corrupted when multiple scavenge threads allocate and
    release blocks concurrently. Here's a picture of one such race:
    
        Thread 2 (Thread 32573.32601):
        #0  check_tail
            (bd=0x940d40 <stg_TSO_info>) at rts/sm/BlockAlloc.c:860
        #1  0x0000000000928ef7 in checkFreeListSanity
            () at rts/sm/BlockAlloc.c:896
        #2  0x0000000000928979 in freeGroup
            (p=0x7e998ce02880) at rts/sm/BlockAlloc.c:721
        #3  0x0000000000928a17 in freeChain
            (bd=0x7e998ce02880) at rts/sm/BlockAlloc.c:738
        #4  0x0000000000926911 in freeChain_sync
            (bd=0x7e998ce02880) at rts/sm/GCUtils.c:80
        #5  0x0000000000934720 in scavenge_capability_mut_lists
            (cap=0x1acae80) at rts/sm/Scav.c:1665
        #6  0x000000000092b411 in gcWorkerThread
            (cap=0x1acae80) at rts/sm/GC.c:1157
        #7  0x000000000090be9a in yieldCapability
            (pCap=0x7f9994e69e20, task=0x7e9984000b70, gcAllowed=true) at rts/Capability.c:861
        #8  0x0000000000906120 in scheduleYield
            (pcap=0x7f9994e69e50, task=0x7e9984000b70) at rts/Schedule.c:673
        #9  0x0000000000905500 in schedule
            (initialCapability=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:293
        #10 0x0000000000908d4f in scheduleWorker
            (cap=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:2554
        #11 0x000000000091a30a in workerStart
            (task=0x7e9984000b70) at rts/Task.c:444
        #12 0x00007f99937fa6db in start_thread
            (arg=0x7f9994e6a700) at pthread_create.c:463
        #13 0x000061654d59f88f in clone
            () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    
        Thread 1 (Thread 32573.32573):
        #0  checkFreeListSanity
            () at rts/sm/BlockAlloc.c:887
        #1  0x0000000000928979 in freeGroup
            (p=0x7e998d303540) at rts/sm/BlockAlloc.c:721
        #2  0x0000000000926f23 in todo_block_full
            (size=513, ws=0x1aa8ce0) at rts/sm/GCUtils.c:264
        #3  0x00000000009583b9 in alloc_for_copy
            (size=513, gen_no=0) at rts/sm/Evac.c:80
        #4  0x000000000095850d in copy_tag_nolock
            (p=0x7e998c675f28, info=0x421d98 <Main_Large_con_info>, src=0x7e998d075d80, size=513,
            gen_no=0, tag=1) at rts/sm/Evac.c:153
        #5  0x0000000000959177 in evacuate
            (p=0x7e998c675f28) at rts/sm/Evac.c:715
        #6  0x0000000000932388 in scavenge_small_bitmap
            (p=0x7e998c675f28, size=1, bitmap=0) at rts/sm/Scav.c:271
        #7  0x0000000000934aaf in scavenge_stack
            (p=0x7e998c675f28, stack_end=0x7e998c676000) at rts/sm/Scav.c:1908
        #8  0x0000000000934295 in scavenge_one
            (p=0x7e998c66e000) at rts/sm/Scav.c:1466
        #9  0x0000000000934662 in scavenge_mutable_list
            (bd=0x7e998d300440, gen=0x1b1d880) at rts/sm/Scav.c:1643
        #10 0x0000000000934700 in scavenge_capability_mut_lists
            (cap=0x1aaa340) at rts/sm/Scav.c:1664
        #11 0x00000000009299b6 in GarbageCollect
            (collect_gen=0, do_heap_census=false, gc_type=2, cap=0x1aaa340, idle_cap=0x1b38aa0)
            at rts/sm/GC.c:378
        #12 0x0000000000907a4a in scheduleDoGC
            (pcap=0x7ffdec5b5310, task=0x1b36650, force_major=false) at rts/Schedule.c:1798
        #13 0x0000000000905de7 in schedule
            (initialCapability=0x1aaa340, task=0x1b36650) at rts/Schedule.c:546
        #14 0x0000000000908bc4 in scheduleWaitThread
            (tso=0x7e998c0067c8, ret=0x0, pcap=0x7ffdec5b5430) at rts/Schedule.c:2537
        #15 0x000000000091b5a0 in rts_evalLazyIO
            (cap=0x7ffdec5b5430, p=0x9c11f0, ret=0x0) at rts/RtsAPI.c:530
        #16 0x000000000091ca56 in hs_main
            (argc=1, argv=0x7ffdec5b5628, main_closure=0x9c11f0, rts_config=...) at rts/RtsMain.c:72
        #17 0x0000000000421ea0 in main
            ()
    
    In particular, dbl_link_onto() which is used to add a freed block to a
    doubly-linked free list is not thread safe and corrupts the list when
    called concurrently.
    
    Note that thread 1 is to blame here as thread 2 is properly taking the
    spinlock. With this patch we now take the spinlock when freeing a todo
    block in GC, avoiding this race.
    
    Test Plan:
    - Tried slow validate locally: this patch does not introduce new failures.
    - circleci: https://circleci.com/gh/ghc/ghc-diffs/283 The test got killed
      because it took 5 hours but T7919 (which was previously failing on circleci)
      passed.
    
    Reviewers: simonmar, bgamari, erikd
    
    Reviewed By: simonmar
    
    Subscribers: rwbarton, carter
    
    GHC Trac Issues: #15285
    
    Differential Revision: https://phabricator.haskell.org/D5115
    
    (cherry picked from commit c6fbac6a6a69a2f4be89701b2c386ae53214f9a3)


>---------------------------------------------------------------

d46dd4528753f5b1e13540691f936cf45c127621
 rts/sm/GCUtils.c | 10 +++++++++-
 rts/sm/GCUtils.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/rts/sm/GCUtils.c b/rts/sm/GCUtils.c
index 24f7b5e..31b2913 100644
--- a/rts/sm/GCUtils.c
+++ b/rts/sm/GCUtils.c
@@ -81,6 +81,14 @@ freeChain_sync(bdescr *bd)
     RELEASE_SPIN_LOCK(&gc_alloc_block_sync);
 }
 
+void
+freeGroup_sync(bdescr *bd)
+{
+    ACQUIRE_SPIN_LOCK(&gc_alloc_block_sync);
+    freeGroup(bd);
+    RELEASE_SPIN_LOCK(&gc_alloc_block_sync);
+}
+
 /* -----------------------------------------------------------------------------
    Workspace utilities
    -------------------------------------------------------------------------- */
@@ -261,7 +269,7 @@ todo_block_full (uint32_t size, gen_workspace *ws)
                 // object.  However, if the object we're copying is
                 // larger than a block, then we might have an empty
                 // block here.
-                freeGroup(bd);
+                freeGroup_sync(bd);
             } else {
                 push_scanned_block(bd, ws);
             }
diff --git a/rts/sm/GCUtils.h b/rts/sm/GCUtils.h
index 3c17449..8b60407 100644
--- a/rts/sm/GCUtils.h
+++ b/rts/sm/GCUtils.h
@@ -31,6 +31,7 @@ INLINE_HEADER bdescr *allocBlockOnNode_sync(uint32_t node)
 }
 
 void    freeChain_sync(bdescr *bd);
+void    freeGroup_sync(bdescr *bd);
 
 void    push_scanned_block   (bdescr *bd, gen_workspace *ws);
 StgPtr  todo_block_full      (uint32_t size, gen_workspace *ws);



More information about the ghc-commits mailing list