[GHC] #12607: Memory effects in doomed STM transactions

GHC ghc-devs at haskell.org
Wed Sep 21 13:39:28 UTC 2016


#12607: Memory effects in doomed STM transactions
-------------------------------------+-------------------------------------
           Reporter:  fryguybob      |             Owner:
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Runtime        |           Version:  8.0.1
  System                             |
           Keywords:  STM, allocate  |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 == Problem ==

 GHC's STM implementation does "lazy" validation meaning that speculative
 execution can continue even after observing an inconsistent view of
 memory.  Transactions in the state are called "doomed transactions".
 While the runtime tries to avoids some effects in doomed transactions,
 such as entering an infinite loop, it does not successfully control all
 memory effects.  In particular there are many (I do not have an exhaustive
 list) allocations that eventually end up in the `allocate` function which
 is nicely documented with the following:

 > `allocate()` never
 > fails; if it returns, the returned value is a valid address.  If
 > the nursery is already full, then another block is allocated from
 > the global block pool.  If we need to get memory from the OS and
 > that operation fails, then the whole process will be killed.

 https://github.com/ghc/ghc/blob/master/rts/sm/Storage.c#L819

 A doomed transaction that demands a large allocation that the OS cannot
 fulfill will kill the process.  Here is a program that reliably fails for
 me:

 {{{#!hs
 -- oom.hs
 {-# LANGUAGE BangPatterns #-}
 import GHC.Conc
 import Control.Concurrent

 import qualified Data.ByteString as B

 forever act = act >> forever act

 check True = return ()
 check False = retry

 main = do
     tx <- newTVarIO 0
     ty <- newTVarIO 1
     tz <- newTVarIO 0

     done <- newTVarIO False

     _ <- forkIO $ forever $ atomically $ do
         -- Only read tx and ty
         x <- readTVar tx
         y <- readTVar ty

         if x > y  -- This should always be False
           then do
             -- We only get here in a doomed transaction.  Commenting out
 the next two lines
             -- but leaving the write to done and the program runs as
 expected because the
             -- doomed transaction is detected at commit time.
             let !big = B.length $ B.replicate maxBound 0  -- The big
 allocation!
             writeTVar tz big
             writeTVar done True
           else return ()

     let mut = forever $ atomically $ do
         y <- readTVar ty
         x <- readTVar tx

         if x > 1000
           then do
             -- When we get big enough, start over.
             writeTVar tx 0
             writeTVar ty 1  -- Always holds semantically that tx < ty
           else do
             -- increment x and y both
             writeTVar ty (succ y)
             writeTVar tx (succ x)  -- tx < ty

     -- Give lots of opportunities to witness inconsistent memory.
     _ <- forkIO mut
     _ <- forkIO mut
     _ <- forkIO mut

     atomically $ readTVar done >>= check

     putStrLn "Done"
 }}}

 Running leads to out of memory:

 {{{#!bash
 > ghc oom.hs -fno-omit-yields -threaded
 [1 of 1] Compiling Main             ( oom.hs, oom.o )
 Linking oom.exe ...
 > ./oom.exe +RTS -N
 oom.exe: out of memory
 }}}

 == Fixing ==

 When a potentially bad memory effect is about to happen in some thread, we
 need to ensure that we validate any running transaction and if it fails
 have some way to back out of the operation and abort the transaction.
 This might be tricky on two fronts 1) finding all the critical allocations
 and 2) finding places where we can both detect (1) and abort the
 transaction.  The places I'm concerned about most are array allocation
 such as the `ByteString` in the example and maybe `Integer` allocation.
 Another fix is a different STM implementation altogether that (at a
 performance cost or trade-off) doesn't allow doomed transactions (to
 appear Haskell Symposium 2016 :D).  I think we can at least address this
 particular example without going so far.

 A related issue is more general memory leaks from doomed transactions.
 Consider a program with a memoized Fibonacci function.  The semantics of
 the transactions written by the programmer may ensure that no value higher
 then 10 is ever demanded of that function, yet in a doomed transaction the
 invariant goes wrong and the program demands 100.  The program will detect
 the doomed transaction eventually in this case, but not before it has
 allocated a large live blob never to be touched again.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12607>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list