[Haskell-cafe] IORef vs TVar performance: 6 seconds versus 4 minutes

Mon Dec 29 00:15:25 EST 2008

Both readTVar and writeTVar are worse than O(1); they have to look up
the TVar in the transaction log to see if you have made local changes
to it.

Right now it looks like that operation is O(n) where n is the number
of TVars accessed by a transaction, so your big transaction which is
just accessing a ton of TVars is likely O(n^2).

>From ghc/HEAD/rts/STM.c:

static TRecEntry *get_entry_for(StgTRecHeader *trec, StgTVar *tvar,
StgTRecHeader **in) {
  TRecEntry *result = NULL;

  TRACE("%p : get_entry_for TVar %p", trec, tvar);
  ASSERT(trec != NO_TREC);

  do {
    FOR_EACH_ENTRY(trec, e, {
      if (e -> tvar == tvar) {
        result = e;
        if (in != NULL) {
          *in = trec;
        }
        BREAK_FOR_EACH;
      }
    });
    trec = trec -> enclosing_trec;
  } while (result == NULL && trec != NO_TREC);

  return result;
}

STM performance is not really geared towards "big" transactions right
now; in large part because big transactions are likely to starve under
any real workload anyways.  If you have a single-threaded startup bit
to populate your data followed by concurrent small mutations, you can
put the startup in IO using small transactions to populate the data.

  -- ryan

On Sun, Dec 28, 2008 at 8:02 PM, Jim Snow <jsnow at cs.pdx.edu> wrote:
>
> I decided to try to implement a graph algorithm using STM.  Each node in the
> graph has a set of TVar-protected lists of the nodes it links to and the
> nodes that link to it.  Also, there is a global TVar-protected Data.Map that
> contains all the nodes in the graph, indexed by name (which is polymorphic):
>
> data Node k r = Node {
> fwdPos :: TVar [Node k r],  -- forward links (nodes we like)
> fwdNeg :: TVar [Node k r],  -- we allow "negative" links, too (nodes we
> don't like)
> revPos :: TVar [Node k r],  -- backlinks (nodes that like us)
> revNeg :: TVar [Node k r], -- negative back links (nodes that don't like us)
> currRep :: r,    -- extra user-defined data
> name   :: k     -- node's unique identifier
> } deriving Show
>
> data Network k r = Network {
> node    :: TVar (M.Map k (Node k r)),  -- map of nodes by name
> trusted :: TVar [Node k r]        -- a list of nodes we need to iterate over
> occasionally
> } deriving Show
>
>
> I tried loading a datafile of about 20,000 nodes into the graph in one big
> transaction, and found that it takes about 4 minutes.  This seemed rather
> slow, so I replaced all the TVars with IORefs (and substituted STM with IO
> in the type signatures), and the same operation with the new version took
> about 6 seconds!
>
> This is all with one thread, so there should be no contention for the TVars.
>  Is there something about STM that makes it scale worse than linearly wrt
> the number of mutations in a transaction?
>
> Above performance numbers are for ghc-6.10.1.  With ghc-6.8.3, the STM
> version takes more than 9 minutes.
>
> According to profiling, one of my trouble spots is this function, which just
> adds an entry onto a TVar [a]:
>
> stmcons :: k -> TVar [k] -> STM ()
> stmcons x tv =
> do xs <- readTVar tv
>     writeTVar tv (x:xs)
>
> This seems like it ought to be pretty innocuous, unless the whole list is
> getting evaluated each time I cons a new entry, or if readTVar or writeTVar
> are much more expensive than they appear.
>
> -jim
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>