<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Update: nonmoving GC does make differences<div class=""><br class=""></div><div class="">I think couldn't observe it because I set the heap -H2g rather large, and generation 0 are still collected by old moving GC which having difficulty in handling the large hazard heap. After I realize just now that nonmoving GC only works against oldest generation, I tested it again with `+RTS -H16m -A4m` with and without `-xn`, then:</div><div class=""><br class=""></div><div class="">Without -xn (old moving GC in effect), the throughput degrades fast and stop business progressing at ~200MB of server RSS</div><div class=""><br class=""></div><div class="">With -xn (new nonmvoing GC in effect), server RSS can burst to ~350MB, then throughput degrades relative slower, until RSS reached ~1GB, after then barely progressing at business yielding. But RSS can keep growing with occasional burst fashioned business yield, until ~3.3GB then it totally stuck.</div><div class=""><br class=""></div><div class="">Regards,</div><div class="">Compl</div><div class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On 2020-07-30, at 13:31, Compl Yue via Haskell-Cafe <<a href="mailto:haskell-cafe@haskell.org" class="">haskell-cafe@haskell.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
  
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" class="">
  
  <div class=""><p class="">Thanks Ryan, and I'm honored to get Simon's attention.</p><p class="">I did have some worry about package tskiplist, that its github
      repository seems withdrawn, I emailed the maintainer Peter
      Robinson lately but have gotten no response by far. What
      particularly worrying me is the 1st sentence of the Readme has
      changed from 1.0.0 to 1.0.1 (which is current) as:
    </p><p class="">> - <span style="color: rgb(200, 195, 188); font-family:
        "PT Sans", -apple-system, BlinkMacSystemFont,
        "Segoe UI", Roboto, Oxygen-Sans, Cantarell,
        "Helvetica Neue", sans-serif; font-size: 17px;
        font-style: normal; font-variant-ligatures: normal;
        font-variant-caps: normal; font-weight: 400; letter-spacing:
        0.024px; orphans: 2; text-align: left; text-indent: 0px;
        text-transform: none; white-space: normal; widows: 2;
        word-spacing: 0px; -webkit-text-stroke-width: 0px;
        background-color: rgb(25, 27, 28); text-decoration-style:
        initial; text-decoration-color: initial; display: inline
        !important; float: none;" class="">This package provides an
        implementation of a skip list in STM.</span></p><p class="">>+ <span style="color: rgb(200, 195, 188); font-family:
        "PT Sans", -apple-system, BlinkMacSystemFont,
        "Segoe UI", Roboto, Oxygen-Sans, Cantarell,
        "Helvetica Neue", sans-serif; font-size: 17px;
        font-style: normal; font-variant-ligatures: normal;
        font-variant-caps: normal; font-weight: 400; letter-spacing:
        0.024px; orphans: 2; text-align: left; text-indent: 0px;
        text-transform: none; white-space: normal; widows: 2;
        word-spacing: 0px; -webkit-text-stroke-width: 0px;
        background-color: rgb(25, 27, 28); text-decoration-style:
        initial; text-decoration-color: initial; display: inline
        !important; float: none;" class="">This package provides a
        proof-of-concept implementation of a skip list in STM</span></p><p class="">This has to mean something but I can't figure out yet.<br class="">
    </p><p class="">Dear Peter Robinson, I hope you can see this message and get in
      the loop of discussion. <br class="">
    </p><p class="">Despite that, I don't think overhead of TVar itself the most
      serious issue in my situation, as before GC engagement, there are
      as many TVars being allocated and updated without stuck at
      business progressing. And now I realize what presuring GC in my
      situation is not only the large number of pointers (TVars), and at
      the same time, they form many circular structures, that might be
      nightmare for a GC. As I model my data after graph model, in my
      test workload, there are many FeatureSet instances each being an
      entity/node object, then there are many Feature instances per
      FeatureSet object, each Feature instance being an unary
      relationship/edge object, with a reference attribute (via TVar)
      pointing to the FeatureSet object it belongs to, circular
      structures form because I maintain an index at each FeatureSet
      object, sorted by weight etc., but ultimately pointing back (via
      TVar) to all Feature objects belonging to the set.<br class="">
    </p><p class="">I'm still curious why the new non-moving GC in 8.10.1 still don't
      get obvious business progressing in my situation. I tested it on
      my Mac yesterday and there I don't know how to see how CPU time is
      distributed over threads within a process, I'll further test it
      with some Linux boxes to try understand it better.<br class="">
    </p><p class="">Best regards,</p><p class="">Compl</p><p class=""><br class="">
    </p>
    <div class="moz-cite-prefix">On 2020/7/30 上午10:05, Ryan Yates wrote:<br class="">
    </div>
    <blockquote type="cite" cite="mid:CAO27hRp3vUoq=9EDGLa+Pw9_CQUsdsh3LoarFmq6Xe1CgTJXjQ@mail.gmail.com" class="">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8" class="">
      <div dir="ltr" class="">Simon, I certainly want to help get to the bottom
        of the performance issue at hand :D.  Sorry if my reply was
        misleading.  The constant factor overhead of pushing `TVar`s
        into the internal structure may be pressuring unacceptable GC
        behavior to happen sooner.  My impression was that given the
        same size problem performance loss shifted from synchronization
        to GC.
        <div class=""><br class="">
        </div>
        <div class="">Compl, I'm not aware of mutable heap objects being
          problematic in particular for GHC's GC.  There are lots of
          special cases to handle them of course.  I have successfully
          written Haskell programs that get good performance from the GC
          with the dominant fraction of heap objects being mutable.  I
          looked a little more at `TSkipList` and one tricky aspect of
          an STM based skip list is how to manage randomness.  In
          `TSkipList`'s code there is the following comment:</div>
        <div class="">
          <div class=""><br class="">
          </div>
        </div>
        <div class="">
          <pre style="" class=""><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- | Returns a randomly chosen level. Used for inserting new elements. /O(1)./</span>
<a name="line-98" moz-do-not-send="true" class=""></a><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- For performance reasons, this function uses 'unsafePerformIO' to access the</span>
<a name="line-99" moz-do-not-send="true" class=""></a><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- random number generator. (It would be possible to store the random number</span>
<a name="line-100" moz-do-not-send="true" class=""></a><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- generator in a 'TVar' and thus be able to access it safely from within the</span>
<a name="line-101" moz-do-not-send="true" class=""></a><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- STM monad. This, however, might cause high contention among threads.)</span></pre>
          <pre style="" class=""><span class="gmail-hs-comment" style="color:rgb(138,138,138)"><pre style="" class=""><span class="gmail-hs-identifier" style="color:rgb(7,54,66)">chooseLevel</span> <span class="gmail-hs-glyph" style="color:rgb(220,50,47)">::</span> <a href="http://hackage.haskell.org/package/tskiplist-1.0.1/docs/src/Control.Concurrent.STM.TSkipList.Internal.html#TSkipList" class="gmail-" style="text-decoration-line:none;border-bottom:1px solid rgb(238,232,213)" moz-do-not-send="true"><span class="gmail-hs-type gmail-hs-identifier" style="color:rgb(95,95,175)">TSkipList</span></a> <a href="http://hackage.haskell.org/package/tskiplist-1.0.1/docs/src/Control.Concurrent.STM.TSkipList.Internal.html#local-6989586621679028835" class="gmail-" style="text-decoration-line:none;border-bottom:1px solid rgb(238,232,213)" moz-do-not-send="true"><span class="gmail-hs-type gmail-hs-identifier" style="color:rgb(95,95,175)">k</span></a> <a href="http://hackage.haskell.org/package/tskiplist-1.0.1/docs/src/Control.Concurrent.STM.TSkipList.Internal.html#local-6989586621679028836" style="text-decoration-line:none;border-bottom:1px solid rgb(238,232,213)" moz-do-not-send="true" class=""><span class="gmail-hs-type gmail-hs-identifier" style="color:rgb(95,95,175)">a</span></a> <span class="gmail-hs-glyph" style="color:rgb(220,50,47)">-></span> <span class="gmail-hs-type gmail-hs-identifier" style="color:rgb(95,95,175)">Int</span></pre></span></pre>
        </div>
        <div class=""><br class="">
        </div>
        <div class="">This level is chosen on insertion to determine the height
          of the node.  When writing my own STM skiplist I found that
          the details in unsafely accessing randomness had a significant
          impact on performance.  We went with an unboxed array of PCG
          states that had an entry for each capability giving constant
          memory overhead in the number of capabilities.  `TSkipList`
          uses `newStdGen` which involves allocation and
          synchronization.</div>
        <div class=""><br class="">
        </div>
        <div class="">Again, I'm not pointing this out to say that this is the
          entirety of the issue you are encountering, rather, I do think
          the `TSkipList` library could be improved to allocate much
          less.  Others can speak to how to tell where the time is going
          in GC (my knowledge of this is likely out of date).</div>
        <div class=""><br class="">
        </div>
        <div class="">Ryan</div>
        <div class=""><br class="">
        </div>
      </div>
      <br class="">
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, Jul 29, 2020 at 4:57
          PM Simon Peyton Jones <<a href="mailto:simonpj@microsoft.com" moz-do-not-send="true" class="">simonpj@microsoft.com</a>>
          wrote:<br class="">
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div lang="EN-GB" class="">
            <div class="gmail-m_-3608968087787866939WordSection1"><p class="MsoNormal"><span class="">Compl’s problem is (apparently)
                  that execution becomes dominated by GC.  That doesn’t
                  sound like a constant-factor overhead from TVars, no
                  matter how efficient (or otherwise) they are.  It
                  sounds more like a space leak to me; perhaps you need
                  some strict evaluation or something.</span></p><div class=""><span class=""> </span><br class="webkit-block-placeholder"></div><p class="MsoNormal"><span class="">My point is only: before
                  re-engineering STM it would make sense to get a much
                  more detailed insight into what is actually happening,
                  and where the space and time is going.  We have tools
                  to do this (heap profiling, Threadscope, …) but I know
                  they need some skill and insight to use well.  But we
                  don’t have nearly enough insight to draw meaningful
                  conclusions yet.</span></p><div class=""><span class=""> </span><br class="webkit-block-placeholder"></div><p class="MsoNormal"><span class="">Maybe someone with experience
                  of performance debugging might feel able to help
                  Compl?</span></p><div class=""><span class=""> </span><br class="webkit-block-placeholder"></div><p class="MsoNormal"><span class="">Simon</span></p><div class=""><span class=""> </span><br class="webkit-block-placeholder"></div>
              <div style="border-top:none;border-right:none;border-bottom:none;border-left:1.5pt
                solid blue;padding:0cm 0cm 0cm 4pt" class="">
                <div class="">
                  <div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt
                    solid rgb(225,225,225);padding:3pt 0cm 0cm" class=""><p class="MsoNormal"><b class=""><span lang="EN-US" class="">From:</span></b><span lang="EN-US" class=""> Haskell-Cafe <<a href="mailto:haskell-cafe-bounces@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe-bounces@haskell.org</a>>
                        <b class="">On Behalf Of </b>Ryan Yates<br class="">
                        <b class="">Sent:</b> 29 July 2020 20:41<br class="">
                        <b class="">To:</b> YueCompl <<a href="mailto:compl.yue@icloud.com" target="_blank" moz-do-not-send="true" class="">compl.yue@icloud.com</a>><br class="">
                        <b class="">Cc:</b> Haskell Cafe <<a href="mailto:haskell-cafe@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe@haskell.org</a>><br class="">
                        <b class="">Subject:</b> Re: [Haskell-cafe] STM friendly
                        TreeMap (or similar with range scan api) ? WAS:
                        Best ways to achieve throughput, for large M:N
                        ratio of STM threads, with hot TVar updates?</span></p>
                  </div>
                </div><div class=""> <br class="webkit-block-placeholder"></div>
                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                    Hi Compl,</p>
                  <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                       <br class="webkit-block-placeholder"></div>
                  </div>
                  <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                      There is a lot of overhead with TVars.  My thesis
                      work addresses this by incorporating mutable
                      constructor fields with STM.  I would like to get
                      all that into GHC as soon as I can :D.  I haven't
                      looked closely at the `tskiplist` package, I'll
                      take a look and see if I see any potential
                      issues.  There was some recent work on concurrent
                      B-tree that may be interesting to try.</p>
                  </div>
                  <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                       <br class="webkit-block-placeholder"></div>
                  </div>
                  <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                      Ryan</p>
                  </div>
                </div><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                   <br class="webkit-block-placeholder"></div>
                <div class="">
                  <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                      On Wed, Jul 29, 2020 at 10:24 AM YueCompl <<a href="mailto:compl.yue@icloud.com" target="_blank" moz-do-not-send="true" class="">compl.yue@icloud.com</a>>
                      wrote:</p>
                  </div>
                  <blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
                    solid rgb(204,204,204);padding:0cm 0cm 0cm
                    6pt;margin-left:4.8pt;margin-right:0cm" class="">
                    <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                        Hi Cafe and Ryan,</p>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          I tried Map/Set from stm-containers and
                          TSkipList (added range scan api against its
                          internal data structure) from <a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhackage.haskell.org%2Fpackage%2Ftskiplist&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838761589&sdata=ZOvJVBqJgdGqx2k%2F49fhZeTYkWAd4GRY%2B8ZxH7cyEkI%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://hackage.haskell.org/package/tskiplist</a> ,
                          with them I've got quite improved at
                          scalability on concurrency. </p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          But unfortunately then I hit another wall at
                          single thread scalability over working memory
                          size, I suspect it's because massively more
                          TVars (those being pointers per se) are
                          introduced by those "contention-free" data
                          structures, they need to mutate separate
                          pointers concurrently in avoiding contentions
                          anyway, but such pointer-intensive heap seems
                          imposing extraordinary pressure to GHC's
                          garbage collector, that GC will dominate CPU
                          utilization with poor business progress. </p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          For example in my test, I use `+RTS -H2g` for
                          the Haskell server process, so GC is not
                          triggered until after a while, then spin off 3
                          Python client to insert new records
                          concurrently, in the first stage each Python
                          process happily taking ~90% CPU filling
                          (through local mmap) the arrays allocated from
                          the server and logs of success scroll quickly,
                          while the server process utilizes only 30~40%
                          CPU to serve those 3 clients (insert meta data
                          records into unique indices merely); then the
                          client processes' CPU utilization drop
                          drastically once Haskell server process'
                          private memory reached around 2gb, i.e. GC
                          started engaging, the server process's CPU
                          utilization quickly approaches ~300%, while
                          all client processes' drop to 0% for most of
                          the time, and occasionally burst a tiny while
                          with some log output showing progress. And I
                          disable parallel GC lately, enabling parallel
                          GC only makes it worse.</p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          If I comment out the code updating the indices
                          (those creating many TVars), the overall
                          throughput only drop slowly as more data are
                          inserted, the parallelism feels steady even
                          after the server process' private memory takes
                          several GBs.</p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          I didn't expect this, but appears to me that
                          GC of GHC is really not good at handling
                          massive number of pointers in the heap, while
                          those pointers are essential to reduce
                          contention (and maybe expensive data copying
                          too) at heavy parallelism/concurrency.</p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          Btw I tried `+RTS -xn` with GHC 8.10.1 too, no
                          obvious different behavior compared to 8.8.3;
                          and also tried tweaking GC related RTS options
                          a bit, including increasing -G up to 10, no
                          much difference too.</p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          I feel hopeless at the moment, wondering if
                          I'll have to rewrite this in-memory db in
                          Go/Rust or some other runtime ...</p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          Btw I read <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftech.channable.com%2Fposts%2F2020-04-07-lessons-in-managing-haskell-memory.html&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838761589&sdata=gqSH82%2FOYRaW4fzBDl%2BLDjhbRA%2BDRE6jaj4k1UI2gFE%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://tech.channable.com/posts/2020-04-07-lessons-in-managing-haskell-memory.html</a> in
                          searching about the symptoms, and don't feel
                          likely to convert my DB managed data into
                          immutable types thus to fit into Compact
                          Regions, not quite likely a live in-mem
                          database instance can do.</p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          So seems there are good reasons no successful
                          DBMS, at least in-memory ones have been
                          written in Haskell.</p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          Best regards,</p>
                      </div>
                      <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                          Compl</p>
                      </div>
                      <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                           <br class="webkit-block-placeholder"></div>
                        <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                            <br class="">
                            <br class="">
                          </p>
                          <blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
                            <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                On 2020-07-25, at 22:07, Ryan Yates <<a href="mailto:fryguybob@gmail.com" target="_blank" moz-do-not-send="true" class="">fryguybob@gmail.com</a>>
                                wrote:</p>
                            </div><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                               <br class="webkit-block-placeholder"></div>
                            <div class="">
                              <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                  Unfortunately my STM benchmarks are
                                  rather disorganized.  The most
                                  relevant paper using them is:</p>
                                <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                                     <br class="webkit-block-placeholder"></div>
                                </div>
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                    Leveraging hardware TM in Haskell
                                    (PPoPP '19)</p>
                                </div>
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                    <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdl.acm.org%2Fdoi%2F10.1145%2F3293883.3295711&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838771582&sdata=h3po1gPutR%2BsiCST1N0RNkM6irnVL0%2BVbYl3Vs8F8Oc%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://dl.acm.org/doi/10.1145/3293883.3295711</a></p>
                                </div>
                                <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                                     <br class="webkit-block-placeholder"></div>
                                </div>
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                    Or my thesis:</p>
                                </div>
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                    <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furresearch.rochester.edu%2FinstitutionalPublicationPublicView.action%3FinstitutionalItemId%3D34931&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838771582&sdata=jBQMX5RRajIj0KbLWQCMt%2BMyMJIEmTpSuEHBWpq5Isg%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=34931</a> </p>
                                </div>
                                <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                                     <br class="webkit-block-placeholder"></div>
                                </div>
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                     The PPoPP benchmarks are on a
                                    branch (or the releases tab on
                                    github):</p>
                                </div>
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                    <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ffryguybob%2Fghc-stm-benchmarks%2Ftree%2Fwip%2Fmutable-fields%2Fbenchmarks%2FPPoPP2019%2Fsrc&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838771582&sdata=PinsrrGPgAB9TgxH61xngSItw1DcIRf1Niq39b%2BOe0s%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/fryguybob/ghc-stm-benchmarks/tree/wip/mutable-fields/benchmarks/PPoPP2019/src</a> </p>
                                </div>
                                <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                                     <br class="webkit-block-placeholder"></div>
                                </div>
                                <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                                     <br class="webkit-block-placeholder"></div>
                                </div>
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                     All that to say, without an
                                    implementation of mutable
                                    constructor fields (which I'm
                                    working on getting into GHC) the
                                    scaling is limited.</p>
                                </div>
                                <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                                     <br class="webkit-block-placeholder"></div>
                                </div>
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                    Ryan</p>
                                </div>
                                <div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                                     <br class="webkit-block-placeholder"></div>
                                </div>
                              </div><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
                                 <br class="webkit-block-placeholder"></div>
                              <div class="">
                                <div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
                                    On Sat, Jul 25, 2020 at 3:45 AM
                                    Compl Yue via Haskell-Cafe <<a href="mailto:haskell-cafe@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe@haskell.org</a>>
                                    wrote:</p>
                                </div>
                                <blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
                                  solid rgb(204,204,204);padding:0cm 0cm
                                  0cm
                                  6pt;margin-left:4.8pt;margin-right:0cm" class="">
                                  <div class=""><p class="">Dear Cafe,</p><p class="">As Chris Allen has suggested, I
                                      learned that  <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhackage.haskell.org%2Fpackage%2Fstm-containers&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838781576&sdata=ZwtAltlFRkny5q7M%2B7Pople6c4WA%2Bs8vZhwewUge7eg%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">
https://hackage.haskell.org/package/stm-containers</a> and <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhackage.haskell.org%2Fpackage%2Fttrie&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838781576&sdata=zMcZy%2BEzqklkQGjKglCgwg5ZoWyWZIyeRNaCcqtnECs%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">
https://hackage.haskell.org/package/ttrie</a> can help a lot when used
                                      in place of traditional HashMap
                                      for stm tx processing, under heavy
                                      concurrency, yet still with
                                      automatic parallelism as GHC
                                      implemented them. Then I realized
                                      that in addition to hash map (used
                                      to implement dicts and scopes), I
                                      also need to find a TreeMap
                                      replacement data structure to
                                      implement the db index. I've been
                                      focusing on the uniqueness
                                      constraint aspect, but it's still
                                      an index, needs to provide range
                                      scan api for db clients, so hash
                                      map is not sufficient for the
                                      index.</p><p class="">I see Ryan shared the code
                                      benchmarking RBTree with stm in
                                      mind:</p><p class=""><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ffryguybob%2Fghc-stm-benchmarks%2Ftree%2Fmaster%2Fbenchmarks%2FRBTree-Throughput&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838791571&sdata=Nl2eN81Kjaf5qyNKEaxxc0ioMw6w4QoX4b5vAE5RaF8%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/fryguybob/ghc-stm-benchmarks/tree/master/benchmarks/RBTree-Throughput</a>
                                    </p><p class=""><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ffryguybob%2Fghc-stm-benchmarks%2Ftree%2Fmaster%2Fbenchmarks%2FRBTree&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838791571&sdata=%2BLp6HQCyROOlpA2pr8BR8DPls68oY5Y77GKgqbSKmno%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/fryguybob/ghc-stm-benchmarks/tree/master/benchmarks/RBTree</a></p><p class="">But can't find conclusion or
                                      interpretation of that benchmark
                                      suite. And here's a followup
                                      question:</p><div class=""> <br class="webkit-block-placeholder"></div><p class="">Where are some STM contention
                                      optimized data structures, that
                                      having keys ordered, with
                                      sub-range traversing api ?
                                    </p><p class="">(of course production ready
                                      libraries most desirable)</p><div class=""> <br class="webkit-block-placeholder"></div><p class="">Thanks with regards,</p><p class="">Compl</p><div class=""> <br class="webkit-block-placeholder"></div>
                                    <div class=""><p class="MsoNormal">On 2020/7/25
                                        <span style="font-family:"MS
                                          Gothic"" class="">下午</span>2:04,
                                        Compl Yue via Haskell-Cafe
                                        wrote:</p>
                                    </div>
                                    <blockquote style="margin-top:5pt;margin-bottom:5pt" class=""><p class="">Shame on me for I have neither
                                        experienced with `perf`, I'd
                                        learn these essential tools soon
                                        to put them into good use.</p><p class="">It's great to learn about how
                                        `orElse` actually works, I did
                                        get confused why there are so
                                        little retries captured, and now
                                        I know. So that little trick
                                        should definitely be removed
                                        before going production, as it
                                        does no much useful things at
                                        excessive cost. I put it there
                                        to help me understand internal
                                        working of stm, now I get even
                                        better knowledge ;-)</p><p class="">I think a debugger will trap
                                        every single abort, isn't it
                                        annoying when many aborts would
                                        occur? If I'd like to count the
                                        number of aborts, ideally
                                        accounted per service endpoints,
                                        time periods, source modules
                                        etc. there some tricks for that?</p><p class="">Thanks with best regards,</p><p class="">Compl</p><div class=""> <br class="webkit-block-placeholder"></div>
                                      <div class=""><p class="MsoNormal">On
                                          2020/7/25 <span style="font-family:"MS
                                            Gothic"" class="">上午</span>2:02,
                                          Ryan Yates wrote:</p>
                                      </div>
                                      <blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
                                        <div class=""><p class="MsoNormal">To be
                                            clear, I was trying to refer
                                            to Linux `perf` [^1]. 
                                            Sampling based profiling can
                                            do a good job with
                                            concurrent and parallel
                                            programs where other methods
                                            are problematic.  For
                                            instance,
                                          </p>
                                          <div class=""><p class="MsoNormal"> changing
                                              the size of heap objects
                                              can drastically change
                                              cache performance and
                                              completely different
                                              behavior can show up.</p>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                          <div class=""><p class="MsoNormal">[^1]: <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FPerf_(Linux)&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838801566&sdata=v%2Bv2aVaBITriAM26CqN%2Bp35yshLl%2BbY4BWVEIOSlStA%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://en.wikipedia.org/wiki/Perf_(Linux)</a></p>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                          <div class=""><p class="MsoNormal">The
                                              spinning in `readTVar`
                                              should always be very
                                              short and it typically
                                              shows up as intensive CPU
                                              use, though it may not be
                                              high energy use with
                                              `pause` in the loop on x86
                                              (looks like we don't have
                                              it [^2], I thought we did,
                                              but maybe that was only in
                                              some of my code... )</p>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                          <div class=""><p class="MsoNormal">[^2]: <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fghc%2Fghc%2Fblob%2Fmaster%2Frts%2FSTM.c%23L1275&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838801566&sdata=YBLmeg4Xxby%2BJJmO8B5etdA6tDpBYOry7jdjEoRFd%2Fk%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/ghc/ghc/blob/master/rts/STM.c#L1275</a> </p>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                          <div class=""><p class="MsoNormal">All
                                              that to say, I doubt that
                                              you are spending much time
                                              spinning (but it would
                                              certainly be interesting
                                              to know if you are!  You
                                              would see `perf` attribute
                                              a large amount of time to
                                              `read_current_value`). 
                                              The amount of code to
                                              execute for commit (the
                                              time when locks are held)
                                              is always much shorter
                                              than it takes to execute
                                              the transaction body.  As
                                              you add more conflicting
                                              threads this gets worse of
                                              course as commits
                                              sequence.</p>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                          <div class=""><p class="MsoNormal">The
                                              code you have will count
                                              commits of executions of
                                              `retry`.  Note that
                                              `retry` is a user level
                                              idea, that is, you are
                                              counting user level
                                              *explicit* retries.  This
                                              is different from a
                                              transaction failing to
                                              commit and starting
                                              again.  These are
                                              invisible to the user. 
                                              Also using your trace will
                                              convert `retry` from the
                                              efficient wake on write
                                              implementation, to an
                                              active retry that will
                                              always attempt again.  We
                                              don't have cheap logging
                                              of transaction aborts in
                                              GHC, but I have built such
                                              logging in my work.  You
                                              can observe these aborts
                                              with a debugger by looking
                                              for execution of this
                                              line:</p>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                          <div class=""><p class="MsoNormal"><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fghc%2Fghc%2Fblob%2Fmaster%2Frts%2FSTM.c%23L1123&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838811560&sdata=jAEm1CpEYQx6ORikerxVHOSlaOmrTzB3m9EVmOwo%2B8w%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/ghc/ghc/blob/master/rts/STM.c#L1123</a></p>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                          <div class=""><p class="MsoNormal">Ryan </p>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                          </div>
                                        </div><div class=""> <br class="webkit-block-placeholder"></div>
                                        <div class="">
                                          <div class=""><p class="MsoNormal">On Fri,
                                              Jul 24, 2020 at 12:35 PM
                                              Compl Yue <<a href="mailto:compl.yue@icloud.com" target="_blank" moz-do-not-send="true" class="">compl.yue@icloud.com</a>>
                                              wrote:</p>
                                          </div>
                                          <blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
                                            solid
                                            rgb(204,204,204);padding:0cm
                                            0cm 0cm
                                            6pt;margin-left:4.8pt;margin-right:0cm" class="">
                                            <div class=""><p class="">I'm not familiar with
                                                profiling GHC yet, may
                                                need more time to get
                                                myself proficient with
                                                it.</p><p class="">And a bit more details
                                                of my test workload for
                                                diagnostic: the db
                                                clients are Python
                                                processes from a cluster
                                                of worker nodes,
                                                consulting the db server
                                                to register some path
                                                for data files, under a
                                                data dir within a shared
                                                filesystem, then mmap
                                                those data files and
                                                fill in actual array
                                                data. So the db server
                                                don't have much
                                                computation to perform,
                                                but puts the data file
                                                path into a global
                                                index, which at the same
                                                validates its
                                                uniqueness. As there are
                                                many client processes
                                                trying to insert one
                                                meta data record
                                                concurrently, with my
                                                naive implementation,
                                                the global index's TVar
                                                will almost always in
                                                locked state by one
                                                client after another,
                                                from a queue never fall
                                                empty.</p><p class="">So if `readTVar` should
                                                spinning waiting, I
                                                doubt the threads should
                                                actually make high CPU
                                                utilization, because at
                                                any instant of time, all
                                                threads except the
                                                committing one will be
                                                doing that one thing.</p><p class="">And I have something in
                                                my code to track STM
                                                retry like this:</p><p class="">```</p>
                                              <div class="">
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(106,153,85)" class="">-- blocking wait not
                                                      expected, track
                                                      stm retries
                                                      explicitly</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(220,220,170)" class="">trackSTM</span><span style="font-size:13pt;color:rgb(212,212,212)" class="">
                                                      ::
                                                    </span><span style="font-size:13pt;color:rgb(86,156,214)" class="">Int</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""> ->
                                                    </span><span style="font-size:13pt;color:rgb(86,156,214)" class="">IO</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""> (</span><span style="font-size:13pt;color:rgb(86,156,214)" class="">Either</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""> ()
                                                    </span><span style="font-size:13pt;color:rgb(156,220,254)" class="">a</span><span style="font-size:13pt;color:rgb(212,212,212)" class="">)</span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">trackSTM !rtc =
                                                    </span><span style="font-size:13pt;color:rgb(197,134,192)" class="">do</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">when
                                                    </span><span style="font-size:13pt;color:rgb(106,153,85)" class="">--
                                                      todo increase the
                                                      threshold of
                                                      reporting?</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">(rtc >
                                                    </span><span style="font-size:13pt;color:rgb(181,206,168)" class="">0</span><span style="font-size:13pt;color:rgb(212,212,212)" class="">) $
                                                    </span><span style="font-size:13pt;color:rgb(197,134,192)" class="">do</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(106,153,85)" class="">-- trace out the retries so
                                                      the end users can
                                                      be aware of them</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">tid <- myThreadId</span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">trace</span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">(
                                                    </span><span style="font-size:13pt;color:rgb(206,145,120)" class="">"</span><span style="font-size:13pt;font-family:"Segoe UI
                                                      Emoji",sans-serif;color:rgb(206,145,120)" class="">🔙</span><span style="font-size:13pt;color:rgb(215,186,125)" class="">\n</span><span style="font-size:13pt;color:rgb(206,145,120)" class="">"</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><> show callCtx</span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><>
                                                    </span><span style="font-size:13pt;color:rgb(206,145,120)" class="">"</span><span style="font-size:13pt;font-family:"Segoe UI
                                                      Emoji",sans-serif;color:rgb(206,145,120)" class="">🌀</span><span style="font-size:13pt;color:rgb(206,145,120)" class=""> "</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><> show tid</span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><>
                                                    </span><span style="font-size:13pt;color:rgb(206,145,120)" class="">"
                                                      stm retry #"</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><> show rtc</span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">)</span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">$ return
                                                    </span><span style="font-size:13pt;color:rgb(86,156,214)" class="">()</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">atomically ((Just
                                                      <$> stmJob)
                                                      `orElse` return
                                                      Nothing) >>=
                                                      \</span><span style="font-size:13pt;color:rgb(197,134,192)" class="">case</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">Nothing ->
                                                    </span><span style="font-size:13pt;color:rgb(106,153,85)" class="">--
                                                      stm failed, do a
                                                      tracked retry</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">trackSTM (rtc +
                                                    </span><span style="font-size:13pt;color:rgb(181,206,168)" class="">1</span><span style="font-size:13pt;color:rgb(212,212,212)" class="">)</span></p>
                                                </div>
                                                <div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">Just ... -> ...</span></p>
                                                </div>
                                              </div><p class="">```</p><p class="">No such trace msg fires
                                                during my test, neither
                                                in single thread run,
                                                nor in runs with
                                                pressure. I'm sure this
                                                tracing mechanism works,
                                                as I can see such traces
                                                fire, in case e.g.
                                                posting a TMVar to a
                                                TQueue for some other
                                                thread to fill it, then
                                                read the result out, if
                                                these 2 ops are composed
                                                into a single tx, then
                                                of course it's infinite
                                                retry loop, and a
                                                sequence of such msgs
                                                are logged with ever
                                                increasing rtc #.</p><p class="">So I believe no retry
                                                has ever been triggered.</p><p class="">What can going on
                                                there?</p><div class=""> <br class="webkit-block-placeholder"></div>
                                              <div class=""><p class="MsoNormal">On
                                                  2020/7/24 <span style="font-family:"MS
                                                    Gothic"" class="">下午</span>11:46,
                                                  Ryan Yates wrote:</p>
                                              </div>
                                              <blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
                                                <div class="">
                                                  <div class="">
                                                    <div class=""><p class="MsoNormal">>
                                                        Then to explain
                                                        the low CPU
                                                        utilization
                                                        (~10%), am I
                                                        right to
                                                        understand it as
                                                        that upon
                                                        reading a TVar
                                                        locked by
                                                        another
                                                        committing tx, a
                                                        lightweight
                                                        thread will put
                                                        itself into
                                                        `waiting STM`
                                                        and descheduled
                                                        state, so the
                                                        CPUs can only
                                                        stay idle as not
                                                        so many threads
                                                        are willing to
                                                        proceed?</p>
                                                    </div>
                                                    <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                    </div>
                                                    <div class=""><p class="MsoNormal">Since
                                                        the commit
                                                        happens in
                                                        finite steps,
                                                        the expectation
                                                        is that the lock
                                                        will be released
                                                        very soon. 
                                                        Given this when
                                                        the body of a
                                                        transaction
                                                        executes
                                                        `readTVar` it
                                                        spins (active
                                                        CPU!) until the
                                                        `TVar` is
                                                        observed
                                                        unlocked.  If a
                                                        lock is observed
                                                        while commiting,
                                                        it immediately
                                                        starts the
                                                        transaction
                                                        again from the
                                                        beginning.  To
                                                        get the behavior
                                                        of suspending a
                                                        transaction you
                                                        have to
                                                        successfully
                                                        commit a
                                                        transaction that
                                                        executed
                                                        `retry`.  Then
                                                        the transaction
                                                        is put on the
                                                        wakeup lists of
                                                        its read set and
                                                        subsequent
                                                        commits will
                                                        wake it up if
                                                        its write set
                                                        overlaps.</p>
                                                    </div>
                                                    <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                    </div>
                                                    <div class=""><p class="MsoNormal">I
                                                        don't think any
                                                        of these things
                                                        would explain
                                                        low CPU
                                                        utilization. 
                                                        You could try
                                                        running with
                                                        `perf` and see
                                                        if lots of time
                                                        is spent in some
                                                        recognizable
                                                        part of the RTS.</p>
                                                    </div>
                                                    <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                    </div>
                                                    <div class=""><p class="MsoNormal">Ryan</p>
                                                    </div>
                                                  </div>
                                                  <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                  </div><div class=""> <br class="webkit-block-placeholder"></div>
                                                  <div class="">
                                                    <div class=""><p class="MsoNormal">On
                                                        Fri, Jul 24,
                                                        2020 at 11:22 AM
                                                        Compl Yue <<a href="mailto:compl.yue@icloud.com" target="_blank" moz-do-not-send="true" class="">compl.yue@icloud.com</a>>
                                                        wrote:</p>
                                                    </div>
                                                    <blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
                                                      solid
                                                      rgb(204,204,204);padding:0cm
                                                      0cm 0cm
                                                      6pt;margin-left:4.8pt;margin-right:0cm" class="">
                                                      <div class=""><p class="">Thanks very
                                                          much for the
                                                          insightful
                                                          information
                                                          Ryan! I'm glad
                                                          my suspect was
                                                          wrong about
                                                          the Haskell
                                                          scheduler:</p><p class="">> The
                                                          Haskell
                                                          capability
                                                          that is
                                                          committing a
                                                          transaction
                                                          will not yield
                                                          to another
                                                          Haskell thread
                                                          while it is
                                                          doing the
                                                          commit.  The
                                                          OS thread may
                                                          be
                                                          preempted, but
                                                          once commit
                                                          starts the
                                                          haskell
                                                          scheduler is
                                                          not invoked
                                                          until after
                                                          locks are
                                                          released.</p>
                                                        <div class=""><p class="MsoNormal">So
                                                          best effort
                                                          had already
                                                          been made in
                                                          GHC and I just
                                                          need to
                                                          cooperate
                                                          better with
                                                          its design.
                                                          Then to
                                                          explain the
                                                          low CPU
                                                          utilization
                                                          (~10%), am I
                                                          right to
                                                          understand it
                                                          as that upon
                                                          reading a TVar
                                                          locked by
                                                          another
                                                          committing tx,
                                                          a lightweight
                                                          thread will
                                                          put itself
                                                          into `waiting
                                                          STM` and
                                                          descheduled
                                                          state, so the
                                                          CPUs can only
                                                          stay idle as
                                                          not so many
                                                          threads are
                                                          willing to
                                                          proceed?</p>
                                                        </div>
                                                        <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                        </div>
                                                        <div class=""><p class="MsoNormal">Anyway,
                                                          I see light
                                                          with better
                                                          data
                                                          structures to
                                                          improve my
                                                          situation, let
                                                          me try them
                                                          and report
                                                          back. Actually
                                                          I later
                                                          changed `TVar
                                                          (HaskMap k v)`
                                                          to be `TVar
                                                          (HashMap k
                                                          Int)` where
                                                          the `Int`
                                                          being array
                                                          index into
                                                          `TVar (Vector
                                                          (TVar (Maybe
                                                          v)))`, in
                                                          pursuing
                                                          insertion
                                                          order
                                                          preservation
                                                          semantic of
                                                          dict entries
                                                          (like that in
                                                          Python 3.7+),
                                                          then it's very
                                                          hopeful to
                                                          incorporate
                                                          stm-containers'
                                                          Map or ttrie
                                                          to approach
                                                          free of
                                                          contention.</p>
                                                        </div><p class="">Thanks with
                                                          regards,</p><p class="">Compl</p><div class=""> <br class="webkit-block-placeholder"></div>
                                                        <div class=""><p class="MsoNormal">On
                                                          2020/7/24 <span style="font-family:"MS Gothic"" class="">下午</span>10:03, Ryan Yates
                                                          wrote:</p>
                                                        </div>
                                                        <blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
                                                          <div class="">
                                                          <div class=""><p class="MsoNormal">Hi
                                                          Compl,</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">Having
                                                          a pool of
                                                          transaction processing
                                                          threads can be
                                                          helpful in a
                                                          certain way. 
                                                          If the body of
                                                          the
                                                          transaction
                                                          takes more
                                                          time to
                                                          execute then
                                                          the Haskell
                                                          thread is
                                                          allowed and it
                                                          yields, the
                                                          suspended thread
                                                          won't get in
                                                          the way of
                                                          other thread,
                                                          but when it is
                                                          rescheduled,
                                                          will have a
                                                          low
                                                          probability of
                                                          success.  Even
                                                          worse, it will
                                                          probably not
                                                          discover that
                                                          it is doomed
                                                          to failure
                                                          until commit
                                                          time.  If
                                                          transactions
                                                          are more
                                                          likely to
                                                          reach commit
                                                          without
                                                          yielding, they
                                                          are more
                                                          likely to
                                                          succeed.  If
                                                          the
                                                          transactions
                                                          are not
                                                          conflicting,
                                                          it doesn't
                                                          make much
                                                          difference
                                                          other than
                                                          cache churn.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">The
                                                          Haskell
                                                          capability
                                                          that is
                                                          committing a
                                                          transaction
                                                          will not yield
                                                          to another
                                                          Haskell thread
                                                          while it is
                                                          doing the
                                                          commit.  The
                                                          OS thread may
                                                          be
                                                          preempted, but
                                                          once commit
                                                          starts the
                                                          haskell
                                                          scheduler is
                                                          not invoked
                                                          until after
                                                          locks are
                                                          released.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">To
                                                          get good
                                                          performance
                                                          from STM you
                                                          must pay
                                                          attention to
                                                          what TVars are
                                                          involved in a
                                                          commit.  All
                                                          STM systems
                                                          are working
                                                          under the
                                                          assumption of
                                                          low
                                                          contention, so
                                                          you want to
                                                          minimize
                                                          "false"
                                                          conflicts
                                                          (conflicts
                                                          that are not
                                                          essential to
                                                          the
                                                          computation). 
                                                            Something
                                                          like `TVar
                                                          (HashMap k v)`
                                                          will work
                                                          pretty well
                                                          for a low
                                                          thread count,
                                                          but every
                                                          transaction
                                                          that writes to
                                                          that structure
                                                          will be in
                                                          conflict with
                                                          every other
                                                          transaction
                                                          that accesses
                                                          it.  Pushing
                                                          the `TVar`
                                                          into the nodes
                                                          of the
                                                          structure
                                                          reduces the
                                                          possibilities
                                                          for conflict,
                                                          while
                                                          increasing the
                                                          amount of
                                                          bookkeeping
                                                          STM has to
                                                          do.  I would
                                                          like to reduce
                                                          the cost of
                                                          that
                                                          bookkeeping
                                                          using better
                                                          structures,
                                                          but we need to
                                                          do so without
                                                          harming
                                                          performance in
                                                          the low TVar
                                                          count case. 
                                                          Right now it
                                                          is optimized
                                                          for good cache
                                                          performance
                                                          with a handful
                                                          of TVars.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">There
                                                          is another way
                                                          to play with
                                                          performance by
                                                          moving work
                                                          into and out
                                                          of the
                                                          transaction
                                                          body.  A
                                                          transaction
                                                          body that
                                                          executes
                                                          quickly will
                                                          reach commit
                                                          faster.  But
                                                          it may be
                                                          delaying work
                                                          that moves
                                                          into another
                                                          transaction. 
                                                          Forcing values
                                                          at the right
                                                          time can make
                                                          a big
                                                          difference.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">Ryan</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">On
                                                          Fri, Jul 24,
                                                          2020 at 2:14
                                                          AM Compl Yue
                                                          via
                                                          Haskell-Cafe
                                                          <<a href="mailto:haskell-cafe@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe@haskell.org</a>>
                                                          wrote:</p>
                                                          </div>
                                                          <div class="">
                                                          <blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
                                                          solid
                                                          rgb(204,204,204);padding:0cm
                                                          0cm 0cm
                                                          6pt;margin-left:4.8pt;margin-right:0cm" class="">
                                                          <div class=""><p class="">Thanks
                                                          Chris, I
                                                          confess I
                                                          didn't pay
                                                          enough
                                                          attention to
                                                          STM
                                                          specialized
                                                          container
                                                          libraries by
                                                          far, I skimmed
                                                          through the
                                                          description of
                                                          stm-containers
                                                          and ttrie, and
                                                          feel they
                                                          would
                                                          definitely
                                                          improve my
                                                          code's
                                                          performance in
                                                          case I limit
                                                          the server's
                                                          parallelism
                                                          within
                                                          hardware
                                                          capabilities.
                                                          That may
                                                          because I'm
                                                          still
                                                          prototyping
                                                          the api and
                                                          infrastructure
                                                          for
                                                          correctness,
                                                          so even `TVar
                                                          (HashMap k v)`
                                                          performs okay
                                                          for me at the
                                                          moment, only
                                                          if at low
                                                          contention
                                                          (surely
                                                          there're
                                                          plenty of CPU
                                                          cycles to be
                                                          optimized out
                                                          in next
                                                          steps). I
                                                          model my data
                                                          after graph
                                                          model, so most
                                                          data, even
                                                          most indices
                                                          are localized
                                                          to nodes and
                                                          edges, those
                                                          can be
                                                          manipulated
                                                          without
                                                          conflict,
                                                          that's why I
                                                          assumed I have
                                                          a low
                                                          contention use
                                                          case since the
                                                          very beginning
                                                          - until I
                                                          found there
                                                          are still
                                                          (though minor)
                                                          needs for
                                                          global indices
                                                          to guarantee
                                                          global
                                                          uniqueness, I
                                                          feel faithful
                                                          with
                                                          stm-containers/ttrie
                                                          to implement a
                                                          more scalable
                                                          global index
                                                          data
                                                          structure,
                                                          thanks for
                                                          hinting me.</p><p class="">So an
                                                          evident
                                                          solution comes
                                                          into my mind
                                                          now, is to run
                                                          the server
                                                          with a pool of
                                                          tx processing
                                                          threads,
                                                          matching
                                                          number of CPU
                                                          cores, client
                                                          RPC requests
                                                          then get
                                                          queued to be
                                                          executed in
                                                          some thread
                                                          from the pool.
                                                          But I'm really
                                                          fond of the
                                                          mechanism of
                                                          M:N scheduler
                                                          which solves
                                                          massive/dynamic
                                                          concurrency so
                                                          elegantly. I
                                                          had some good
                                                          result with Go
                                                          in this
                                                          regard, and
                                                          see GHC at par
                                                          in doing this,
                                                          I don't want
                                                          to give up
                                                          this enjoyable
                                                          machinery.</p><p class="">But looked
                                                          at the stm
                                                          implementation
                                                          in GHC, it
                                                          seems written
                                                          TVars are
                                                          exclusively
                                                          locked during
                                                          commit of a
                                                          tx, I suspect
                                                          this is the
                                                          culprit when
                                                          there're large
                                                          M lightweight
                                                          threads
                                                          scheduled upon
                                                          a small N
                                                          hardware
                                                          capabilities,
                                                          that is when a
                                                          lightweight
                                                          thread yield
                                                          control during
                                                          an stm
                                                          transaction
                                                          commit, the
                                                          TVars it
                                                          locked will
                                                          stay so until
                                                          it's scheduled
                                                          again (and
                                                          again) till it
                                                          can finish the
                                                          commit. This
                                                          way,
                                                          descheduled
                                                          threads could
                                                          hold live
                                                          threads from
                                                          progressing. I
                                                          haven't gone
                                                          into more
                                                          details there,
                                                          but wonder if
                                                          there can be
                                                          some
                                                          improvement
                                                          for GHC RTS to
                                                          keep an stm
                                                          committing
                                                          thread from
                                                          descheduled,
                                                          but seemingly
                                                          that may
                                                          impose more
                                                          starvation
                                                          potential; or
                                                          stm can be
                                                          improved to
                                                          have its TVar
                                                          locks
                                                          preemptable
                                                          when the owner
                                                          trec/thread is
                                                          in descheduled
                                                          state? Neither
                                                          should be easy
                                                          but I'd really
                                                          love massive
                                                          lightweight
                                                          threads doing
                                                          STM
                                                          practically
                                                          well.</p><p class="">Best
                                                          regards,</p><p class="">Compl</p><div class=""> <br class="webkit-block-placeholder"></div>
                                                          <div class=""><p class="MsoNormal">On
                                                          2020/7/24 <span style="font-family:"MS Gothic"" class="">上午</span>12:57, Christopher
                                                          Allen wrote:</p>
                                                          </div>
                                                          <blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
                                                          <div class=""><p class="MsoNormal">It
                                                          seems like you
                                                          know how to
                                                          run practical
                                                          tests for
                                                          tuning thread
                                                          count and
                                                          contention for
                                                          throughput.
                                                          Part of the
                                                          reason you
                                                          haven't gotten
                                                          a super clear
                                                          answer is "it
                                                          depends." You
                                                          give up
                                                          fairness when
                                                          you use STM
                                                          instead of
                                                          MVars or
                                                          equivalent
                                                          structures.
                                                          That means a
                                                          long running
                                                          transaction
                                                          might get
                                                          stampeded by
                                                          many small
                                                          ones
                                                          invalidating
                                                          it over and
                                                          over. The
                                                          long-running
                                                          transaction
                                                          might never
                                                          clear if
                                                          the small
                                                          transactions
                                                          keep moving
                                                          the cheese. I
                                                          mention this
                                                          because
                                                          transaction
                                                          runtime and
                                                          size and count
                                                          all affect
                                                          throughput and
                                                          latency. What
                                                          might be ideal
                                                          for one
                                                          pattern of
                                                          work might not
                                                          be ideal for
                                                          another.
                                                          Optimizing for
                                                          overall
                                                          throughput
                                                          might make the
                                                          contention and
                                                          fairness
                                                          problems worse
                                                          too. I've done
                                                          practical
                                                          tests to
                                                          optimize this
                                                          in the past,
                                                          both for STM
                                                          in Haskell and
                                                          for RDBMS
                                                          workloads. </p>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">The
                                                          next step is
                                                          sometimes
                                                          figuring out
                                                          whether you
                                                          really need a
                                                          data structure
                                                          within a
                                                          single STM
                                                          container or
                                                          if perhaps you
                                                          can break up
                                                          your STM
                                                          container
                                                          boundaries
                                                          into zones or
                                                          regions that
                                                          roughly map
                                                          onto update
                                                          boundaries.
                                                          That should
                                                          make the
                                                          transactions
                                                          churn less. On
                                                          the outside
                                                          chance you do
                                                          need to touch
                                                          more than one
                                                          container in a
                                                          transaction,
                                                          well, they
                                                          compose.
                                                          </p>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">e.g. <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhackage.haskell.org%2Fpackage%2Fstm-containers&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838811560&sdata=Lq1%2BGj0Z6%2BBGMRAZrSzcTAlYgj0B0A67RaQcyyCcXbk%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://hackage.haskell.org/package/stm-containers</a></p>
                                                          </div>
                                                          <div class=""><p class="MsoNormal"><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhackage.haskell.org%2Fpackage%2Fttrie&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838821555&sdata=PpaiVM2NrPM2HzK0bh%2BMR8YF90yHlxKnN9gwZVQHqR0%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://hackage.haskell.org/package/ttrie</a></p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">It
                                                          also sounds a
                                                          bit like
                                                          your question
                                                          bumps into
                                                          Amdahl's Law a
                                                          bit.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">All
                                                          else fails,
                                                          stop using STM
                                                          and find
                                                          something more
                                                          tuned to your
                                                          problem space.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">Hope
                                                          this helps,</p>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">Chris
                                                          Allen</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          </div>
                                                          </div><div class=""> <br class="webkit-block-placeholder"></div>
                                                          <div class="">
                                                          <div class=""><p class="MsoNormal">On
                                                          Thu, Jul 23,
                                                          2020 at 9:53
                                                          AM YueCompl
                                                          via
                                                          Haskell-Cafe
                                                          <<a href="mailto:haskell-cafe@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe@haskell.org</a>>
                                                          wrote:</p>
                                                          </div>
                                                          <blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
                                                          solid
                                                          rgb(204,204,204);padding:0cm
                                                          0cm 0cm
                                                          6pt;margin-left:4.8pt;margin-right:0cm" class="">
                                                          <div class=""><p class="MsoNormal">Hello
                                                          Cafe, </p>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">I'm
                                                          working on an
                                                          in-memory
                                                          database, in
                                                          Client/Server
                                                          mode I just
                                                          let each
                                                          connected
                                                          client submit
                                                          remote
                                                          procedure call
                                                          running in its
                                                          dedicated
                                                          lightweight
                                                          thread,
                                                          modifying
                                                          TVars in RAM
                                                          per its
                                                          business
                                                          needs, then in
                                                          case many
                                                          clients
                                                          connected
                                                          concurrently
                                                          and trying to
                                                          insert new
                                                          data, if they
                                                          are triggering
                                                          global index
                                                          (some TVar)
                                                          update, the
                                                          throughput
                                                          would drop
                                                          drastically. I
                                                          reduced the
                                                          shared state
                                                          to a simple
                                                          int counter by
                                                          TVar, got same
                                                          symptom. While
                                                          the
                                                          parallelism
                                                          feels okay
                                                          when there's
                                                          no hot TVar
                                                          conflicting,
                                                          or M is not
                                                          much greater
                                                          than N.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">As
                                                          an empirical
                                                          test workload,
                                                          I have a `+RTS
                                                          -N10` server
                                                          process, it
                                                          handles 10
                                                          concurrent
                                                          clients okay,
                                                          got ~5x of
                                                          single thread
                                                          throughput;
                                                          but in
                                                          handling 20
                                                          concurrent
                                                          clients, each
                                                          of the 10 CPUs
                                                          can only be
                                                          driven to ~10%
                                                          utilization,
                                                          the throughput
                                                          seems even
                                                          worse than
                                                          single thread.
                                                          More clients
                                                          can even drive
                                                          it thrashing
                                                          without much
                                                           progressing.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal"> I
                                                          can understand
                                                          that pure STM
                                                          doesn't scale
                                                          well after
                                                          reading [1],
                                                          and I see it
                                                          suggested [7]
                                                          attractive and
                                                          planned future
                                                          work toward
                                                          that
                                                          direction.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">But
                                                          I can't find
                                                          certain
                                                          libraries or
                                                          frameworks
                                                          addressing
                                                          large M over
                                                          small N
                                                          scenarios, [1]
                                                          experimented
                                                          with
                                                          designated N
                                                          parallelism,
                                                          and [7] is
                                                          rather
                                                          theoretical to
                                                          my empirical
                                                          needs.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">Can
                                                          you direct me
                                                          to some
                                                          available
                                                          library
                                                          implementing
                                                          the
                                                          methodology
                                                          proposed in
                                                          [7] or other
                                                          ways tackling
                                                          this problem?</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">I
                                                          think the most
                                                          difficult one
                                                          is that a
                                                          transaction
                                                          should commit
                                                          with global
                                                          indices (with
                                                          possibly
                                                          unique
                                                          constraints)
                                                          atomically
                                                          updated, and
                                                          rollback with
                                                          any violation
                                                          of
                                                          constraints,
                                                          i.e.
                                                          transactions
                                                          have to cover
                                                          global states
                                                          like indices.
                                                          Other problems
                                                          seem more
                                                          trivial than
                                                          this.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">Specifically,
                                                          [7] states:</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">> It
                                                          must be
                                                          emphasized
                                                          that all of
                                                          the mechanisms
                                                          we deploy
                                                          originate, in
                                                          one form or
                                                          another, in
                                                          the database
                                                          literature
                                                          from the 70s
                                                          and 80s. Our
                                                          contribution
                                                          is to adapt
                                                          these
                                                          techniques to
                                                          software
                                                          transactional
                                                          memory,
                                                          providing more
                                                          effective
                                                          solutions to
                                                          important STM
                                                          problems than
                                                          prior
                                                          proposals.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">I
                                                          wonder any STM
                                                          based library
                                                          has simplified
                                                          those
                                                          techniques to
                                                          be composed
                                                          right away? I
                                                          don't really
                                                          want to
                                                          implement
                                                          those
                                                          mechanisms by
                                                          myself,
                                                          rebuilding
                                                          many wheels
                                                          from scratch.</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">Best
                                                          regards,</p>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">Compl</p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">[1] Comparing
                                                          the
                                                          performance of
                                                          concurrent
                                                          linked-list
                                                          implementations
                                                          in Haskell </p>
                                                          </div>
                                                          <div class=""><p class="MsoNormal"><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsimonmar.github.io%2Fbib%2Fpapers%2Fconcurrent-data.pdf&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838821555&sdata=41Jaz8ZRmRfBHyGKxfhJlm4xR7q0pOtJShtO0jTlOwQ%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://simonmar.github.io/bib/papers/concurrent-data.pdf</a></p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          <div class=""><p class="MsoNormal">[7]
                                                          M. Herlihy and
                                                          E. Koskinen.
                                                          Transactional
                                                          boosting: a
                                                          methodology
                                                          for
                                                          highly-concurrent
                                                          transactional
                                                          objects. In
                                                          Proc. of PPoPP
                                                          ’08, pages
                                                          207–216. ACM
                                                          Press, 2008.</p>
                                                          </div>
                                                          <div class=""><p class="MsoNormal"><a href="https://nam06.safelinks.protection.outlook.com/?url=https:%2F%2Fwww.cs.stevens.edu%2F~ejk%2Fpapers%2Fboosting-ppopp08.pdf&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838831548&sdata=ya8Az1oC6f2xoMb90S9HCH57UTQ0nV9sg6SW%2B5JCPC4%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf</a></p>
                                                          </div>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div>
                                                          </div><p class="MsoNormal">_______________________________________________<br class="">
                                                          Haskell-Cafe
                                                          mailing list<br class="">
                                                          To
                                                          (un)subscribe,
                                                          modify options
                                                          or view
                                                          archives go
                                                          to:<br class="">
                                                          <a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838831548&sdata=c2AV7CO42o3tcw0EuMzqedKkBCtQjWjvdMoUsb4llbY%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">
                                                          Only members
                                                          subscribed via
                                                          the mailman
                                                          list are
                                                          allowed to
                                                          post.</p>
                                                          </blockquote>
                                                          </div><p class="MsoNormal"><br clear="all" class="">
                                                          </p>
                                                          <div class=""><div class=""> <br class="webkit-block-placeholder"></div>
                                                          </div><p class="MsoNormal">--
                                                          </p>
                                                          <div class="">
                                                          <div class="">
                                                          <div class="">
                                                          <div class="">
                                                          <div class=""><p class="MsoNormal">Chris
                                                          Allen</p>
                                                          <div class=""><p class="MsoNormal"><span style="font-size:9.5pt" class="">Currently working on </span><a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhaskellbook.com%2F&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838831548&sdata=tIHFQFZPIQgRp8oqGRvyebm1YQdCvGD0VoMcflzJwKc%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://haskellbook.com</a></p>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          </div><p class="MsoNormal">_______________________________________________<br class="">
                                                          Haskell-Cafe
                                                          mailing list<br class="">
                                                          To
                                                          (un)subscribe,
                                                          modify options
                                                          or view
                                                          archives go
                                                          to:<br class="">
                                                          <a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838841547&sdata=vdzv5WBA62cNwO6DA1D4KEHDCweyOerpn1PdMK0A%2BHw%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">
                                                          Only members
                                                          subscribed via
                                                          the mailman
                                                          list are
                                                          allowed to
                                                          post.</p>
                                                          </blockquote>
                                                          </div>
                                                          </div>
                                                        </blockquote>
                                                      </div>
                                                    </blockquote>
                                                  </div>
                                                </div>
                                              </blockquote>
                                            </div>
                                          </blockquote>
                                        </div>
                                      </blockquote><div class=""> <br class="webkit-block-placeholder"></div>
                                      <pre class="">_______________________________________________</pre>
                                      <pre class="">Haskell-Cafe mailing list</pre>
                                      <pre class="">To (un)subscribe, modify options or view archives go to:</pre>
                                      <pre class=""><a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838841547&sdata=vdzv5WBA62cNwO6DA1D4KEHDCweyOerpn1PdMK0A%2BHw%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a></pre>
                                      <pre class="">Only members subscribed via the mailman list are allowed to post.</pre>
                                    </blockquote>
                                  </div><p class="MsoNormal">_______________________________________________<br class="">
                                    Haskell-Cafe mailing list<br class="">
                                    To (un)subscribe, modify options or
                                    view archives go to:<br class="">
                                    <a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838851540&sdata=Btpa3sjfAjTf2ICO0QpQG5vVCawIjERNjUHji06uG5Y%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">
                                    Only members subscribed via the
                                    mailman list are allowed to post.</p>
                                </blockquote>
                              </div><p class="MsoNormal">_______________________________________________<br class="">
                                Haskell-Cafe mailing list<br class="">
                                To (un)subscribe, modify options or view
                                archives go to:<br class="">
                                <a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838851540&sdata=Btpa3sjfAjTf2ICO0QpQG5vVCawIjERNjUHji06uG5Y%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">
                                Only members subscribed via the mailman
                                list are allowed to post.</p>
                            </div>
                          </blockquote>
                        </div><div class=""> <br class="webkit-block-placeholder"></div>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

_______________________________________________<br class="">Haskell-Cafe mailing list<br class="">To (un)subscribe, modify options or view archives go to:<br class=""><a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">Only members subscribed via the mailman list are allowed to post.</div></blockquote></div><br class=""></div></body></html>