Update: nonmoving GC does make a difference

I think I couldn't observe it before because I set the heap rather large with -H2g, and generation 0 is still collected by the old moving GC, which has difficulty handling the large hazard heap. Having just realized that the nonmoving GC only works on the oldest generation, I tested again with `+RTS -H16m -A4m`, with and without `-xn`:

Without -xn (old moving GC in effect), throughput degrades fast and business progress stops at ~200MB of server RSS.

With -xn (new nonmoving GC in effect), server RSS can burst to ~350MB, then throughput degrades relatively more slowly until RSS reaches ~1GB, after which business yield barely progresses. RSS keeps growing though, with occasional bursts of business yield, until ~3.3GB, when it gets totally stuck.

Regards,
Compl


On 2020-07-30, at 13:31, Compl Yue via Haskell-Cafe <haskell-cafe@haskell.org> wrote:
<div class=""><p class="">Thanks Ryan, and I'm honored to get Simon's attention.</p><p class="">I did have some worry about package tskiplist, that its github
repository seems withdrawn, I emailed the maintainer Peter
Robinson lately but have gotten no response by far. What
particularly worrying me is the 1st sentence of the Readme has
changed from 1.0.0 to 1.0.1 (which is current) as:
</p><p class="">> - <span style="color: rgb(200, 195, 188); font-family:
"PT Sans", -apple-system, BlinkMacSystemFont,
"Segoe UI", Roboto, Oxygen-Sans, Cantarell,
"Helvetica Neue", sans-serif; font-size: 17px;
font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing:
0.024px; orphans: 2; text-align: left; text-indent: 0px;
text-transform: none; white-space: normal; widows: 2;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
background-color: rgb(25, 27, 28); text-decoration-style:
initial; text-decoration-color: initial; display: inline
!important; float: none;" class="">This package provides an
implementation of a skip list in STM.</span></p><p class="">>+ <span style="color: rgb(200, 195, 188); font-family:
"PT Sans", -apple-system, BlinkMacSystemFont,
"Segoe UI", Roboto, Oxygen-Sans, Cantarell,
"Helvetica Neue", sans-serif; font-size: 17px;
font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing:
0.024px; orphans: 2; text-align: left; text-indent: 0px;
text-transform: none; white-space: normal; widows: 2;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
background-color: rgb(25, 27, 28); text-decoration-style:
initial; text-decoration-color: initial; display: inline
!important; float: none;" class="">This package provides a
proof-of-concept implementation of a skip list in STM</span></p><p class="">This has to mean something but I can't figure out yet.<br class="">
</p><p class="">Dear Peter Robinson, I hope you can see this message and get in
the loop of discussion. <br class="">
</p><p class="">Despite that, I don't think overhead of TVar itself the most
serious issue in my situation, as before GC engagement, there are
as many TVars being allocated and updated without stuck at
business progressing. And now I realize what presuring GC in my
situation is not only the large number of pointers (TVars), and at
the same time, they form many circular structures, that might be
nightmare for a GC. As I model my data after graph model, in my
test workload, there are many FeatureSet instances each being an
entity/node object, then there are many Feature instances per
FeatureSet object, each Feature instance being an unary
relationship/edge object, with a reference attribute (via TVar)
pointing to the FeatureSet object it belongs to, circular
structures form because I maintain an index at each FeatureSet
object, sorted by weight etc., but ultimately pointing back (via
TVar) to all Feature objects belonging to the set.<br class="">
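To make the shape of that cycle concrete, here is a minimal sketch of such a structure (the type and field names are illustrative only, not my actual code):

```
import Control.Concurrent.STM (TVar)
import Data.Map.Strict (Map)

-- Illustrative sketch: a Feature points (via TVar) to its owning FeatureSet,
-- while the FeatureSet's index points (via TVar) back to its Features,
-- forming the cycles described above.
data FeatureSet = FeatureSet
  { setFeaturesByWeight :: TVar (Map Double Feature)  -- index, sorted by weight
  }

data Feature = Feature
  { featureWeight :: Double
  , featureOwner  :: TVar FeatureSet  -- edge back to the owning entity/node
  }
```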
</p><p class="">I'm still curious why the new non-moving GC in 8.10.1 still don't
get obvious business progressing in my situation. I tested it on
my Mac yesterday and there I don't know how to see how CPU time is
distributed over threads within a process, I'll further test it
with some Linux boxes to try understand it better.<br class="">
</p><p class="">Best regards,</p><p class="">Compl</p><p class=""><br class="">
</p>
<div class="moz-cite-prefix">On 2020/7/30 上午10:05, Ryan Yates wrote:<br class="">
</div>
<blockquote type="cite" cite="mid:CAO27hRp3vUoq=9EDGLa+Pw9_CQUsdsh3LoarFmq6Xe1CgTJXjQ@mail.gmail.com" class="">
<meta http-equiv="content-type" content="text/html; charset=UTF-8" class="">
<div dir="ltr" class="">Simon, I certainly want to help get to the bottom
of the performance issue at hand :D. Sorry if my reply was
misleading. The constant factor overhead of pushing `TVar`s
into the internal structure may be pressuring unacceptable GC
behavior to happen sooner. My impression was that given the
same size problem performance loss shifted from synchronization
to GC.
<div class=""><br class="">
</div>
<div class="">Compl, I'm not aware of mutable heap objects being
problematic in particular for GHC's GC. There are lots of
special cases to handle them of course. I have successfully
written Haskell programs that get good performance from the GC
with the dominant fraction of heap objects being mutable. I
looked a little more at `TSkipList` and one tricky aspect of
an STM based skip list is how to manage randomness. In
`TSkipList`'s code there is the following comment:</div>
<div class="">
<div class=""><br class="">
</div>
</div>
<div class="">
<pre style="" class=""><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- | Returns a randomly chosen level. Used for inserting new elements. /O(1)./</span>
<a name="line-98" moz-do-not-send="true" class=""></a><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- For performance reasons, this function uses 'unsafePerformIO' to access the</span>
<a name="line-99" moz-do-not-send="true" class=""></a><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- random number generator. (It would be possible to store the random number</span>
<a name="line-100" moz-do-not-send="true" class=""></a><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- generator in a 'TVar' and thus be able to access it safely from within the</span>
<a name="line-101" moz-do-not-send="true" class=""></a><span class="gmail-hs-comment" style="color:rgb(138,138,138)">-- STM monad. This, however, might cause high contention among threads.)</span></pre>
<pre style="" class=""><span class="gmail-hs-comment" style="color:rgb(138,138,138)"><pre style="" class=""><span class="gmail-hs-identifier" style="color:rgb(7,54,66)">chooseLevel</span> <span class="gmail-hs-glyph" style="color:rgb(220,50,47)">::</span> <a href="http://hackage.haskell.org/package/tskiplist-1.0.1/docs/src/Control.Concurrent.STM.TSkipList.Internal.html#TSkipList" class="gmail-" style="text-decoration-line:none;border-bottom:1px solid rgb(238,232,213)" moz-do-not-send="true"><span class="gmail-hs-type gmail-hs-identifier" style="color:rgb(95,95,175)">TSkipList</span></a> <a href="http://hackage.haskell.org/package/tskiplist-1.0.1/docs/src/Control.Concurrent.STM.TSkipList.Internal.html#local-6989586621679028835" class="gmail-" style="text-decoration-line:none;border-bottom:1px solid rgb(238,232,213)" moz-do-not-send="true"><span class="gmail-hs-type gmail-hs-identifier" style="color:rgb(95,95,175)">k</span></a> <a href="http://hackage.haskell.org/package/tskiplist-1.0.1/docs/src/Control.Concurrent.STM.TSkipList.Internal.html#local-6989586621679028836" style="text-decoration-line:none;border-bottom:1px solid rgb(238,232,213)" moz-do-not-send="true" class=""><span class="gmail-hs-type gmail-hs-identifier" style="color:rgb(95,95,175)">a</span></a> <span class="gmail-hs-glyph" style="color:rgb(220,50,47)">-></span> <span class="gmail-hs-type gmail-hs-identifier" style="color:rgb(95,95,175)">Int</span></pre></span></pre>
This level is chosen on insertion to determine the height of the node. When writing my own STM skiplist I found that the details in unsafely accessing randomness had a significant impact on performance. We went with an unboxed array of PCG states that had an entry for each capability, giving constant memory overhead in the number of capabilities. `TSkipList` uses `newStdGen`, which involves allocation and synchronization.
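To sketch the per-capability idea (illustrative only, using System.Random's StdGen in IORefs rather than unboxed PCG states; a production version would also need atomic per-capability updates):

```
import Control.Concurrent (getNumCapabilities, myThreadId, threadCapability)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import qualified Data.Vector as V
import System.IO.Unsafe (unsafePerformIO)
import System.Random (StdGen, newStdGen, randomR)

-- One generator per capability, so concurrent level choices rarely touch
-- the same state.
perCapGens :: V.Vector (IORef StdGen)
perCapGens = unsafePerformIO $ do
  n <- getNumCapabilities
  V.replicateM n (newStdGen >>= newIORef)
{-# NOINLINE perCapGens #-}

-- Geometric level choice as in a skip list, reading the generator that
-- belongs to the current capability.  (Two threads on one capability could
-- interleave here; a real version would use an atomic update.)
chooseLevelIO :: Int -> IO Int
chooseLevelIO maxLevel = do
  (cap, _) <- threadCapability =<< myThreadId
  let ref = perCapGens V.! (cap `mod` V.length perCapGens)
  g0 <- readIORef ref
  let go lvl g
        | lvl >= maxLevel = (lvl, g)
        | otherwise =
            let (coin, g') = randomR (0 :: Int, 1) g
             in if coin == 1 then go (lvl + 1) g' else (lvl, g')
      (level, g1) = go 1 g0
  writeIORef ref g1
  pure level
```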
<div class=""><br class="">
</div>
<div class="">Again, I'm not pointing this out to say that this is the
entirety of the issue you are encountering, rather, I do think
the `TSkipList` library could be improved to allocate much
less. Others can speak to how to tell where the time is going
in GC (my knowledge of this is likely out of date).</div>
<div class=""><br class="">
</div>
<div class="">Ryan</div>
<div class=""><br class="">
</div>
</div>
<br class="">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Jul 29, 2020 at 4:57
PM Simon Peyton Jones <<a href="mailto:simonpj@microsoft.com" moz-do-not-send="true" class="">simonpj@microsoft.com</a>>
wrote:<br class="">
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-GB" class="">
<div class="gmail-m_-3608968087787866939WordSection1"><p class="MsoNormal"><span class="">Compl’s problem is (apparently)
that execution becomes dominated by GC. That doesn’t
sound like a constant-factor overhead from TVars, no
matter how efficient (or otherwise) they are. It
sounds more like a space leak to me; perhaps you need
some strict evaluation or something.</span></p><div class=""><span class=""> </span><br class="webkit-block-placeholder"></div><p class="MsoNormal"><span class="">My point is only: before
re-engineering STM it would make sense to get a much
more detailed insight into what is actually happening,
and where the space and time is going. We have tools
to do this (heap profiling, Threadscope, …) but I know
they need some skill and insight to use well. But we
don’t have nearly enough insight to draw meaningful
conclusions yet.</span></p><div class=""><span class=""> </span><br class="webkit-block-placeholder"></div><p class="MsoNormal"><span class="">Maybe someone with experience
of performance debugging might feel able to help
Compl?</span></p><div class=""><span class=""> </span><br class="webkit-block-placeholder"></div><p class="MsoNormal"><span class="">Simon</span></p><div class=""><span class=""> </span><br class="webkit-block-placeholder"></div>
<div style="border-top:none;border-right:none;border-bottom:none;border-left:1.5pt
solid blue;padding:0cm 0cm 0cm 4pt" class="">
<div class="">
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt
solid rgb(225,225,225);padding:3pt 0cm 0cm" class=""><p class="MsoNormal"><b class=""><span lang="EN-US" class="">From:</span></b><span lang="EN-US" class=""> Haskell-Cafe <<a href="mailto:haskell-cafe-bounces@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe-bounces@haskell.org</a>>
<b class="">On Behalf Of </b>Ryan Yates<br class="">
<b class="">Sent:</b> 29 July 2020 20:41<br class="">
<b class="">To:</b> YueCompl <<a href="mailto:compl.yue@icloud.com" target="_blank" moz-do-not-send="true" class="">compl.yue@icloud.com</a>><br class="">
<b class="">Cc:</b> Haskell Cafe <<a href="mailto:haskell-cafe@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe@haskell.org</a>><br class="">
<b class="">Subject:</b> Re: [Haskell-cafe] STM friendly
TreeMap (or similar with range scan api) ? WAS:
Best ways to achieve throughput, for large M:N
ratio of STM threads, with hot TVar updates?</span></p>
</div>
</div><div class=""> <br class="webkit-block-placeholder"></div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Hi Compl,</p>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
There is a lot of overhead with TVars. My thesis
work addresses this by incorporating mutable
constructor fields with STM. I would like to get
all that into GHC as soon as I can :D. I haven't
looked closely at the `tskiplist` package, I'll
take a look and see if I see any potential
issues. There was some recent work on concurrent
B-tree that may be interesting to try.</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Ryan</p>
</div>
</div><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
<div class="">
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
On Wed, Jul 29, 2020 at 10:24 AM YueCompl <<a href="mailto:compl.yue@icloud.com" target="_blank" moz-do-not-send="true" class="">compl.yue@icloud.com</a>>
wrote:</p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
solid rgb(204,204,204);padding:0cm 0cm 0cm
6pt;margin-left:4.8pt;margin-right:0cm" class="">
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Hi Cafe and Ryan,</p>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
I tried Map/Set from stm-containers and
TSkipList (added range scan api against its
internal data structure) from <a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhackage.haskell.org%2Fpackage%2Ftskiplist&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838761589&sdata=ZOvJVBqJgdGqx2k%2F49fhZeTYkWAd4GRY%2B8ZxH7cyEkI%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://hackage.haskell.org/package/tskiplist</a> ,
with them I've got quite improved at
scalability on concurrency. </p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
But unfortunately then I hit another wall at
single thread scalability over working memory
size, I suspect it's because massively more
TVars (those being pointers per se) are
introduced by those "contention-free" data
structures, they need to mutate separate
pointers concurrently in avoiding contentions
anyway, but such pointer-intensive heap seems
imposing extraordinary pressure to GHC's
garbage collector, that GC will dominate CPU
utilization with poor business progress. </p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
For example in my test, I use `+RTS -H2g` for
the Haskell server process, so GC is not
triggered until after a while, then spin off 3
Python client to insert new records
concurrently, in the first stage each Python
process happily taking ~90% CPU filling
(through local mmap) the arrays allocated from
the server and logs of success scroll quickly,
while the server process utilizes only 30~40%
CPU to serve those 3 clients (insert meta data
records into unique indices merely); then the
client processes' CPU utilization drop
drastically once Haskell server process'
private memory reached around 2gb, i.e. GC
started engaging, the server process's CPU
utilization quickly approaches ~300%, while
all client processes' drop to 0% for most of
the time, and occasionally burst a tiny while
with some log output showing progress. And I
disable parallel GC lately, enabling parallel
GC only makes it worse.</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
If I comment out the code updating the indices
(those creating many TVars), the overall
throughput only drop slowly as more data are
inserted, the parallelism feels steady even
after the server process' private memory takes
several GBs.</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
I didn't expect this, but appears to me that
GC of GHC is really not good at handling
massive number of pointers in the heap, while
those pointers are essential to reduce
contention (and maybe expensive data copying
too) at heavy parallelism/concurrency.</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Btw I tried `+RTS -xn` with GHC 8.10.1 too, no
obvious different behavior compared to 8.8.3;
and also tried tweaking GC related RTS options
a bit, including increasing -G up to 10, no
much difference too.</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
I feel hopeless at the moment, wondering if
I'll have to rewrite this in-memory db in
Go/Rust or some other runtime ...</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Btw I read <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftech.channable.com%2Fposts%2F2020-04-07-lessons-in-managing-haskell-memory.html&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838761589&sdata=gqSH82%2FOYRaW4fzBDl%2BLDjhbRA%2BDRE6jaj4k1UI2gFE%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://tech.channable.com/posts/2020-04-07-lessons-in-managing-haskell-memory.html</a> in
searching about the symptoms, and don't feel
likely to convert my DB managed data into
immutable types thus to fit into Compact
Regions, not quite likely a live in-mem
database instance can do.</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
So seems there are good reasons no successful
DBMS, at least in-memory ones have been
written in Haskell.</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Best regards,</p>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Compl</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
<br class="">
<br class="">
</p>
<blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
On 2020-07-25, at 22:07, Ryan Yates <<a href="mailto:fryguybob@gmail.com" target="_blank" moz-do-not-send="true" class="">fryguybob@gmail.com</a>>
wrote:</p>
</div><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
<div class="">
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Unfortunately my STM benchmarks are
rather disorganized. The most
relevant paper using them is:</p>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Leveraging hardware TM in Haskell
(PPoPP '19)</p>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
<a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdl.acm.org%2Fdoi%2F10.1145%2F3293883.3295711&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838771582&sdata=h3po1gPutR%2BsiCST1N0RNkM6irnVL0%2BVbYl3Vs8F8Oc%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://dl.acm.org/doi/10.1145/3293883.3295711</a></p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Or my thesis:</p>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
<a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furresearch.rochester.edu%2FinstitutionalPublicationPublicView.action%3FinstitutionalItemId%3D34931&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838771582&sdata=jBQMX5RRajIj0KbLWQCMt%2BMyMJIEmTpSuEHBWpq5Isg%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=34931</a> </p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
The PPoPP benchmarks are on a
branch (or the releases tab on
github):</p>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
<a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ffryguybob%2Fghc-stm-benchmarks%2Ftree%2Fwip%2Fmutable-fields%2Fbenchmarks%2FPPoPP2019%2Fsrc&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838771582&sdata=PinsrrGPgAB9TgxH61xngSItw1DcIRf1Niq39b%2BOe0s%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/fryguybob/ghc-stm-benchmarks/tree/wip/mutable-fields/benchmarks/PPoPP2019/src</a> </p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
All that to say, without an
implementation of mutable
constructor fields (which I'm
working on getting into GHC) the
scaling is limited.</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
Ryan</p>
</div>
<div class=""><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
</div>
</div><div style="margin-right: 0cm; margin-bottom: 6pt; margin-left: 0cm;" class="">
<br class="webkit-block-placeholder"></div>
<div class="">
<div class=""><p class="MsoNormal" style="margin-right:0cm;margin-bottom:6pt;margin-left:0cm">
On Sat, Jul 25, 2020 at 3:45 AM
Compl Yue via Haskell-Cafe <<a href="mailto:haskell-cafe@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe@haskell.org</a>>
wrote:</p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
solid rgb(204,204,204);padding:0cm 0cm
0cm
6pt;margin-left:4.8pt;margin-right:0cm" class="">
<div class=""><p class="">Dear Cafe,</p><p class="">As Chris Allen has suggested, I
learned that <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhackage.haskell.org%2Fpackage%2Fstm-containers&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838781576&sdata=ZwtAltlFRkny5q7M%2B7Pople6c4WA%2Bs8vZhwewUge7eg%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">
https://hackage.haskell.org/package/stm-containers</a> and <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhackage.haskell.org%2Fpackage%2Fttrie&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838781576&sdata=zMcZy%2BEzqklkQGjKglCgwg5ZoWyWZIyeRNaCcqtnECs%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">
https://hackage.haskell.org/package/ttrie</a> can help a lot when used
in place of traditional HashMap
for stm tx processing, under heavy
concurrency, yet still with
automatic parallelism as GHC
implemented them. Then I realized
that in addition to hash map (used
to implement dicts and scopes), I
also need to find a TreeMap
replacement data structure to
implement the db index. I've been
focusing on the uniqueness
constraint aspect, but it's still
an index, needs to provide range
scan api for db clients, so hash
map is not sufficient for the
index.</p><p class="">I see Ryan shared the code
benchmarking RBTree with stm in
mind:</p><p class=""><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ffryguybob%2Fghc-stm-benchmarks%2Ftree%2Fmaster%2Fbenchmarks%2FRBTree-Throughput&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838791571&sdata=Nl2eN81Kjaf5qyNKEaxxc0ioMw6w4QoX4b5vAE5RaF8%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/fryguybob/ghc-stm-benchmarks/tree/master/benchmarks/RBTree-Throughput</a>
</p><p class=""><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ffryguybob%2Fghc-stm-benchmarks%2Ftree%2Fmaster%2Fbenchmarks%2FRBTree&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838791571&sdata=%2BLp6HQCyROOlpA2pr8BR8DPls68oY5Y77GKgqbSKmno%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/fryguybob/ghc-stm-benchmarks/tree/master/benchmarks/RBTree</a></p><p class="">But can't find conclusion or
interpretation of that benchmark
suite. And here's a followup
question:</p><div class=""> <br class="webkit-block-placeholder"></div><p class="">Where are some STM contention
optimized data structures, that
having keys ordered, with
sub-range traversing api ?
</p><p class="">(of course production ready
libraries most desirable)</p><div class=""> <br class="webkit-block-placeholder"></div><p class="">Thanks with regards,</p><p class="">Compl</p><div class=""> <br class="webkit-block-placeholder"></div>
<div class=""><p class="MsoNormal">On 2020/7/25
<span style="font-family:"MS
Gothic"" class="">下午</span>2:04,
Compl Yue via Haskell-Cafe
wrote:</p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt" class=""><p class="">Shame on me for I have neither
experienced with `perf`, I'd
learn these essential tools soon
to put them into good use.</p><p class="">It's great to learn about how
`orElse` actually works, I did
get confused why there are so
little retries captured, and now
I know. So that little trick
should definitely be removed
before going production, as it
does no much useful things at
excessive cost. I put it there
to help me understand internal
working of stm, now I get even
better knowledge ;-)</p><p class="">I think a debugger will trap
every single abort, isn't it
annoying when many aborts would
occur? If I'd like to count the
number of aborts, ideally
accounted per service endpoints,
time periods, source modules
etc. there some tricks for that?</p><p class="">Thanks with best regards,</p><p class="">Compl</p><div class=""> <br class="webkit-block-placeholder"></div>
<div class=""><p class="MsoNormal">On
2020/7/25 <span style="font-family:"MS
Gothic"" class="">上午</span>2:02,
Ryan Yates wrote:</p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
<div class=""><p class="MsoNormal">To be
clear, I was trying to refer
to Linux `perf` [^1].
Sampling based profiling can
do a good job with
concurrent and parallel
programs where other methods
are problematic. For
instance,
</p>
<div class=""><p class="MsoNormal"> changing
the size of heap objects
can drastically change
cache performance and
completely different
behavior can show up.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">[^1]: <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FPerf_(Linux)&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838801566&sdata=v%2Bv2aVaBITriAM26CqN%2Bp35yshLl%2BbY4BWVEIOSlStA%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://en.wikipedia.org/wiki/Perf_(Linux)</a></p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">The
spinning in `readTVar`
should always be very
short and it typically
shows up as intensive CPU
use, though it may not be
high energy use with
`pause` in the loop on x86
(looks like we don't have
it [^2], I thought we did,
but maybe that was only in
some of my code... )</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">[^2]: <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fghc%2Fghc%2Fblob%2Fmaster%2Frts%2FSTM.c%23L1275&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838801566&sdata=YBLmeg4Xxby%2BJJmO8B5etdA6tDpBYOry7jdjEoRFd%2Fk%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/ghc/ghc/blob/master/rts/STM.c#L1275</a> </p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">All
that to say, I doubt that
you are spending much time
spinning (but it would
certainly be interesting
to know if you are! You
would see `perf` attribute
a large amount of time to
`read_current_value`).
The amount of code to
execute for commit (the
time when locks are held)
is always much shorter
than it takes to execute
the transaction body. As
you add more conflicting
threads this gets worse of
course as commits
sequence.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">The
code you have will count
commits of executions of
`retry`. Note that
`retry` is a user level
idea, that is, you are
counting user level
*explicit* retries. This
is different from a
transaction failing to
commit and starting
again. These are
invisible to the user.
Also using your trace will
convert `retry` from the
efficient wake on write
implementation, to an
active retry that will
always attempt again. We
don't have cheap logging
of transaction aborts in
GHC, but I have built such
logging in my work. You
can observe these aborts
with a debugger by looking
for execution of this
line:</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal"><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fghc%2Fghc%2Fblob%2Fmaster%2Frts%2FSTM.c%23L1123&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838811560&sdata=jAEm1CpEYQx6ORikerxVHOSlaOmrTzB3m9EVmOwo%2B8w%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://github.com/ghc/ghc/blob/master/rts/STM.c#L1123</a></p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Ryan </p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
</div><div class=""> <br class="webkit-block-placeholder"></div>
<div class="">
<div class=""><p class="MsoNormal">On Fri,
Jul 24, 2020 at 12:35 PM
Compl Yue <<a href="mailto:compl.yue@icloud.com" target="_blank" moz-do-not-send="true" class="">compl.yue@icloud.com</a>>
wrote:</p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
solid
rgb(204,204,204);padding:0cm
0cm 0cm
6pt;margin-left:4.8pt;margin-right:0cm" class="">
<div class=""><p class="">I'm not familiar with
profiling GHC yet, may
need more time to get
myself proficient with
it.</p><p class="">And a bit more details
of my test workload for
diagnostic: the db
clients are Python
processes from a cluster
of worker nodes,
consulting the db server
to register some path
for data files, under a
data dir within a shared
filesystem, then mmap
those data files and
fill in actual array
data. So the db server
don't have much
computation to perform,
but puts the data file
path into a global
index, which at the same
validates its
uniqueness. As there are
many client processes
trying to insert one
meta data record
concurrently, with my
naive implementation,
the global index's TVar
will almost always in
locked state by one
client after another,
from a queue never fall
empty.</p><p class="">So if `readTVar` should
So if `readTVar` does spin while waiting, I suspect the threads should actually show high CPU utilization, because at any instant of time, all threads except the committing one will be doing that one thing.

And I have something in my code to track STM retry like this:

```
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(106,153,85)" class="">-- blocking wait not
expected, track
stm retries
explicitly</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(220,220,170)" class="">trackSTM</span><span style="font-size:13pt;color:rgb(212,212,212)" class="">
::
</span><span style="font-size:13pt;color:rgb(86,156,214)" class="">Int</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""> ->
</span><span style="font-size:13pt;color:rgb(86,156,214)" class="">IO</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""> (</span><span style="font-size:13pt;color:rgb(86,156,214)" class="">Either</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""> ()
</span><span style="font-size:13pt;color:rgb(156,220,254)" class="">a</span><span style="font-size:13pt;color:rgb(212,212,212)" class="">)</span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">trackSTM !rtc =
</span><span style="font-size:13pt;color:rgb(197,134,192)" class="">do</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">when
</span><span style="font-size:13pt;color:rgb(106,153,85)" class="">--
todo increase the
threshold of
reporting?</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">(rtc >
</span><span style="font-size:13pt;color:rgb(181,206,168)" class="">0</span><span style="font-size:13pt;color:rgb(212,212,212)" class="">) $
</span><span style="font-size:13pt;color:rgb(197,134,192)" class="">do</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(106,153,85)" class="">-- trace out the retries so
the end users can
be aware of them</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">tid <- myThreadId</span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">trace</span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">(
</span><span style="font-size:13pt;color:rgb(206,145,120)" class="">"</span><span style="font-size:13pt;font-family:"Segoe UI
Emoji",sans-serif;color:rgb(206,145,120)" class="">🔙</span><span style="font-size:13pt;color:rgb(215,186,125)" class="">\n</span><span style="font-size:13pt;color:rgb(206,145,120)" class="">"</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><> show callCtx</span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><>
</span><span style="font-size:13pt;color:rgb(206,145,120)" class="">"</span><span style="font-size:13pt;font-family:"Segoe UI
Emoji",sans-serif;color:rgb(206,145,120)" class="">🌀</span><span style="font-size:13pt;color:rgb(206,145,120)" class=""> "</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><> show tid</span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><>
</span><span style="font-size:13pt;color:rgb(206,145,120)" class="">"
stm retry #"</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class=""><> show rtc</span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">)</span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">$ return
</span><span style="font-size:13pt;color:rgb(86,156,214)" class="">()</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">atomically ((Just
<$> stmJob)
`orElse` return
Nothing) >>=
\</span><span style="font-size:13pt;color:rgb(197,134,192)" class="">case</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">Nothing ->
</span><span style="font-size:13pt;color:rgb(106,153,85)" class="">--
stm failed, do a
tracked retry</span><span style="font-size:13pt;color:rgb(212,212,212)" class=""></span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">trackSTM (rtc +
</span><span style="font-size:13pt;color:rgb(181,206,168)" class="">1</span><span style="font-size:13pt;color:rgb(212,212,212)" class="">)</span></p>
</div>
<div class=""><p class="MsoNormal" style="line-height:17.25pt;background:rgb(30,30,30)"><span style="font-size:13pt;color:rgb(212,212,212)" class="">Just ... -> ...</span></p>
</div>
</div><p class="">```</p><p class="">No such trace msg fires
during my test, neither
in single thread run,
nor in runs with
pressure. I'm sure this
tracing mechanism works,
as I can see such traces
fire, in case e.g.
posting a TMVar to a
TQueue for some other
thread to fill it, then
read the result out, if
these 2 ops are composed
into a single tx, then
of course it's infinite
retry loop, and a
sequence of such msgs
are logged with ever
increasing rtc #.</p><p class="">So I believe no retry
has ever been triggered.</p><p class="">What can going on
there?</p><div class=""> <br class="webkit-block-placeholder"></div>
<div class=""><p class="MsoNormal">On
2020/7/24 <span style="font-family:"MS
Gothic"" class="">下午</span>11:46,
Ryan Yates wrote:</p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
<div class="">
<div class="">
<div class=""><p class="MsoNormal">>
Then to explain
the low CPU
utilization
(~10%), am I
right to
understand it as
that upon
reading a TVar
locked by
another
committing tx, a
lightweight
thread will put
itself into
`waiting STM`
and descheduled
state, so the
CPUs can only
stay idle as not
so many threads
are willing to
proceed?</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Since
the commit
happens in
finite steps,
the expectation
is that the lock
will be released
very soon.
Given this when
the body of a
transaction
executes
`readTVar` it
spins (active
CPU!) until the
`TVar` is
observed
unlocked. If a
lock is observed
while commiting,
it immediately
starts the
transaction
again from the
beginning. To
get the behavior
of suspending a
transaction you
have to
successfully
commit a
transaction that
executed
`retry`. Then
the transaction
is put on the
wakeup lists of
its read set and
subsequent
commits will
wake it up if
its write set
overlaps.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">I
don't think any
of these things
would explain
low CPU
utilization.
You could try
running with
`perf` and see
if lots of time
is spent in some
recognizable
part of the RTS.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Ryan</p>
</div>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div><div class=""> <br class="webkit-block-placeholder"></div>
<div class="">
<div class=""><p class="MsoNormal">On
Fri, Jul 24,
2020 at 11:22 AM
Compl Yue <<a href="mailto:compl.yue@icloud.com" target="_blank" moz-do-not-send="true" class="">compl.yue@icloud.com</a>>
wrote:</p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
solid
rgb(204,204,204);padding:0cm
0cm 0cm
6pt;margin-left:4.8pt;margin-right:0cm" class="">
<div class=""><p class="">Thanks very
much for the
insightful
information
Ryan! I'm glad
my suspect was
wrong about
the Haskell
scheduler:</p><p class="">> The
Haskell
capability
that is
committing a
transaction
will not yield
to another
Haskell thread
while it is
doing the
commit. The
OS thread may
be
preempted, but
once commit
starts the
haskell
scheduler is
not invoked
until after
locks are
released.</p>
<div class=""><p class="MsoNormal">So
best effort
had already
been made in
GHC and I just
need to
cooperate
better with
its design.
Then to
explain the
low CPU
utilization
(~10%), am I
right to
understand it
as that upon
reading a TVar
locked by
another
committing tx,
a lightweight
thread will
put itself
into `waiting
STM` and
descheduled
state, so the
CPUs can only
stay idle as
not so many
threads are
willing to
proceed?</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Anyway,
I see light
with better
data
structures to
improve my
situation, let
me try them
and report
back. Actually
I later
changed `TVar
(HaskMap k v)`
to be `TVar
(HashMap k
Int)` where
the `Int`
being array
index into
`TVar (Vector
(TVar (Maybe
v)))`, in
pursuing
insertion
order
preservation
semantic of
dict entries
(like that in
Python 3.7+),
then it's very
hopeful to
incorporate
stm-containers'
Map or ttrie
to approach
free of
contention.</p>
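Roughly this layout, as a sketch (the names are made up):

```
import Control.Concurrent.STM (TVar)
import Data.HashMap.Strict (HashMap)
import Data.Vector (Vector)

-- Sketch of the insertion-order-preserving dict layout described above:
-- the HashMap maps a key to its slot index, and the Vector holds one TVar
-- per slot, in insertion order.
data OrderedDict k v = OrderedDict
  { odIndex :: TVar (HashMap k Int)
  , odSlots :: TVar (Vector (TVar (Maybe v)))
  }
```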
</div><p class="">Thanks with
regards,</p><p class="">Compl</p><div class=""> <br class="webkit-block-placeholder"></div>
<div class=""><p class="MsoNormal">On
2020/7/24 <span style="font-family:"MS Gothic"" class="">下午</span>10:03, Ryan Yates
wrote:</p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
<div class="">
<div class=""><p class="MsoNormal">Hi
Compl,</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Having
a pool of
transaction processing
threads can be
helpful in a
certain way.
If the body of
the
transaction
takes more
time to
execute then
the Haskell
thread is
allowed and it
yields, the
suspended thread
won't get in
the way of
other thread,
but when it is
rescheduled,
will have a
low
probability of
success. Even
worse, it will
probably not
discover that
it is doomed
to failure
until commit
time. If
transactions
are more
likely to
reach commit
without
yielding, they
are more
likely to
succeed. If
the
transactions
are not
conflicting,
it doesn't
make much
difference
other than
cache churn.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">The
Haskell
capability
that is
committing a
transaction
will not yield
to another
Haskell thread
while it is
doing the
commit. The
OS thread may
be
preempted, but
once commit
starts the
haskell
scheduler is
not invoked
until after
locks are
released.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">To
get good
performance
from STM you
must pay
attention to
what TVars are
involved in a
commit. All
STM systems
are working
under the
assumption of
low
contention, so
you want to
minimize
"false"
conflicts
(conflicts
that are not
essential to
the
computation).
Something
like `TVar
(HashMap k v)`
will work
pretty well
for a low
thread count,
but every
transaction
that writes to
that structure
will be in
conflict with
every other
transaction
that accesses
it. Pushing
the `TVar`
into the nodes
of the
structure
reduces the
possibilities
for conflict,
while
increasing the
amount of
bookkeeping
STM has to
do. I would
like to reduce
the cost of
that
bookkeeping
using better
structures,
but we need to
do so without
harming
performance in
the low TVar
count case.
Right now it
is optimized
for good cache
performance
with a handful
of TVars.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
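A small sketch of that contrast (illustrative types only, not code from this thread):

```
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

-- Coarse: the whole map lives in one TVar, so any two writers conflict.
insertCoarse :: Ord k => TVar (Map.Map k v) -> k -> v -> STM ()
insertCoarse root k v = modifyTVar' root (Map.insert k v)

-- Finer: each value has its own TVar; updates to existing keys touch only
-- that key's TVar and don't conflict with each other, at the cost of more
-- TVars for STM to track.
updateFine :: Ord k => TVar (Map.Map k (TVar v)) -> k -> v -> STM Bool
updateFine root k v = do
  m <- readTVar root
  case Map.lookup k m of
    Just cell -> writeTVar cell v >> pure True
    Nothing   -> pure False
```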
<div class=""><p class="MsoNormal">There
is another way
to play with
performance by
moving work
into and out
of the
transaction
body. A
transaction
body that
executes
quickly will
reach commit
faster. But
it may be
delaying work
that moves
into another
transaction.
Forcing values
at the right
time can make
a big
difference.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
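One simple instance of that idea, as a generic illustration (not code from this thread): force the expensive value before entering the transaction, so the transaction body only performs the cheap write.

```
import Control.Concurrent.STM
import Control.Exception (evaluate)

-- The work happens outside STM; the transaction body stays tiny and
-- reaches commit quickly.
recordResult :: (Int -> Int) -> TVar Int -> Int -> IO ()
recordResult expensive out x = do
  v <- evaluate (expensive x)
  atomically (writeTVar out v)
```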
<div class=""><p class="MsoNormal">Ryan</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">On
Fri, Jul 24,
2020 at 2:14
AM Compl Yue
via
Haskell-Cafe
<<a href="mailto:haskell-cafe@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe@haskell.org</a>>
wrote:</p>
</div>
<div class="">
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
solid
rgb(204,204,204);padding:0cm
0cm 0cm
6pt;margin-left:4.8pt;margin-right:0cm" class="">
<div class=""><p class="">Thanks
Chris, I
confess I
didn't pay
enough
attention to
STM
specialized
container
libraries by
far, I skimmed
through the
description of
stm-containers
and ttrie, and
feel they
would
definitely
improve my
code's
performance in
case I limit
the server's
parallelism
within
hardware
capabilities.
That may
because I'm
still
prototyping
the api and
infrastructure
for
correctness,
so even `TVar
(HashMap k v)`
performs okay
for me at the
moment, only
if at low
contention
(surely
there're
plenty of CPU
cycles to be
optimized out
in next
steps). I
model my data
after graph
model, so most
data, even
most indices
are localized
to nodes and
edges, those
can be
manipulated
without
conflict,
that's why I
assumed I have
a low
contention use
case since the
very beginning
- until I
found there
are still
(though minor)
needs for
global indices
to guarantee
global
uniqueness, I
feel faithful
with
stm-containers/ttrie
to implement a
more scalable
global index
data
structure,
thanks for
hinting me.</p><p class="">So an
evident
solution comes
into my mind
now, is to run
the server
with a pool of
tx processing
threads,
matching
number of CPU
cores, client
RPC requests
then get
queued to be
executed in
some thread
from the pool.
But I'm really
fond of the
mechanism of
M:N scheduler
which solves
massive/dynamic
concurrency so
elegantly. I
had some good
result with Go
in this
regard, and
see GHC at par
in doing this,
I don't want
to give up
this enjoyable
machinery.</p><p class="">But looked
at the stm
implementation
in GHC, it
seems written
TVars are
exclusively
locked during
commit of a
tx, I suspect
this is the
culprit when
there're large
M lightweight
threads
scheduled upon
a small N
hardware
capabilities,
that is when a
lightweight
thread yield
control during
an stm
transaction
commit, the
TVars it
locked will
stay so until
it's scheduled
again (and
again) till it
can finish the
commit. This
way,
descheduled
threads could
hold live
threads from
progressing. I
haven't gone
into more
details there,
but wonder if
there can be
some
improvement
for GHC RTS to
keep an stm
committing
thread from
descheduled,
but seemingly
that may
impose more
starvation
potential; or
stm can be
improved to
have its TVar
locks
preemptable
when the owner
trec/thread is
in descheduled
state? Neither
should be easy
but I'd really
love massive
lightweight
threads doing
STM
practically
well.</p><p class="">Best
regards,</p><p class="">Compl</p><div class=""> <br class="webkit-block-placeholder"></div>
<div class=""><p class="MsoNormal">On
2020/7/24 <span style="font-family:"MS Gothic"" class="">上午</span>12:57, Christopher
Allen wrote:</p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt" class="">
<div class=""><p class="MsoNormal">It
seems like you
know how to
run practical
tests for
tuning thread
count and
contention for
throughput.
Part of the
reason you
haven't gotten
a super clear
answer is "it
depends." You
give up
fairness when
you use STM
instead of
MVars or
equivalent
structures.
That means a
long running
transaction
might get
stampeded by
many small
ones
invalidating
it over and
over. The
long-running
transaction
might never
clear if
the small
transactions
keep moving
the cheese. I
mention this
because
transaction
runtime and
size and count
all affect
throughput and
latency. What
might be ideal
for one
pattern of
work might not
be ideal for
another.
Optimizing for
overall
throughput
might make the
contention and
fairness
problems worse
too. I've done
practical
tests to
optimize this
in the past,
both for STM
in Haskell and
for RDBMS
workloads. </p>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">The
next step is
sometimes
figuring out
whether you
really need a
data structure
within a
single STM
container or
if perhaps you
can break up
your STM
container
boundaries
into zones or
regions that
roughly map
onto update
boundaries.
That should
make the
transactions
churn less. On
the outside
chance you do
need to touch
more than one
container in a
transaction,
well, they
compose.
</p>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
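As one concrete (made-up) illustration of the zoning idea: a hot counter can be split into per-zone TVars so that concurrent increments rarely collide, and only an occasional read has to sum the shards.

```
import Control.Concurrent.STM
import qualified Data.Vector as V

newtype ShardedCounter = ShardedCounter (V.Vector (TVar Int))

-- One TVar per zone; increments in different zones never conflict.
newShardedCounter :: Int -> IO ShardedCounter
newShardedCounter zones = ShardedCounter <$> V.replicateM zones (newTVarIO 0)

incr :: ShardedCounter -> Int -> STM ()
incr (ShardedCounter shards) zone =
  modifyTVar' (shards V.! (zone `mod` V.length shards)) (+ 1)

-- Reading touches every shard, so totals should be taken rarely.
total :: ShardedCounter -> STM Int
total (ShardedCounter shards) = sum <$> mapM readTVar (V.toList shards)
```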
<div class=""><p class="MsoNormal">e.g. <a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhackage.haskell.org%2Fpackage%2Fstm-containers&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838811560&sdata=Lq1%2BGj0Z6%2BBGMRAZrSzcTAlYgj0B0A67RaQcyyCcXbk%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://hackage.haskell.org/package/stm-containers</a></p>
</div>
<div class=""><p class="MsoNormal"><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhackage.haskell.org%2Fpackage%2Fttrie&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838821555&sdata=PpaiVM2NrPM2HzK0bh%2BMR8YF90yHlxKnN9gwZVQHqR0%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://hackage.haskell.org/package/ttrie</a></p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">It
also sounds a
bit like
your question
bumps into
Amdahl's Law a
bit.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">All
else fails,
stop using STM
and find
something more
tuned to your
problem space.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Hope
this helps,</p>
</div>
<div class=""><p class="MsoNormal">Chris
Allen</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
</div>
</div><div class=""> <br class="webkit-block-placeholder"></div>
<div class="">
<div class=""><p class="MsoNormal">On
Thu, Jul 23,
2020 at 9:53
AM YueCompl
via
Haskell-Cafe
<<a href="mailto:haskell-cafe@haskell.org" target="_blank" moz-do-not-send="true" class="">haskell-cafe@haskell.org</a>>
wrote:</p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
solid
rgb(204,204,204);padding:0cm
0cm 0cm
6pt;margin-left:4.8pt;margin-right:0cm" class="">
<div class=""><p class="MsoNormal">Hello
Cafe, </p>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">I'm
working on an
in-memory
database, in
Client/Server
mode I just
let each
connected
client submit
remote
procedure call
running in its
dedicated
lightweight
thread,
modifying
TVars in RAM
per its
business
needs, then in
case many
clients
connected
concurrently
and trying to
insert new
data, if they
are triggering
global index
(some TVar)
update, the
throughput
would drop
drastically. I
reduced the
shared state
to a simple
int counter by
TVar, got same
symptom. While
the
parallelism
feels okay
when there's
no hot TVar
conflicting,
or M is not
much greater
than N.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">As
an empirical
test workload,
I have a `+RTS
-N10` server
process, it
handles 10
concurrent
clients okay,
got ~5x of
single thread
throughput;
but in
handling 20
concurrent
clients, each
of the 10 CPUs
can only be
driven to ~10%
utilization,
the throughput
seems even
worse than
single thread.
More clients
can even drive
it thrashing
without much
progressing.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal"> I
can understand
that pure STM
doesn't scale
well after
reading [1],
and I see it
suggested [7]
attractive and
planned future
work toward
that
direction.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">But
I can't find
certain
libraries or
frameworks
addressing
large M over
small N
scenarios, [1]
experimented
with
designated N
parallelism,
and [7] is
rather
theoretical to
my empirical
needs.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Can
you direct me
to some
available
library
implementing
the
methodology
proposed in
[7] or other
ways tackling
this problem?</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">I
think the most
difficult one
is that a
transaction
should commit
with global
indices (with
possibly
unique
constraints)
atomically
updated, and
rollback with
any violation
of
constraints,
i.e.
transactions
have to cover
global states
like indices.
Other problems
seem more
trivial than
this.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Specifically,
[7] states:</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">> It
must be
emphasized
that all of
the mechanisms
we deploy
originate, in
one form or
another, in
the database
literature
from the 70s
and 80s. Our
contribution
is to adapt
these
techniques to
software
transactional
memory,
providing more
effective
solutions to
important STM
problems than
prior
proposals.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">I
wonder any STM
based library
has simplified
those
techniques to
be composed
right away? I
don't really
want to
implement
those
mechanisms by
myself,
rebuilding
many wheels
from scratch.</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">Best
regards,</p>
</div>
<div class=""><p class="MsoNormal">Compl</p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">[1] Comparing
the
performance of
concurrent
linked-list
implementations
in Haskell </p>
</div>
<div class=""><p class="MsoNormal"><a href="https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsimonmar.github.io%2Fbib%2Fpapers%2Fconcurrent-data.pdf&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838821555&sdata=41Jaz8ZRmRfBHyGKxfhJlm4xR7q0pOtJShtO0jTlOwQ%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://simonmar.github.io/bib/papers/concurrent-data.pdf</a></p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
<div class=""><p class="MsoNormal">[7]
M. Herlihy and
E. Koskinen.
Transactional
boosting: a
methodology
for
highly-concurrent
transactional
objects. In
Proc. of PPoPP
’08, pages
207–216. ACM
Press, 2008.</p>
</div>
<div class=""><p class="MsoNormal"><a href="https://nam06.safelinks.protection.outlook.com/?url=https:%2F%2Fwww.cs.stevens.edu%2F~ejk%2Fpapers%2Fboosting-ppopp08.pdf&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838831548&sdata=ya8Az1oC6f2xoMb90S9HCH57UTQ0nV9sg6SW%2B5JCPC4%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf</a></p>
</div>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div>
</div><p class="MsoNormal">_______________________________________________<br class="">
Haskell-Cafe
mailing list<br class="">
To
(un)subscribe,
modify options
or view
archives go
to:<br class="">
<a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838831548&sdata=c2AV7CO42o3tcw0EuMzqedKkBCtQjWjvdMoUsb4llbY%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">
Only members
subscribed via
the mailman
list are
allowed to
post.</p>
</blockquote>
</div><p class="MsoNormal"><br clear="all" class="">
</p>
<div class=""><div class=""> <br class="webkit-block-placeholder"></div>
</div><p class="MsoNormal">--
</p>
<div class="">
<div class="">
<div class="">
<div class="">
<div class=""><p class="MsoNormal">Chris
Allen</p>
<div class=""><p class="MsoNormal"><span style="font-size:9.5pt" class="">Currently working on </span><a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhaskellbook.com%2F&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838831548&sdata=tIHFQFZPIQgRp8oqGRvyebm1YQdCvGD0VoMcflzJwKc%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://haskellbook.com</a></p>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div><p class="MsoNormal">_______________________________________________<br class="">
Haskell-Cafe
mailing list<br class="">
To
(un)subscribe,
modify options
or view
archives go
to:<br class="">
<a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838841547&sdata=vdzv5WBA62cNwO6DA1D4KEHDCweyOerpn1PdMK0A%2BHw%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">
Only members
subscribed via
the mailman
list are
allowed to
post.</p>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote><div class=""> <br class="webkit-block-placeholder"></div>
<pre class="">_______________________________________________</pre>
<pre class="">Haskell-Cafe mailing list</pre>
<pre class="">To (un)subscribe, modify options or view archives go to:</pre>
<pre class=""><a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838841547&sdata=vdzv5WBA62cNwO6DA1D4KEHDCweyOerpn1PdMK0A%2BHw%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a></pre>
<pre class="">Only members subscribed via the mailman list are allowed to post.</pre>
</blockquote>
</div><p class="MsoNormal">_______________________________________________<br class="">
Haskell-Cafe mailing list<br class="">
To (un)subscribe, modify options or
view archives go to:<br class="">
<a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838851540&sdata=Btpa3sjfAjTf2ICO0QpQG5vVCawIjERNjUHji06uG5Y%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">
Only members subscribed via the
mailman list are allowed to post.</p>
</blockquote>
</div><p class="MsoNormal">_______________________________________________<br class="">
Haskell-Cafe mailing list<br class="">
To (un)subscribe, modify options or view
archives go to:<br class="">
<a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fhaskell-cafe&data=02%7C01%7Csimonpj%40microsoft.com%7C8ebd68bca55140cebaae08d833f888f2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637316489838851540&sdata=Btpa3sjfAjTf2ICO0QpQG5vVCawIjERNjUHji06uG5Y%3D&reserved=0" target="_blank" moz-do-not-send="true" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">
Only members subscribed via the mailman
list are allowed to post.</p>
</div>
</blockquote>
</div><div class=""> <br class="webkit-block-placeholder"></div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
_______________________________________________<br class="">Haskell-Cafe mailing list<br class="">To (un)subscribe, modify options or view archives go to:<br class=""><a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe" class="">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br class="">Only members subscribed via the mailman list are allowed to post.</div></blockquote></div><br class=""></div></body></html>