<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Thanks Chris. I confess I haven't paid nearly enough attention to
specialized STM container libraries. Having skimmed the descriptions
of stm-containers and ttrie, I feel they would definitely improve my
code's performance, provided I keep the server's parallelism within
hardware capabilities. Part of the reason is that I'm still
prototyping the API and infrastructure for correctness, so even `TVar
(HashMap k v)` performs okay for me at the moment, though only under
low contention (there are surely plenty of CPU cycles left to be
optimized away in later steps). I model my data as a graph, so most
data, and even most indices, are localized to nodes and edges and can
be manipulated without conflict. That's why I assumed from the very
beginning that I have a low-contention use case, until I found there
are still (though minor) needs for global indices to guarantee global
uniqueness. I feel confident that stm-containers/ttrie can be used to
implement a more scalable global index data structure; thanks for the
hint.</p>
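<p>For reference, here is roughly what I mean by a global uniqueness
index, as a minimal sketch. The names (`claimName`, `NameIndex`) are
mine, and it still uses the naive `TVar (HashMap k v)` representation
rather than stm-containers; the point is only that the claim happens
atomically inside the same transaction that creates the node:</p>

```haskell
-- A minimal sketch (my names, not a real API) of a global uniqueness
-- index kept in a single TVar, claimed atomically within the same
-- transaction that creates the node.
import Control.Concurrent.STM
import qualified Data.HashMap.Strict as HM

type NodeId = Int

-- The global index: unique name -> node id.
type NameIndex = TVar (HM.HashMap String NodeId)

-- Claim a globally unique name for a node. Returns False (leaving the
-- index untouched) when the name is already taken, so the enclosing
-- transaction can abort instead of committing a duplicate.
claimName :: NameIndex -> String -> NodeId -> STM Bool
claimName idx name nid = do
  m <- readTVar idx
  case HM.lookup name m of
    Just _  -> pure False
    Nothing -> do
      writeTVar idx (HM.insert name nid m)
      pure True
```

<p>Of course every such claim reads and writes the one hot TVar, which
is exactly the contention point under discussion; swapping the `TVar
(HashMap k v)` for an stm-containers `Map` would shrink the conflict
footprint to the touched bucket.</p>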
<p>An obvious solution comes to mind now: run the server with a pool
of transaction-processing threads matching the number of CPU cores,
with client RPC requests queued for execution by some thread from the
pool. But I'm really fond of the M:N scheduler, which solves
massive/dynamic concurrency so elegantly. I had good results with Go
in this regard, and I see GHC as on par here; I don't want to give up
this enjoyable machinery.</p>
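<p>Concretely, the pool I'm picturing is something like the sketch
below: one worker per capability, requests funneled through a bounded
TBQueue for back-pressure. `Request` and the queue sizing are
illustrative, not from any real library:</p>

```haskell
-- A minimal sketch of the thread-pool idea: one worker per capability,
-- client requests funneled through a bounded TBQueue. `Request` and the
-- backlog sizing are illustrative choices, not from a real library.
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.STM
import Control.Monad (forever, replicateM_)

newtype Request = Request (IO ())  -- an RPC call packaged as an action

-- Start one worker per capability and return an enqueue function;
-- enqueueing blocks once the backlog fills, giving back-pressure.
startPool :: IO (Request -> IO ())
startPool = do
  n <- getNumCapabilities
  q <- newTBQueueIO (fromIntegral (n * 4))
  replicateM_ n $ forkIO $ forever $ do
    Request act <- atomically (readTBQueue q)
    act  -- run the transaction on this worker thread
  pure (\r -> atomically (writeTBQueue q r))
```

<p>This caps the number of transactions committing concurrently at N,
which is exactly what I'd rather not hard-code, since it gives up the
per-client lightweight-thread model.</p>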
<p>But looking at the STM implementation in GHC, it seems written
TVars are exclusively locked during the commit of a transaction. I
suspect this is the culprit when a large number M of lightweight
threads are scheduled onto a small number N of hardware capabilities:
when a lightweight thread yields control during an STM commit, the
TVars it has locked stay locked until it is scheduled again (and
again) and can finish the commit. In this way, descheduled threads can
keep live threads from making progress. I haven't gone into more
detail there, but I wonder whether the GHC RTS could be improved to
keep a committing thread from being descheduled (though that would
seem to add starvation potential), or whether STM could be improved to
make its TVar locks preemptible when the owning trec/thread is
descheduled. Neither should be easy, but I'd really love to see
massive numbers of lightweight threads doing STM well in practice.<br>
</p>
<p>Best regards,</p>
<p>Compl</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 2020/7/24 上午12:57, Christopher Allen
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CADnndOrUaAFiT07pa2MWrtOBhG7_SU3vUuk8CHt1w2mxX-dkzQ@mail.gmail.com">
<div dir="ltr">It seems like you know how to run practical tests
for tuning thread count and contention for throughput. Part of
the reason you haven't gotten a super clear answer is "it
depends." You give up fairness when you use STM instead of MVars
or equivalent structures. That means a long running transaction
might get stampeded by many small ones invalidating it over and
over. The long-running transaction might never clear if
the small transactions keep moving the cheese. I mention this
because transaction runtime and size and count all affect
throughput and latency. What might be ideal for one pattern of
work might not be ideal for another. Optimizing for overall
throughput might make the contention and fairness problems worse
too. I've done practical tests to optimize this in the past,
both for STM in Haskell and for RDBMS workloads.
<div><br>
</div>
<div>The next step is sometimes figuring out whether you really
need a data structure within a single STM container or if
perhaps you can break up your STM container boundaries into
zones or regions that roughly map onto update boundaries. That
should make the transactions churn less. On the outside chance
you do need to touch more than one container in a transaction,
well, they compose.
<div><br>
</div>
<div>e.g. <a
href="https://hackage.haskell.org/package/stm-containers"
moz-do-not-send="true">https://hackage.haskell.org/package/stm-containers</a></div>
<div><a href="https://hackage.haskell.org/package/ttrie"
moz-do-not-send="true">https://hackage.haskell.org/package/ttrie</a><br>
</div>
<div><br>
</div>
<div>It also sounds a bit like your question bumps into
Amdahl's Law a bit.</div>
<div><br>
</div>
<div>All else fails, stop using STM and find something more
tuned to your problem space.</div>
<div><br>
</div>
<div>Hope this helps,</div>
<div>Chris Allen</div>
<div><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jul 23, 2020 at 9:53
AM YueCompl via Haskell-Cafe <<a
href="mailto:haskell-cafe@haskell.org"
moz-do-not-send="true">haskell-cafe@haskell.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="overflow-wrap: break-word;">Hello Cafe,
<div><br>
</div>
<div>I'm working on an in-memory database, in Client/Server
mode I just let each connected client submit remote
procedure call running in its dedicated lightweight
thread, modifying TVars in RAM per its business needs,
then in case many clients connected concurrently and
trying to insert new data, if they are triggering global
index (some TVar) update, the throughput would drop
drastically. I reduced the shared state to a simple int
counter by TVar, got same symptom. While the parallelism
feels okay when there's no hot TVar conflicting, or M is
not much greater than N.</div>
<div><br>
</div>
<div>As an empirical test workload, I have a `+RTS -N10`
server process. It handles 10 concurrent clients okay,
getting ~5x single-thread throughput; but when handling
20 concurrent clients, each of the 10 CPUs can only be
driven to ~10% utilization, and the throughput seems even
worse than a single thread. More clients can even drive
it to thrash without making much progress.</div>
<div><br>
</div>
<div> I can understand that pure STM doesn't scale well
after reading [1], and I see that it points to [7] as
attractive, planned future work in that direction.</div>
<div><br>
</div>
<div>But I can't find libraries or frameworks addressing
the large-M-over-small-N scenario; [1] experimented with
a designated N of parallelism, and [7] is rather
theoretical relative to my empirical needs.</div>
<div><br>
</div>
<div>Can you direct me to an available library
implementing the methodology proposed in [7], or to other
ways of tackling this problem?</div>
<div><br>
</div>
<div>I think the most difficult part is that a transaction
should commit with global indices (possibly carrying
unique constraints) updated atomically, and roll back on
any constraint violation; i.e., transactions have to
cover global state such as indices. Other problems seem
more tractable than this one.</div>
<div><br>
</div>
<div>Specifically, [7] states:</div>
<div><br>
</div>
<div>> It must be emphasized that all of the mechanisms
we deploy originate, in one form or another, in the
database literature from the
70s and 80s. Our contribution is to adapt these techniques
to software transactional memory, providing more effective
solutions to
important STM problems than prior proposals.</div>
<div><br>
</div>
<div>I wonder whether any STM-based library has simplified
those techniques so they can be composed right away? I
don't really want to implement those mechanisms myself,
rebuilding many wheels from scratch.</div>
<div><br>
</div>
<div>Best regards,</div>
<div>Compl</div>
<div><br>
</div>
<div><br>
</div>
<div>[1] Comparing the performance of concurrent linked-list
implementations in Haskell </div>
<div><a
href="https://simonmar.github.io/bib/papers/concurrent-data.pdf"
target="_blank" moz-do-not-send="true">https://simonmar.github.io/bib/papers/concurrent-data.pdf</a></div>
<div><br>
</div>
<div>[7] M. Herlihy and E. Koskinen. Transactional boosting:
a methodology
for highly-concurrent transactional objects. In Proc. of
PPoPP ’08,
pages 207–216. ACM Press, 2008.</div>
<div><a
href="https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf"
target="_blank" moz-do-not-send="true">https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf</a></div>
<div><br>
</div>
</div>
_______________________________________________<br>
Haskell-Cafe mailing list<br>
To (un)subscribe, modify options or view archives go to:<br>
<a
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br>
Only members subscribed via the mailman list are allowed to
post.</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div dir="ltr">Chris Allen<br>
<div><span style="font-size:12.8px">Currently working
on </span><a href="http://haskellbook.com"
target="_blank" moz-do-not-send="true">http://haskellbook.com</a></div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>