<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Thanks Chris. I confess I haven't paid nearly enough attention to
specialized STM container libraries. Having skimmed the descriptions
of stm-containers and ttrie, I feel they would definitely improve my
code's performance, provided I keep the server's parallelism within
hardware capabilities. Part of the reason is that I'm still
prototyping the API and infrastructure for correctness, so even `TVar
(HashMap k v)` performs okay for me at the moment, though only under
low contention (there are surely plenty of CPU cycles left to be
optimized away in later steps). I model my data as a graph, so most
data, and even most indices, are localized to nodes and edges and can
be manipulated without conflict. That's why I assumed from the very
beginning that I have a low-contention use case, until I found there
are still (though minor) needs for global indices to guarantee global
uniqueness. I feel confident that stm-containers/ttrie can be used to
implement a more scalable global index data structure; thanks for the
hint.</p>
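<p>For reference, here is roughly what I mean by a global uniqueness
index, as a minimal sketch. The names (`claimName`, `NameIndex`) are
mine, and it still uses the naive `TVar (HashMap k v)` representation
rather than stm-containers; the point is only that the claim happens
atomically inside the same transaction that creates the node:</p>

```haskell
-- A minimal sketch (my names, not a real API) of a global uniqueness
-- index kept in a single TVar, claimed atomically within the same
-- transaction that creates the node.
import Control.Concurrent.STM
import qualified Data.HashMap.Strict as HM

type NodeId = Int

-- The global index: unique name -> node id.
type NameIndex = TVar (HM.HashMap String NodeId)

-- Claim a globally unique name for a node. Returns False (leaving the
-- index untouched) when the name is already taken, so the enclosing
-- transaction can abort instead of committing a duplicate.
claimName :: NameIndex -> String -> NodeId -> STM Bool
claimName idx name nid = do
  m <- readTVar idx
  case HM.lookup name m of
    Just _  -> pure False
    Nothing -> do
      writeTVar idx (HM.insert name nid m)
      pure True
```

<p>Of course every such claim reads and writes the one hot TVar, which
is exactly the contention point under discussion; swapping the `TVar
(HashMap k v)` for an stm-containers `Map` would shrink the conflict
footprint to the touched bucket.</p>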
<p>An obvious solution comes to mind now: run the server with a pool
of transaction-processing threads matching the number of CPU cores,
with client RPC requests queued for execution by some thread from the
pool. But I'm really fond of the M:N scheduler, which solves
massive/dynamic concurrency so elegantly. I had good results with Go
in this regard, and I see GHC as on par here; I don't want to give up
this enjoyable machinery.</p>
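<p>Concretely, the pool I'm picturing is something like the sketch
below: one worker per capability, requests funneled through a bounded
TBQueue for back-pressure. `Request` and the queue sizing are
illustrative, not from any real library:</p>

```haskell
-- A minimal sketch of the thread-pool idea: one worker per capability,
-- client requests funneled through a bounded TBQueue. `Request` and the
-- backlog sizing are illustrative choices, not from a real library.
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.STM
import Control.Monad (forever, replicateM_)

newtype Request = Request (IO ())  -- an RPC call packaged as an action

-- Start one worker per capability and return an enqueue function;
-- enqueueing blocks once the backlog fills, giving back-pressure.
startPool :: IO (Request -> IO ())
startPool = do
  n <- getNumCapabilities
  q <- newTBQueueIO (fromIntegral (n * 4))
  replicateM_ n $ forkIO $ forever $ do
    Request act <- atomically (readTBQueue q)
    act  -- run the transaction on this worker thread
  pure (\r -> atomically (writeTBQueue q r))
```

<p>This caps the number of transactions committing concurrently at N,
which is exactly what I'd rather not hard-code, since it gives up the
per-client lightweight-thread model.</p>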
<p>But looking at the STM implementation in GHC, it seems written
TVars are exclusively locked during the commit of a transaction. I
suspect this is the culprit when a large number M of lightweight
threads are scheduled onto a small number N of hardware capabilities:
when a lightweight thread yields control during an STM commit, the
TVars it has locked stay locked until it is scheduled again (and
again) and can finish the commit. In this way, descheduled threads can
keep live threads from making progress. I haven't gone into more
detail there, but I wonder whether the GHC RTS could be improved to
keep a committing thread from being descheduled (though that would
seem to add starvation potential), or whether STM could be improved to
make its TVar locks preemptible when the owning trec/thread is
descheduled. Neither should be easy, but I'd really love to see
massive numbers of lightweight threads doing STM well in practice.<br>
</p>
<p>Best regards,</p>
<p>Compl</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 2020/7/24 上午12:57, Christopher Allen
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CADnndOrUaAFiT07pa2MWrtOBhG7_SU3vUuk8CHt1w2mxX-dkzQ@mail.gmail.com">
<div dir="ltr">It seems like you know how to run practical tests
for tuning thread count and contention for throughput. Part of
the reason you haven't gotten a super clear answer is "it
depends." You give up fairness when you use STM instead of MVars
or equivalent structures. That means a long running transaction
might get stampeded by many small ones invalidating it over and
over. The long-running transaction might never clear if
the small transactions keep moving the cheese. I mention this
because transaction runtime and size and count all affect
throughput and latency. What might be ideal for one pattern of
work might not be ideal for another. Optimizing for overall
throughput might make the contention and fairness problems worse
too. I've done practical tests to optimize this in the past,
both for STM in Haskell and for RDBMS workloads.
<div><br>
</div>
<div>The next step is sometimes figuring out whether you really
need a data structure within a single STM container or if
perhaps you can break up your STM container boundaries into
zones or regions that roughly map onto update boundaries. That
should make the transactions churn less. On the outside chance
you do need to touch more than one container in a transaction,
well, they compose.
<div><br>
</div>
<div>e.g. <a
href="https://hackage.haskell.org/package/stm-containers"
moz-do-not-send="true">https://hackage.haskell.org/package/stm-containers</a></div>
<div><a href="https://hackage.haskell.org/package/ttrie"
moz-do-not-send="true">https://hackage.haskell.org/package/ttrie</a><br>
</div>
<div><br>
</div>
<div>It also sounds a bit like your question bumps into
Amdahl's Law a bit.</div>
<div><br>
</div>
<div>All else fails, stop using STM and find something more
tuned to your problem space.</div>
<div><br>
</div>
<div>Hope this helps,</div>
<div>Chris Allen</div>
<div><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jul 23, 2020 at 9:53
AM YueCompl via Haskell-Cafe <<a
href="mailto:haskell-cafe@haskell.org"
moz-do-not-send="true">haskell-cafe@haskell.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="overflow-wrap: break-word;">Hello Cafe,
<div><br>
</div>
<div>I'm working on an in-memory database, in Client/Server
mode I just let each connected client submit remote
procedure call running in its dedicated lightweight
thread, modifying TVars in RAM per its business needs,
then in case many clients connected concurrently and
trying to insert new data, if they are triggering global
index (some TVar) update, the throughput would drop
drastically. I reduced the shared state to a simple int
counter by TVar, got same symptom. While the parallelism
feels okay when there's no hot TVar conflicting, or M is
not much greater than N.</div>
<div><br>
</div>
<div>As an empirical test workload, I have a `+RTS -N10`
server process. It handles 10 concurrent clients okay,
getting ~5x single-thread throughput; but when handling
20 concurrent clients, each of the 10 CPUs can only be
driven to ~10% utilization, and the throughput seems even
worse than a single thread. More clients can even drive
it to thrash without making much progress.</div>
<div><br>
</div>
<div> I can understand that pure STM doesn't scale well
after reading [1], and I see that it points to [7] as
attractive, planned future work in that direction.</div>
<div><br>
</div>
<div>But I can't find libraries or frameworks addressing
the large-M-over-small-N scenario; [1] experimented with
a designated N of parallelism, and [7] is rather
theoretical relative to my empirical needs.</div>
<div><br>
</div>
<div>Can you direct me to an available library
implementing the methodology proposed in [7], or to other
ways of tackling this problem?</div>
<div><br>
</div>
<div>I think the most difficult part is that a transaction
should commit with global indices (possibly carrying
unique constraints) updated atomically, and roll back on
any constraint violation; i.e., transactions have to
cover global state such as indices. Other problems seem
more tractable than this one.</div>
<div><br>
</div>
<div>Specifically, [7] states:</div>
<div><br>
</div>
<div>> It must be emphasized that all of the mechanisms
we deploy originate, in one form or another, in the
database literature from the
70s and 80s. Our contribution is to adapt these techniques
to software transactional memory, providing more effective
solutions to
important STM problems than prior proposals.</div>
<div><br>
</div>
<div>I wonder whether any STM-based library has simplified
those techniques so they can be composed right away? I
don't really want to implement those mechanisms myself,
rebuilding many wheels from scratch.</div>
<div><br>
</div>
<div>Best regards,</div>
<div>Compl</div>
<div><br>
</div>
<div><br>
</div>
<div>[1] Comparing the performance of concurrent linked-list
implementations in Haskell </div>
<div><a
href="https://simonmar.github.io/bib/papers/concurrent-data.pdf"
target="_blank" moz-do-not-send="true">https://simonmar.github.io/bib/papers/concurrent-data.pdf</a></div>
<div><br>
</div>
<div>[7] M. Herlihy and E. Koskinen. Transactional boosting:
a methodology
for highly-concurrent transactional objects. In Proc. of
PPoPP ’08,
pages 207–216. ACM Press, 2008.</div>
<div><a
href="https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf"
target="_blank" moz-do-not-send="true">https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf</a></div>
<div><br>
</div>
</div>
_______________________________________________<br>
Haskell-Cafe mailing list<br>
To (un)subscribe, modify options or view archives go to:<br>
<a
href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br>
Only members subscribed via the mailman list are allowed to
post.</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div dir="ltr">Chris Allen<br>
<div><span style="font-size:12.8px">Currently working
on </span><a href="http://haskellbook.com"
target="_blank" moz-do-not-send="true">http://haskellbook.com</a></div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>