[Haskell-cafe] Best ways to achieve throughput, for large M:N ratio of STM threads, with hot TVar updates?

Thu Jul 23 16:57:51 UTC 2020

It seems like you know how to run practical tests for tuning thread count
and contention for throughput. Part of the reason you haven't gotten a
super clear answer is "it depends." You give up fairness when you use STM
instead of MVars or equivalent structures. That means a long running
transaction might get stampeded by many small ones invalidating it over and
over. The long-running transaction might never clear if the small
transactions keep moving the cheese. I mention this because transaction
runtime and size and count all affect throughput and latency. What might be
ideal for one pattern of work might not be ideal for another. Optimizing
for overall throughput might make the contention and fairness problems
worse too. I've done practical tests to optimize this in the past, both for
STM in Haskell and for RDBMS workloads.

The next step is sometimes figuring out whether you really need a data
structure within a single STM container or if perhaps you can break up your
STM container boundaries into zones or regions that roughly map onto update
boundaries. That should make the transactions churn less. On the outside
chance you do need to touch more than one container in a transaction, well,
they compose.

e.g. https://hackage.haskell.org/package/stm-containers
https://hackage.haskell.org/package/ttrie

It also sounds a bit like your question bumps into Amdahl's Law a bit.

All else fails, stop using STM and find something more tuned to your
problem space.

Hope this helps,
Chris Allen

On Thu, Jul 23, 2020 at 9:53 AM YueCompl via Haskell-Cafe <
haskell-cafe at haskell.org> wrote:

> Hello Cafe,
>
> I'm working on an in-memory database, in Client/Server mode I just let
> each connected client submit remote procedure call running in its dedicated
> lightweight thread, modifying TVars in RAM per its business needs, then in
> case many clients connected concurrently and trying to insert new data, if
> they are triggering global index (some TVar) update, the throughput would
> drop drastically. I reduced the shared state to a simple int counter by
> TVar, got same symptom. While the parallelism feels okay when there's no
> hot TVar conflicting, or M is not much greater than N.
>
> As an empirical test workload, I have a `+RTS -N10` server process, it
> handles 10 concurrent clients okay, got ~5x of single thread throughput;
> but in handling 20 concurrent clients, each of the 10 CPUs can only be
> driven to ~10% utilization, the throughput seems even worse than single
> thread. More clients can even drive it thrashing without much  progressing.
>
>  I can understand that pure STM doesn't scale well after reading [1], and
> I see it suggested [7] attractive and planned future work toward that
> direction.
>
> But I can't find certain libraries or frameworks addressing large M over
> small N scenarios, [1] experimented with designated N parallelism, and [7]
> is rather theoretical to my empirical needs.
>
> Can you direct me to some available library implementing the methodology
> proposed in [7] or other ways tackling this problem?
>
> I think the most difficult one is that a transaction should commit with
> global indices (with possibly unique constraints) atomically updated, and
> rollback with any violation of constraints, i.e. transactions have to cover
> global states like indices. Other problems seem more trivial than this.
>
> Specifically, [7] states:
>
> > It must be emphasized that all of the mechanisms we deploy originate, in
> one form or another, in the database literature from the 70s and 80s. Our
> contribution is to adapt these techniques to software transactional memory,
> providing more effective solutions to important STM problems than prior
> proposals.
>
> I wonder any STM based library has simplified those techniques to be
> composed right away? I don't really want to implement those mechanisms by
> myself, rebuilding many wheels from scratch.
>
> Best regards,
> Compl
>
>
> [1] Comparing the performance of concurrent linked-list implementations in
> Haskell
> https://simonmar.github.io/bib/papers/concurrent-data.pdf
>
> [7] M. Herlihy and E. Koskinen. Transactional boosting: a methodology for
> highly-concurrent transactional objects. In Proc. of PPoPP ’08, pages
> 207–216. ACM Press, 2008.
> https://www.cs.stevens.edu/~ejk/papers/boosting-ppopp08.pdf
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.

-- 
Chris Allen
Currently working on http://haskellbook.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20200723/45d4b8ac/attachment.html>