[Haskell-cafe] STM friendly TreeMap (or similar with range scan api) ? WAS: Best ways to achieve throughput, for large M:N ratio of STM threads, with hot TVar updates?
YueCompl
compl.yue at icloud.com
Thu Jul 30 12:10:22 UTC 2020
Hi Peter,
Great to hear from you!
For the record tskiplist (and stm-containers together) did improve my situation a great lot with respect to scalability at concurrency/parallelism!
I'm still far from the stage to squeeze last drops of performance, currently I'm just making sure performance wise concerns be reasonable during my PoC in correctness and ergonomics of my HPC architecture (an in-memory graph + out-of-core (mmap) array DBMS powered computation cluster, with shared storage), and after parallelism appears acceptable, I seemingly suffer from serious GC issue at up scaling on process working memory size. I'm suspecting it's because of the added more TVars and/or aggressive circular structures of them in my case, and can not find a way to overcome it by far.
Thanks for your detailed information!
Best regards,
Compl
> On 2020-07-30, at 19:19, Peter Robinson <pwr at lowerbound.io> wrote:
>
> Hi Compl,
> >+ This package provides a proof-of-concept implementation of a skip list in STM
>
> This has to mean something but I can't figure out yet.
>
> Dear Peter Robinson, I hope you can see this message and get in the loop of discussion.
>
>
> The reason for adding this sentence was that tskiplist hasn't been optimized for production use. Later on, I wrote an implementation of a concurrent skip list with atomic operations that performs significantly better, but it's operations work in the IO monad.
>
> I'm surprised to hear that you're getting poor performance even when using the stm-container package, which I believe was meant to be used in production. A while ago, I ran some benchmarks comparing concurrent dictionary data structures (such as stm-container) under various workloads. While STMContainers.Map wasn't as fast as the concurrent-hashtable package, the results indicate that the performance doesn't degrade too much under larger workloads.
>
> You can find these benchmark results here (10^6 randomly generated insertion/deletion/lookup requests distributed among 32 threads):
> https://lowerbound.io/blog/bench2-32.html <https://lowerbound.io/blog/bench2-32.html>
> And some explanations about the benchmarks are here:
> https://lowerbound.io/blog/2019-10-24_concurrent_hash_table_performance.html <https://lowerbound.io/blog/2019-10-24_concurrent_hash_table_performance.html>
>
> One issue that I came across when implementing the tskiplist package was this: If a thread wants to insert some item into the skip list, it needs to search for the entry point by performing readTVar operations starting at the list head. So, on average, a thread will read O(log n) TVars (assuming a skip list of n items) and, if any of these O(log n) TVars are modified by a simultaneously running thread, the STM runtime will observe a (false) conflict and rerun the transaction. It's not clear to me how to resolve this issue without access to something like unreadTVar (see [1]).
>
> Best,
> Peter
>
> [1] UnreadTVar: Extending Haskell Software Transactional Memory for Performance (2007) by Nehir Sonmez , Cristian Perfumo , Srdjan Stipic , Adrian Cristal , Osman S. Unsal , Mateo Valero.
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20200730/5e629f17/attachment.html>
More information about the Haskell-Cafe
mailing list