[Haskell-cafe] Best ways to achieve throughput, for large M:N ratio of STM threads, with hot TVar updates?

Compl Yue compl.yue at icloud.com
Fri Jul 24 15:48:44 UTC 2020

Hi Jo, I think you are totally right about the situation, and I just 
want to make it clear that I have already chosen STM and lightweight 
threads as GHC implements them: STM for transactions and optimistic 
locking, lightweight threads for sophisticated smart scheduling. The 
global counter is only used to reveal the technical traits of my 
situation; it is of course not a requirement of my business needs. I'm 
not in a hurry for thorough performance optimization at the current 
stage (the PoC prototype is not finished yet), as long as the 
performance is reasonable, but the thrashing behavior really frightened 
me and I have to treat it as a serious concern for the time being. 
Fortunately, after taking others' suggestions, it doesn't feel as scary 
as it first appeared; I'll experiment more with this new information 
and see what comes out.

Thanks and regards,


On 2020/7/24 12:25 AM, Joachim Durchholz wrote:
> While I can't contribute any Haskell knowledge, I know that many 
> threads updating the same variable is the worst thing you can do: not 
> only do you create a single bottleneck, but if your threads run on 
> multiple cores you also get CPU pipeline stalls, L1 cache line 
> flushes, and/or complicated cache coherency protocols executed between 
> cores. None of this is cheap: each of these mechanisms can take 
> hundreds of CPU cycles, on a CPU that can otherwise execute multiple 
> instructions per cycle.
> Incrementing a global counter is a really hard problem in 
> multithreading...
> I believe this is the reason why databases typically implement a 
> SEQUENCE mechanism, and these sequences are usually implemented as 
> "whenever a transaction asks for a sequence number, reserve a block of 
> 1,000 numbers for it so it can retrieve 999 additional numbers without 
> the synchronization overhead".
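
The block-reservation idea above can be sketched directly with GHC's 
STM. `blockSize` and `reserveBlock` are illustrative names, not an 
existing API; the point is that the shared TVar is touched once per 
1,000 numbers instead of once per number:

```haskell
import Control.Concurrent.STM

-- Reserve numbers in blocks so only one STM transaction (one
-- contended commit on the shared TVar) is paid per 1,000 numbers.
blockSize :: Int
blockSize = 1000

-- Returns the (first, last) numbers of a freshly reserved block;
-- the caller hands out numbers from that range with no shared state.
reserveBlock :: TVar Int -> IO (Int, Int)
reserveBlock next = atomically $ do
  lo <- readTVar next
  writeTVar next (lo + blockSize)
  pure (lo, lo + blockSize - 1)

main :: IO ()
main = do
  global <- newTVarIO 0
  (a0, a1) <- reserveBlock global
  (b0, b1) <- reserveBlock global
  print (a0, a1)  -- prints (0,999)
  print (b0, b1)  -- prints (1000,1999)
```

Each worker thread would hold its own current block and only call 
`reserveBlock` when the block is exhausted.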
> This is also why real databases use transactions - these do not just 
> isolate processes from each other's updates, they also allow the DB to 
> let the transaction work on a snapshot and do all the synchronization 
> once, during COMMIT.
> And, as you just discovered, it's one of the major optimization areas 
> in database engines :-)
> TL;DR for the bad news: I suspect your problem is just unavoidable
> However, I see a workaround: delayed index updates.
> Keep two versions of each index: last-known and next-future.
> last-known is what was created during the last index update. You need 
> an append-only list of records that had an index field update, and all 
> searches that use the index will also have to do a linear search in 
> that list.
> next-future is built in the background. It takes last-known and the 
> updates from the append-only list, and generates a new index. Once 
> next-future is finished, replace last-known with it.
> You still need to take a global lock while replacing indexes, but you 
> don't have to lock the index for every single update, just once per 
> rebuild.
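
A minimal STM sketch of this two-generation scheme; the record shape 
and names (`lastKnown`, `pending`, `rebuild`) are assumptions for 
illustration, not from anyone's actual code:

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

type Key   = Int
type Index = Map.Map Key String

data DelayedIndex = DelayedIndex
  { lastKnown :: TVar Index           -- index served to readers
  , pending   :: TVar [(Key, String)] -- append-only update list (newest first)
  }

-- Writers only prepend to the list; no index rebuild on the hot path.
recordUpdate :: DelayedIndex -> Key -> String -> STM ()
recordUpdate di k v = modifyTVar' (pending di) ((k, v) :)

-- Readers scan the pending list first (newest entry wins), then
-- fall back to the last-known index.
lookupKey :: DelayedIndex -> Key -> STM (Maybe String)
lookupKey di k = do
  ps <- readTVar (pending di)
  case lookup k ps of
    Just v  -> pure (Just v)
    Nothing -> Map.lookup k <$> readTVar (lastKnown di)

-- Background rebuild: fold the snapshotted updates into a next-future
-- index, then swap it in with one transaction, dropping only the
-- entries that were folded (newer appends survive at the list front).
rebuild :: DelayedIndex -> IO ()
rebuild di = do
  (old, ps) <- atomically $
    (,) <$> readTVar (lastKnown di) <*> readTVar (pending di)
  let nextFuture =
        foldl (\m (k, v) -> Map.insert k v m) old (reverse ps)
  atomically $ do
    writeTVar (lastKnown di) nextFuture
    cur <- readTVar (pending di)
    writeTVar (pending di) (take (length cur - length ps) cur)

main :: IO ()
main = do
  di <- DelayedIndex <$> newTVarIO Map.empty <*> newTVarIO []
  atomically $ recordUpdate di 1 "a"
  atomically $ recordUpdate di 1 "b"
  rebuild di
  r <- atomically (lookupKey di 1)
  print r  -- prints Just "b"
```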
> You'll have to twiddle with parameters such as "at what point do I 
> start a new index build", and you'll have to make sure that the 
> linear list isn't yet another bottleneck. (There are lock-free data 
> structures to achieve this, but they are complicated. Alternatively, 
> you can tell application programmers to collect as many updates as 
> possible per transaction so the number of synchronization points is 
> smaller; however, too-large transactions can overflow the CPU caches 
> if the collected update data grows too big, so there's a whole lot 
> of tweaking, studying real performance data, and hopefully finding 
> the right set of diagnostics to collect that would allow the DB to 
> automatically choose the right point to do its updates, etc. pp.)
> TL;DR for the good news: You can coalesce N updates into one and 
> divide the CPU core coordination overhead by a factor of N. You'll 
> increase the bus pressure, so there's tons of fine tuning you can do 
> (or avoid) after getting the first 90% of the speedup. (I'm drawing 
> purely speculative numbers out of my hat here.)
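
The coalescing idea can be sketched like this: each worker accumulates 
increments locally and commits one atomic add per batch, so the shared 
TVar is contended once per N increments rather than N times (the batch 
sizes below are arbitrary):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_)

-- One commit on the shared counter per batch of batchSize increments,
-- instead of one commit per increment.
coalescedIncrements :: TVar Int -> Int -> Int -> IO ()
coalescedIncrements counter batches batchSize =
  replicateM_ batches $ do
    let localSum = batchSize  -- accumulated off-transaction
    atomically $ modifyTVar' counter (+ localSum)

main :: IO ()
main = do
  counter <- newTVarIO 0
  done    <- newTVarIO (0 :: Int)
  forM_ [1 .. 4 :: Int] $ \_ -> forkIO $ do
    coalescedIncrements counter 10 100  -- 1,000 increments, 10 commits
    atomically $ modifyTVar' done (+ 1)
  -- Block until all four workers have finished (STM retry via check).
  atomically $ readTVar done >>= check . (== 4)
  total <- readTVarIO counter
  print total  -- prints 4000
```

With 4 workers doing 1,000 increments each, the shared TVar sees 40 
commits instead of 4,000.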
> Liability: You will want to add transactions and (likely) optimistic 
> locking, if you don't have that already: Transaction boundaries are 
> the natural point for coalescing updates.
> Regards,
> Jo
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
