[Haskell-cafe] Performance of concurrent array access

Andrew Coppin andrewcoppin at btinternet.com
Tue Aug 23 22:26:32 CEST 2011

On 23/08/2011 09:04 PM, Andreas Voellmy wrote:

> I compiled this with "ghc --make -rtsopts -threaded  -fforce-recomp -O2
> DirectTableTest.hs".
> Running "time ./DirectTableTest 1 +RTS -N1" takes about 1.4 seconds and
> running "time ./DirectTableTest 2 +RTS -N2" take about 2.0 seconds!

> I found that changing the array type used in the implementation of
> DirectAddressTable from IOArray to IOUArray fixes this problem. With
> this change, the running time of "time ./DirectTableTest 1 +RTS -N1" is
> takes about 1.4 seconds whereas the running "time ./DirectTableTest 2
> +RTS -N2" is about 1.0 seconds. Increasing to 4 cores gives a run time
> of 0.55 seconds.

> Finally, I tried one more variation. Instead of having the threads work
> on the same shared array, I had each thread work on its own array. This
> scales nicely (as you would expect), more or less like the second
> program, with either IOArray or IOUArray implementing the
> DirectAddressTable data structure.
> I understand why IOUArray might perform better than IOArray, but I don't
> know why it scales better to multiple threads and cores. Does anyone
> know why this might be happening or what I can do to find out what is
> going on?

I haven't deeply studied your code. However, I'm going to take a guess 
this has to do with strictness.

By using an IOArray, you're probably filling each cell with a reference 
to "drop 7 cyclicChars" or similar, which then only gets evaluated (in 
one thread) when you call "print".

By using an IOUArray, you're definitely forcing each character to be 
computed right away, by the thread doing the writing.

That's /probably/ what the difference is. As a guess. (Not sure how you 
can easily prove/disprove the theory though.)

You don't say which GHC version, but AFAIK recent releases of GHC have a 
seperate heap per thread (or was it per capability?), which probably 
makes a difference if you start giving each thread its own array. That 
and just plain ordinary cache coherance issues...

More information about the Haskell-Cafe mailing list