Poor array behaviour

Wed Dec 23 02:55:51 EST 2009

Hi all,

I stumbled upon something odd with respect to arrays. I know about GHC not doing card marking and traversing whole arrays one each GC for each array with alterations, but still I don't understand the behaviour.

The situation is like this. I am building a compiler. This compiler manipulates a large graph (for optimizations, etc.). This graph is in memory. As the graph is vast, we did a lot of effort to optimize access to it. The nodes in the graph are IORefs to some data structure, and this data structure contains its edges.

Each node stores its edges in buckets. This is because edges have different types (some are control flow links, others are other types of dependencies). Most of the graph manipulations only traverse over one type of edge, so we figured it would be faster to store the edges in buckets to support such queries. Inside the buckets, there are Data.Maps containing the actual edges for that bucket. The keys in this map are the target nodes of that edges, which are IORefsOrds, which are pairs of a unique integers and a IORef, such that they can be ordered and used as keys in a Map. The values are lists of edges to that target.

The weird thing is in the buckets. Per node, all buckets are stored in an array.  We gave each edge type an integer key.

And we use that key as array index to determine the bucket. I've tried implementing this array in two ways:

1) Each node contains a immutable array with IORefs. The IORefs contain the actual Data.Maps for the buckets. So, for instance, initializing a node looks something like

import Data.Array

uBucket = 8 //There are 8 buckets because there are 8 types of edges.

emptyEdges =
  do  buckets <- sequence ( take uBucket $ repeat (newIORef Map.empty) )
        let myArray = listArray (0, 7) buckets
        return myArray

So in this solution we have an extra layer of indirection, namely the IORefs, but the array is immutable. Because when the edges change for a particular node, we can write directly into the IORef and leave the array untouched.

2) Each node contains a mutable array that contains the Data.Maps directly. So, for instance, initializing a node looks something like:

import Data.Array.IO

emptyEdges =
   do myArray <- newArray (0, 7) Map.empty
        return $! myArray

Of course, one expect 2 to be quickest. However, it turns out that for 2, the application is spending much more time in GC, and __even without GC__ the application is still slower! I find both things rather weird: I know that for huge arrays I am expected to suffer from the missing card-marking "bug", but my array sizes are only 8.

Yet the difference I get in GC time are huge:

Solution 1
-------------
  13,476,008,560 bytes allocated in the heap
   1,714,767,712 bytes copied during GC
     151,518,528 bytes maximum residency (23 sample(s))
       1,743,176 bytes maximum slop
             325 MB total memory in use (2 MB lost due to fragmentation)

  Generation 0: 25689 collections,     0 parallel,  2.43s,  2.53s elapsed
  Generation 1:    23 collections,     0 parallel,  1.89s,  2.05s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   20.58s  ( 20.80s elapsed)
  GC    time    4.32s  (  4.58s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   24.90s  ( 25.38s elapsed)

  %GC time      17.4%  (18.0% elapsed)

  Alloc rate    654,751,131 bytes per MUT second

  Productivity  82.6% of total user, 81.1% of total elapsed

Solution 2
------------

  15,901,133,296 bytes allocated in the heap
   9,083,063,848 bytes copied during GC
     117,501,208 bytes maximum residency (23 sample(s))
       1,902,568 bytes maximum slop
             265 MB total memory in use (2 MB lost due to fragmentation)

  Generation 0: 30315 collections,     0 parallel, 44.73s, 44.89s elapsed
  Generation 1:    23 collections,     0 parallel,  1.78s,  1.91s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   25.93s  ( 26.31s elapsed)
  GC    time   46.51s  ( 46.80s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   72.43s  ( 73.11s elapsed)

  %GC time      64.2%  (64.0% elapsed)

  Alloc rate    613,325,569 bytes per MUT second

  Productivity  35.8% of total user, 35.5% of total elapsed

Is the behaviour I am seeing to be expected? And if so, wouldn't it make more sense to implement Data.Array.IO internally such that it contains an immutable array of IORefs? I also saw that in the next GHC, card marking will be done per 128 array items. Yet this behaviour seems to point out that at least for my application problems can already occur with array size 8. And is it expected that solution 1), that has an extra layer of indirection, still outperforms solution 2) even with GC times substracted?

Regards,
Robert

The information contained in this message may be confidential and legally protected under applicable law. The message is intended solely for the addressee(s). If you are not the intended recipient, you are hereby notified that any use, forwarding, dissemination, or reproduction of this message is strictly prohibited and may be unlawful. If you are not the intended recipient, please contact the sender by return e-mail and destroy all copies of the original message.