creating a new array vs updating an old one

Hal Daume III hdaume@ISI.EDU
Tue, 9 Jul 2002 08:31:09 -0700 (PDT)


I have large, two-dimensional arrays.  Usually something like:

> Array (Int,Int) Double

where they represent probability distributions, for instance:

  P(x|y) := arr ! (x,y)

where x and y are Int representations of events.
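
For concreteness, the raw count table behind such a distribution is the
sort of thing accumArray builds directly; here is a minimal sketch, where
the bounds and the list of observed (x,y) event pairs are hypothetical
placeholders:

> import Data.Array
>
> -- Count table: one increment per observed (x,y) event pair.
> -- (bnds and events are stand-ins for real data.)
> counts :: ((Int,Int),(Int,Int)) -> [(Int,Int)] -> Array (Int,Int) Double
> counts bnds events = accumArray (+) 0 bnds [ (xy, 1) | xy <- events ]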

Many times, I calculate these distributions by collecting counts and then
normalizing by the column sums, something like:

> normalize arr =
>     arr // [ ((i,j), arr ! (i,j) / colSum j) | (i,j) <- indices arr ]
>   where colSum j          = sum [ arr ! (i,j) | i <- range (ilo,ihi) ]
>         ((ilo,_),(ihi,_)) = bounds arr

Of course, this could also be written:

> normalize arr =
>     array (bounds arr) [ ((i,j), arr ! (i,j) / colSum j) | (i,j) <- indices arr ]
>   where colSum j          = sum [ arr ! (i,j) | i <- range (ilo,ihi) ]
>         ((ilo,_),(ihi,_)) = bounds arr
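
Either way the result is the same table; as a sanity check, every column
of the normalized array should sum to one. A quick toy check on a made-up
2x2 table, using normalize as defined above:

> -- Toy check: each column of a normalized 2x2 table sums to 1.
> checkNorm :: Bool
> checkNorm = and [ abs (colSum' j - 1) < 1e-12 | j <- [0,1] ]
>   where arr'      = normalize (listArray ((0,0),(1,1)) [1,3,2,4])
>         colSum' j = sum [ arr' ! (i,j) | i <- [0,1] ]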

My question is this:

  which is the better operation?

In the first case, we are updating every element in the array, so there's
no performance gain from having less data to write; furthermore, the
elements will presumably be touched in the same order either way.

Once I normalize, I never use the old array, so in *theory* the compiler
could do the update in place, which would make the first version much
better; in practice, this doesn't always seem to happen (though I haven't
looked into it rigorously -- how could I find out?).
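
For what it's worth, one way to take the decision out of the compiler's
hands would be to do the update explicitly on a mutable array in the ST
monad; a minimal sketch using Data.Array.ST, with colSum as in normalize
above:

> import Data.Array
> import Data.Array.ST (runSTArray, thaw, writeArray)
>
> -- Explicitly imperative version: thaw a copy of the array, overwrite
> -- every element destructively, then freeze the result.  thaw still
> -- copies once; unsafeThaw would avoid even that copy, but only if the
> -- old array really is dead -- exactly the promise the compiler can't
> -- verify for us.
> normalizeST :: Array (Int,Int) Double -> Array (Int,Int) Double
> normalizeST arr = runSTArray $ do
>     marr <- thaw arr
>     mapM_ (\(i,j) -> writeArray marr (i,j) (arr ! (i,j) / colSum j))
>           (indices arr)
>     return marr
>   where colSum j          = sum [ arr ! (i,j) | i <- range (ilo,ihi) ]
>         ((ilo,_),(ihi,_)) = bounds arr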



--
Hal Daume III

 "Computer science is no more about computers    | hdaume@isi.edu
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume