DiffArray Performance

Alastair Reid alastair at reid-consulting-uk.ltd.uk
Mon Oct 27 16:17:05 EST 2003


> [Binary tree] seems to be taking about 4.8 seconds (of 5.1 seconds total
> program execution time) for the input I'm using. I thought using
> DiffArrays might be faster, but no such luck. Execution time rose
> to 9.5 *minutes*.
>
> Is this what I should expect to see?

You don't mention _how_ you use diffarrays so I'll guess you're keeping the 
array in ascending order, finding the insertion point by binary search and 
inserting by copying bigger elements one place to the right.

Using binary trees (the code you copied), the cost of N insertions will be O(N 
* log N) assuming the input is in random order.

Using 'true' arrays (all operations constant time), the cost of copying values 
will dominate and inserting N elements will take O(N^2) time.  So, even in 
ideal circumstances, I'd expect it to be much slower for large N.

I think DiffArrays where most elements have been assigned to more than once 
have constant time update but linear time lookup.  So the total cost of 
moving the elements when you insert is still O(N^2) but finding where to 
insert will cost something closer to O(N * N * log N) since the cost of a 
single binary search will be O(N * log N).

Ignoring all the constant factors (a dubious thing to do) and choosing N=10^6, 
these relative costs look something like this:

  binary tree: 10^6 * 20        = 2 * 10^7
  true array:  10^6 * 10^6      =     10^12
  diff array:  10^6 * 10^6 * 20 = 2 * 10^13

(The constant overheads would have to be pretty large to mask differences like 
this.)

--
Alastair Reid    www.haskell-consulting.com




More information about the Glasgow-haskell-users mailing list