[Haskell-cafe] Data.IntMap union complexity

wren ng thornton wren at freegeek.org
Fri Feb 24 04:13:27 CET 2012


On 2/23/12 9:16 PM, Clark Gaebel wrote:
> Looking at IntMap's left-biased 'union' function [1], I noticed that the
> complexity is O(n+m) where n is the size of the left map, and m is the size
> of the right map.
>
> Since insertion [2] is O(min(n, W)) [ where W is the number of bits in an
> Int ], wouldn't it be more efficient to just fold 'insert' over one of the
> lists for a complexity of O(m*min(n, W))? This would degrade into O(m) in
> the worst case, as opposed to the current O(n+m).

The important things to bear in mind here are (1) that constant factors
actually matter in practice, and (2) what's actually going on. While
O(min(n,W)) is a correct bound, it's misleading to think of it as just a
constant (or as just a linear function). Though not strictly accurate,
it's better to think of it as O(log n) in order to get an intuition for
how it works. And O(m+n) is much nicer than O(m*log n).

Doing a fold with insert means that we must pay for the cost of 
traversing one of the maps entirely, and the cost of walking the spine 
for a lookup/insert m times. Whereas, with the merge function we only 
have to traverse the portions of the spines which intersect, and we only 
have to do it in one pass. In doing the fold, we're essentially ignoring 
the fact that the maps have a trie structure, since we have to traverse 
from the top for every insert; whereas for the merge, we make use of the 
structure in order to avoid redundant traversals of the top part of the 
structure.

Thus, the merge is doing less work. So, in theory, it should be faster. 
However, again, the thing to beware of is the constant factors. In 
particular, big-O algorithmic analysis doesn't really account for things 
like locality and cache behavior, so one should always be on the
lookout for places where duplicating work is actually faster in 
practice. If you're curious, you can always implement your own union 
using the fold-with-insert method and then run some benchmarks.
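
For anyone who wants to try that experiment, here is a minimal sketch of
the fold-with-insert union. The name unionByInsert is hypothetical (it is
not part of containers), and I'm assuming a containers version recent
enough to export foldlWithKey' from Data.IntMap. It folds the left map's
entries into the right map, so left values overwrite right ones, which
matches the left bias of IM.union.

```haskell
import qualified Data.IntMap as IM

-- Hypothetical helper, not part of the containers library.
-- Left-biased union built by inserting every entry of the left map
-- into the right map. Each insert restarts from the root of the trie,
-- which is exactly the repeated spine traversal discussed above.
unionByInsert :: IM.IntMap a -> IM.IntMap a -> IM.IntMap a
unionByInsert l r = IM.foldlWithKey' (\acc k v -> IM.insert k v acc) r l

main :: IO ()
main = do
  let a = IM.fromList [(1, "a1"), (3, "a3"), (5, "a5")]
      b = IM.fromList [(3, "b3"), (4, "b4")]
  -- Both unions should keep the left map's value for the shared key 3.
  print (unionByInsert a b)
  print (unionByInsert a b == IM.union a b)
```

This only checks that the two unions agree; it says nothing about speed.
To compare them you would still need to time both on maps of realistic
size and key distribution, for example with a benchmarking library such
as criterion, and you may also want to try folding over whichever map is
smaller.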

-- 
Live well,
~wren


