[Haskell-cafe] Data.IntMap union complexity
wren ng thornton
wren at freegeek.org
Fri Feb 24 04:13:27 CET 2012
On 2/23/12 9:16 PM, Clark Gaebel wrote:
> Looking at IntMap's left-biased 'union' function [1], I noticed that the
> complexity is O(n+m) where n is the size of the left map, and m is the size
> of the right map.
>
> Since insertion [2] is O(min(n, W)) [ where W is the number of bits in an
> Int ], wouldn't it be more efficient to just fold 'insert' over one of the
> lists for a complexity of O(m*min(n, W))? This would degrade into O(m) in
> the worst case, as opposed to the current O(n+m).
The important things to bear in mind here are (1) the constant factors
actually matter in practice, and (2) what's actually going on. While
O(min(n,W)) is correct, it's incorrect to think about it as just a
constant (or as just a linear function). While technically incorrect,
it's better to think of it as O(log n) in order to get an intuition for
how it works. And O(m+n) is much nicer than O(m*log n).
Doing a fold with insert means that we must pay for the cost of
traversing one of the maps entirely, and the cost of walking the spine
for a lookup/insert m times. Whereas, with the merge function we only
have to traverse the portions of the spines which intersect, and we only
have to do it in one pass. In doing the fold, we're essentially ignoring
the fact that the maps have a trie structure, since we have to traverse
from the top for every insert; whereas for the merge, we make use of the
structure in order to avoid redundant traversals of the top part of the
structure.
Thus, the merge is doing less work. So, in theory, it should be faster.
However, again, the thing to beware of is the constant factors. In
particular, big-O algorithmic analysis doesn't really account for things
like locality and cache coherence, so one should always be on the
lookout for places where duplicating work is actually faster in
practice. If you're curious, you can always implement your own union
using the fold-with-insert method and then run some benchmarks.
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list