set representation question
Tom Pledger
Tom.Pledger at peace.com
Wed Nov 12 15:00:38 EST 2003
Hal Daume III writes:
:
| *all* i care about is being able to quickly calculate the size of
| the intersection of two sets. these sets are, in general, very
| sparse, which means that the intersections tend to be small.
|
| for example, i might have two sets reprsented by the arrays:
|
| {0,1,10,346,398,1039,3289,3853,9811,89231,50913}
| {0,3,98,183,398,1038,5319,7642,9811,13893,93123}
|
| and all i need to be able to do is respond with "3" very very
| quickly. using sorted arrays, this takes O(n+m) time (where n and
| m are the sizes of the arrays).
|
| i'm willing to pay some up front cost for creating a data structure
| that could do this more quickly (i.e., in logarithmic time).
|
| is such a thing possible? "does anyone have a haskell
| implementation of" such a thing? :P
Hi.
The total time (including the up front time for building the data
structure) can't go below O(n+m), because if it did, you'd be
neglecting to look at some of the elements at all.
To move some of the effort up front, I suspect you're looking at a
compression problem in disguise. For example, if your data weren't so
sparse, you might try something like this:
IntSet = [(Int, Int)] -- ordered list of non-overlapping ranges
setToList s = concat [[a..b] | (a, b) <- s]
intersectionSize :: IntSet -> IntSet -> Int
intersectionSize s1@((a, b):s1') s2@((c, d):s2')
= rangeSize (max a c, min b d) +
case compare b d of
LT -> intersectionSize s1' s2
EQ -> intersectionSize s1' s2'
GT -> intersectionSize s1 s2'
intersectionSize _ _ = 0
So... how *compressible* are your data?
- Tom
More information about the Haskell
mailing list