set representation question

Wed Nov 12 11:32:54 EST 2003

On Wed, 12 Nov 2003, Tom Pledger wrote:

> Hal Daume III writes:
>  :
>  | *all* i care about is being able to quickly calculate the size of
>  | the intersection of two sets.  these sets are, in general, very
>  | sparse, which means that the intersections tend to be small.
>  |
>  | for example, i might have two sets reprsented by the arrays:
>  |
>  |  {0,1,10,346,398,1039,3289,3853,9811,89231,50913}
>  |  {0,3,98,183,398,1038,5319,7642,9811,13893,93123}
>  |
>  | and all i need to be able to do is respond with "3" very very
>  | quickly.  using sorted arrays, this takes O(n+m) time (where n and
>  | m are the sizes of the arrays).
>
> The total time (including the up front time for building the data
> structure) can't go below O(n+m), because if it did, you'd be
> neglecting to look at some of the elements at all.

Isn't it O(min(m,n))?  You don't have to look at all elements for the
intersection.  Eg:

  {0,1,10}
  {0,3,98,183,398,1038,5319,7642,9811,13893,93123}

I can vaguely imagine some clever bitset implementation that lets you do
intersection with bitwise-and.  Mercury has a bitmap library, which might
be able to do something like this, but I'm not sure;  it might be more
suited to smaller and denser sets than you seem to have.  But it might be
worth a look (see www.cs.mu.oz.au/research/mercury/information/doc-release/library_8.html#SEC8)

N