set representation question

Wed Nov 12 21:07:45 EST 2003

Dear Nicholas,

Nicholas Nethercote (Wed, Nov 12, 2003 at 11:32:54AM +0000):
> On Wed, 12 Nov 2003, Tom Pledger wrote:
> 
> > Hal Daume III writes:
> >  :
> >  | *all* i care about is being able to quickly calculate the size of
> >  | the intersection of two sets.  these sets are, in general, very
> >  | sparse, which means that the intersections tend to be small.
> >  |
> >  | for example, i might have two sets reprsented by the arrays:
> >  |
> >  |  {0,1,10,346,398,1039,3289,3853,9811,89231,50913}
> >  |  {0,3,98,183,398,1038,5319,7642,9811,13893,93123}
> >  |
> >  | and all i need to be able to do is respond with "3" very very
> >  | quickly.  using sorted arrays, this takes O(n+m) time (where n and
> >  | m are the sizes of the arrays).
> >
> > The total time (including the up front time for building the data
> > structure) can't go below O(n+m), because if it did, you'd be
> > neglecting to look at some of the elements at all.
> 
> Isn't it O(min(m,n))?  You don't have to look at all elements for the
> intersection.  Eg:
> 
>   {0,1,10}
>   {0,3,98,183,398,1038,5319,7642,9811,13893,93123}

O(f) describes the worst case of the algorithm. It is O((m,n)->m+n).
The average cost may be lower, but it depends on the distribution of the
data.

Sincerly,
-- 
Stefan Karrmann