[Haskell-cafe] Re: FASTER primes

Daniel Fischer daniel.is.fischer at web.de
Wed Dec 30 19:28:56 EST 2009


On Wednesday 30 December 2009 20:46:57, Will Ness wrote:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > On Tuesday 29 December 2009 20:16:59, Daniel Fischer wrote:
> > > > especially the claim that going by primes squares
> > > > is "a pleasing but minor optimization",
> > >
> > > Which it is not. It is a major optimisation. It reduces the algorithmic
> > > complexity *and* reduces the constant factors significantly.
> >
> > D'oh! Thinko while computing sum (takeWhile (<= n) primes) without paper.
> > It doesn't change the complexity, and the constant factors are reduced
> > far less than I thought.
>
> I do not understand. Turner's sieve is
>
>   primes = sieve [2..]
>    where
>     sieve (p:xs) = p : sieve [x | x<-xs, x `mod` p /= 0]
>
> and the Postponed Filters is
>
>   primes = 2: 3: sieve (tail primes) [5,7..]
>    where
>     sieve (p:ps) xs = h ++ sieve ps [x | x<-t, x `rem` p /= 0]
>                       where (h,~(_:t)) = span (< p*p) xs
>
> Are you saying they both exhibit same complexity?

No, they don't.
But let's look first at an imperative (mutable array) sieve, which is simpler to analyse 
because you don't have to take the book-keeping costs of your priority queue, heap or 
whatever into account. If you start crossing out the multiples of p at 2*p, you have

sum [bound `div` p - 1 | p <- takeWhile (<= sqrt bound) primes]

crossings-out, which is Theta(bound*log (log bound)). If you eliminate the multiples of 
some small primes a priori (a wheel), you can reduce the constant factor significantly, 
but the complexity remains the same: you drop a few terms from the front of the sum and 
multiply the remaining terms by phi(n)/n, where n is the product of the excluded primes.
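
For concreteness, here is a minimal sketch of such a mutable-array sieve (my own code, 
not from any particular library; the names sieveArray and primesTo are made up), crossing 
out the multiples of each prime p starting at 2*p:

  import Control.Monad (forM_, when)
  import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
  import Data.Array.Unboxed (UArray, assocs)

  -- Mark the composites in [2..bound]; the inner loop starts at 2*p,
  -- performing bound `div` p - 1 crossings-out for each prime p.
  sieveArray :: Int -> UArray Int Bool
  sieveArray bound = runSTUArray $ do
      a <- newArray (2, bound) True
      forM_ [2 .. floor (sqrt (fromIntegral bound :: Double))] $ \p -> do
          isPrime <- readArray a p
          when isPrime $
              forM_ [2*p, 3*p .. bound] $ \m ->
                  writeArray a m False
      return a

  primesTo :: Int -> [Int]
  primesTo bound = [p | (p, True) <- assocs (sieveArray bound)]

Changing the inner enumeration to [p*p, p*p + p .. bound] is exactly the start-at-p^2 
optimisation under discussion.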

If you start crossing out at p^2, the number is

sum [bound `div` p - (p-1) | p <- takeWhile (<= sqrt bound) primes].

The difference is basically sum (takeWhile (<= sqrt bound) primes), since each prime p 
now contributes p - 2 fewer crossings-out; I stupidly (I don't remember how) believed 
that to cancel out the main term.
It doesn't: by the prime number theorem it's O(bound/log bound), which is dominated by 
bound*log (log bound), so the complexity is the same.
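
As a quick sanity check (my own snippet; the primes here is just the postponed filters 
from above, specialised to Int), both counts can be evaluated directly:

  primes :: [Int]
  primes = 2 : 3 : sieve (tail primes) [5,7..]
    where
      sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
        where (h, ~(_:t)) = span (< p*p) xs

  -- Crossings-out when starting at 2*p vs. at p^2:
  crossFrom2p, crossFromP2 :: Int -> Int
  crossFrom2p bound = sum [bound `div` p - 1       | p <- takeWhile (\p -> p*p <= bound) primes]
  crossFromP2 bound = sum [bound `div` p - (p - 1) | p <- takeWhile (\p -> p*p <= bound) primes]

Their difference is sum [p - 2 | p <- takeWhile (\p -> p*p <= bound) primes]; for 
bound = 10^6 that is on the order of 10^6/log(10^6), i.e. tens of thousands, while both 
counts themselves are in the millions.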

Now if you take a stream of numbers from which you remove composites, keeping a priority 
queue of the multiples of primes, things are a little different.
If you start crossing out at 2*p, then when you are looking at n you have more multiples 
in your PQ than if you start crossing out at p^2 (about pi(n/2) vs. pi(sqrt n)), so 
updating the PQ will be more expensive. But updating the PQ is O(log size), I believe, 
and log pi(n) is O(log pi(sqrt n)), so I think it shouldn't change the complexity here 
either. I think this would have complexity O(bound*log bound*log (log bound)).
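
For illustration, here is a minimal sketch of that idea (my own code, with Data.Map 
standing in for a real priority queue, along the lines of O'Neill's incremental sieve); 
each prime's multiples enter the structure only once p^2 is reached:

  import qualified Data.Map as M

  primesPQ :: [Int]
  primesPQ = 2 : sieve [3,5..] M.empty
    where
      sieve (x:xs) mp = case M.lookup x mp of
          -- no pending multiple at x: x is prime; defer its crossing-out
          -- to x*x, stepping by 2*x over the odd numbers
          Nothing    -> x : sieve xs (M.insert (x*x) [2*x] mp)
          -- x is composite: push each scheduled hit forward by its step
          Just steps -> sieve xs (foldr push (M.delete x mp) steps)
        where push s = M.insertWith (++) (x + s) [s]

The point of deferring to x*x is that the map holds only about pi(sqrt n) entries when 
the stream is at n; scheduling each prime's multiples as soon as the prime is found would 
keep one entry per prime seen so far.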

> I was under the impression that the first shows O(n^2) approx., and the second
> one O(n^1.5) (for n primes produced).

In Turner's sieve and the postponed filters, things are really different. Actually, 
Ms. O'Neill is right: it is a different algorithm. In the sieves above, we match each 
prime only with its own multiples (plus the next composite, in the PQ version). In 
Turner's algorithm, we match each prime with all smaller primes (and each composite with 
all primes <= its smallest prime factor, but that's less work than for the primes). That 
indeed gives a complexity of O(pi(bound)^2).
In the postponed filters, we match each prime p only with the primes <= sqrt p (again, 
the composites contribute less). I think that gives a complexity of 
O(pi(bound)^1.5*log (log bound)), or O(n^1.5*log (log n)) for n primes produced.
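
To make those counts concrete (my own back-of-the-envelope sketch, counting the tests on 
the primes only and ignoring the composites' smaller contribution; primes is the Int 
list defined earlier):

  -- Turner: the k-th prime survives the filters of all k-1 earlier primes.
  totalTestsTurner :: Int -> Int
  totalTestsTurner n = n * (n - 1) `div` 2

  -- Postponed filters: a prime p is tested only by primes q with q*q <= p.
  totalTestsPostponed :: Int -> Int
  totalTestsPostponed n = sum [ length (takeWhile (\q -> q*q <= p) primes)
                              | p <- take n primes ]

The first is Theta(n^2); the second sums pi(sqrt p) over the first n primes produced, 
which is where the n^1.5 (up to logarithmic factors) comes from.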
