[Haskell-cafe] Re: FASTER primes

Sat Jan 2 08:13:29 EST 2010

Daniel Fischer <daniel.is.fischer <at> web.de> writes:

> 
> 
> Am Mittwoch 30 Dezember 2009 20:46:57 schrieb Will Ness:
> > Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > > Am Dienstag 29 Dezember 2009 20:16:59 schrieb Daniel Fischer:
> > > > > especially the claim that going by primes squares
> > > > > is "a pleasing but minor optimization",
> > > >
> > > > Which it is not. It is a major optimisation. It reduces the algorithmic
> > > > complexity *and* reduces the constant factors significantly.
> > >
> > > D'oh! Thinko while computing sum (takeWhile (<= n) primes) without paper.
> > > It doesn't change the complexity, and the constant factors are reduced
> > > far less than I thought.
> >
> > I do not understand. Turner's sieve is
> >
> >   primes = sieve [2..]
> >    where
> >     sieve (p:xs) = p : sieve [x | x<-xs, x `mod` p /= 0]
> >
> > and the Postponed Filters is
> >
> >   primes = 2: 3: sieve (tail primes) [5,7..]
> >    where
> >     sieve (p:ps) xs = h ++ sieve ps [x | x<-t, x `rem` p /= 0]
> >                       where (h,~(_:t)) = span (< p*p) xs
> >
> > Are you saying they both exhibit same complexity?
> 
> No. They don't.
> But if you're looking at an imperative (mutable array) sieve (that's simpler
> to analyse because you don't have to take the book-keeping costs of your
> priority queue, heap or whatever into account), if you start crossing out 

The key question then, is _*WHEN*_ and not _*WHAT*_. As is clearly demonstrated
by the case of Turner/Postponed filters, the work that is done (of crossing
numbers off) is the same - _when_ it is actually done - but Turner's starts out
so _prematurely_ that it is busy doing nothing most of the time. Thus its
function call overhead costs pile up enormously, overstaging the actual
calculation. 

So analyzing that calculation in the premature execution setting is missing the
point, although helpful after we fix this, with the Postponed Filters. _Only
then_ the finer points of this algorithm's analysis can be applied - namely, of
avoiding testing primes divisibility altogether. And _if_ a fast cheap primality
test were to have existed, the filtering versions would win over, because they
progressively cull the input sequence so there would be no double hits as we
have when merging the multiples (whether from lists or inside the PQ).

> the multiples of p with 2*p, you have
> 
> sum [bound `div` p - 1 | p <- takeWhile (<= sqrt bound) primes]
> 
> crossings-out, that is Theta(bound*log (log bound)). If you eliminate 
> multiples of some small primes a priori (wheel), you can reduce the constant 
> factor significantly, but the complexity remains the same (you drop a few
> terms from the front of the sum and multiply the remaining terms with 
> phi(n)/n, where n is the product of the excluded primes).
> 
> If you start crossing out at p^2, the number is
> 
> sum [bound `div` p - (p-1) | p <- takeWhile (<= sqrt bound) primes].
> 
> The difference is basically sum (takeWhile (<= sqrt bound) primes), which I 
> stupidly - I don't remember how - believed to cancel out the main term. 
> It doesn't, it's O(bound/log bound), so the complexity is the same.
>
> Now if you take a stream of numbers from which you remove composites, having
> a priority queue of multiples of primes, things are a little different.
>  If you start crossing out at 2*p, when you are looking at n, you have more 
> multiples in your PQ than if you start crossing out at p^2 (about pi(n/2) 
> vs. pi(sqrt n)), so updating the PQ will be more expensive. But updating the 
> PQ is O(log size), I believe, and log pi(n) is O(log pi(sqrt n)), so I think 
> it shouldn't change the complexity here either. I think this would have 
> complexity O(bound*log bound*log (log bound)).

There are two questions here - where to start crossing off numbers, and when. If
you'd start at 2*p maybe the overall complexity would remain the same but it'll
add enormous overhead with all those duplicate multiples. No, the question is
not where to start, but when. PQ might hide the problem until the memory blows
up. Anything that we add that won't have any chance of contributing to the final
result, is added for nothing and only drives the total cost up needlessly. 

> 
> > I was under impression that the first shows O(n^2) approx., and the second 
> > one O(n^1.5) (for n primes produced).
> 
> In Turner/postponed filters, things are really different. Actually, Ms. 
> O'Neill is right, it is a different algorithm. In the above, we match each 

what _is_ different is divisibility testing vs composites removal, which follows
from her in-depth analysis although is never quite formulated in such words in
her article. But nothing matters until the premature starting up is eliminated,
and that key observation is missing for the article either - worse, it is
brushed off with the casual remark that it is "a pleasing but minor
optimization". Which remark, as you show here, is true in the imperative,
mutable-storage setting, but is made in an article abut functional code, in
regard to the functional code of Turner's sieve. So the key understanding is
overlooked.

Her article even adds the primes into the PQ prematurely itself, as soon as the
prime is discovered (she fixes that in her ZIP package). With the PQ keeping
these prematurely added elements deep inside its guts, the problem might not
even manifest itself immediately, but the memory blow-up would eventually hit
the wall (having the PQ contain all the preceding primes, instead of just those
below the square root of a limit - all the entries above the square root not
contributing to the calculation at all).

And what we're after here is the insight anyway. Skipping steps of natural
development does not contribute to garnering an insight. Most prominent problem
with Turner's /code/ _specification_, is the premature start ups, and
divisibility testing of primes (as Melissa O'Neill's analysis shows). Fix one,
and you get Postponed Filters (which should really be used as a basis reference
point for all the rest). Fix the other - and you've got the Euler's sieve:

    primes = 2: 3: sieve (tail primes) [5,7..]
     where
      sieve (p:ps) xs = h ++ sieve ps [t `minus` tail [p*p, p*p+2*p..]]
                        where (h,~(_:t)) = span (< p*p) xs

Clear, succinct, and plenty efficient for an introductory textbook functional
lazy code, of the same order of magnitude performance as PQ code on odds only.

Now comparing the PQ performance against _that_ would only be fare, and IMO
would only add to its value - it is faster, has better asymptotics, and is
greatly amenable to the wheel optimization right away. 

> prime only with its multiples (plus the next composite in the PQ version). 
> In Turner's algorithm, we match each prime with all smaller primes (and each 
> composite with all primes <= its smallest prime factor, but that's less work 
> than the primes). That gives indeed a complexity of O(pi(bound)^2).
>
> In the postponed filters, we match each prime p with all primes <= sqrt p 
> (again, the composites contribute less). I think that gives a complexity of 
> O(pi(bound)^1.5*log (log bound)) or O(n^1.5*log (log n)) for n primes 
> produced.

Empirically, it was always _below_ 1.5, and above 1.4 , and PQ/merged multiples
removal around 1.25..1.17 .

> 
> 
> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>