[Haskell-cafe] Re: FASTER primes

Sun Jan 3 03:54:37 EST 2010

Daniel Fischer <daniel.is.fischer <at> web.de> writes:

> 
> 
> Am Samstag 02 Januar 2010 14:13:29 schrieb Will Ness:
> > Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > > Am Mittwoch 30 Dezember 2009 20:46:57 schrieb Will Ness:
> > > > Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > > > > Am Dienstag 29 Dezember 2009 20:16:59 schrieb Daniel Fischer:
> > > > > > > especially the claim that going by primes squares
> > > > > > > is "a pleasing but minor optimization",
> > > > > >
> > > > > > Which it is not. It is a major optimisation. It reduces the
> > > > > > algorithmic complexity *and* reduces the constant factors
> > > > > > significantly.
> > > > >
> > > > > D'oh! Thinko while computing sum (takeWhile (<= n) primes) without
> > > > > paper. It doesn't change the complexity, and the constant factors are
> > > > > reduced far less than I thought.
> > > >
> > > > I do not understand. Turner's sieve is
> > > >
> > > >   primes = sieve [2..]
> > > >    where
> > > >     sieve (p:xs) = p : sieve [x | x<-xs, x `mod` p /= 0]
> > > >
> > > > and the Postponed Filters is
> > > >
> > > >   primes = 2: 3: sieve (tail primes) [5,7..]
> > > >    where
> > > >     sieve (p:ps) xs = h ++ sieve ps [x | x<-t, x `rem` p /= 0]
> > > >                       where (h,~(_:t)) = span (< p*p) xs
> > > >
> > > > Are you saying they both exhibit same complexity?
> > >
> > > No. They don't.
> > > But if you're looking at an imperative (mutable array) sieve (that's
> > > simpler to analyse because you don't have to take the book-keeping costs
> > > of your priority queue, heap or whatever into account), if you start
> > > crossing out
> >
> > The key question then, is _*WHEN*_ and not _*WHAT*_. As is clearly
> > demonstrated by the case of Turner/Postponed filters, the work that is done
> > (of crossing numbers off) is the same
> 
> Crossing off is only part of the work. Most of the work is checking whether 
> to cross off the number in this round. And Turner does a lot more of that
> than the postponed filters.

Exactly the point I tried to make. :) 

> > - _when_ it is actually done - but
> > Turner's starts out so _prematurely_ that it is busy doing nothing most of
> > the time.
> 
> It's not doing nothing. It's just doing a lot of superfluous work. It is
> _achieveing_ nothing most of the time, though.

again, yes. :)

> Take 7919, the thousandth prime. The postponed filters decide to keep it when
> fitering out the multiples of 89, the twenty-fourth prime. Turner also 
> divides it by all 975 primes in between. That is a lot of real but futile 
> work.

yes.

> > Thus its function call overhead costs pile up enormously,
> > overstaging the actual calculation.
> 
> It's not only the function calls. Division is expensive, too.

yes, that's what I meant - the cost of calling all the fuctions that - we know 
in advance will - have nothing to do eventually.

> > So analyzing that calculation in the premature execution setting is missing
> > the point, although helpful after we fix this, with the Postponed Filters.
> > _Only then_ the finer points of this algorithm's analysis can be applied -
> > namely, of avoiding testing primes divisibility altogether. And _if_ a fast
> > cheap primality test were to have existed, the filtering versions would win
> 
> Sorry, I can't follow. What's the point of a primality test here? Every
> number whose multiples we want to remove is prime, what's left to test?

sorry for being obstruse. I meant in a filtering sieve (i.e. postponed filters) 
vs the multiples removing sieves, if only the cheap primality test have existed 
(by some _magic_) which _could_ be run on _every_ numer (unlike the costly 
divisiility test), than the filtering sieves would win. Hypothetically.

> > over, because they progressively cull the input sequence so there would be
> > no double hits as we have when merging the multiples (whether from lists or
> > inside the PQ).
> 
> But there's a lot of list constructuion and deconstruction necessary for the
> Euler sieve. 

yes. Unless, of course, s smart compiler recognizes there's only one consumer 
for the values each multiples-list is producing, and keeps it stripped down to 
a generator function, and its current value. I keep thinkig a smart compiler 
could eliminate all these "span" calls and replace them with just some pointers 
manipulating...

> That may be more work than the multiple hits cause.

so that too would make filters win; only _if_ the cheap primality test 
existed!.. 

> >
> > > the multiples of p with 2*p, you have
> > > ...
> >
> > There are two questions here - where to start crossing off numbers, and
> > when. If you'd start at 2*p maybe the overall complexity would remain the
> > same but it'll add enormous overhead with all those duplicate multiples.
> 
> The additional duplicate multiples aren't the problem. Sure, the numbers
> having a prime divisor larger than the square root would be crossed off one
> additional time,  but that isn't so much per se. The additional crossings off
> are O(bound), so they don't harm the time complexity. But they can have
> effects which multiply the running time by a large constant.

yes, exactly what I wanted to say. :)

> > No, the question is not where to start, but when. PQ might hide the problem
> > until the memory blows up. Anything that we add that won't have any chance
> > of contributing to the final result, is added for nothing and only drives
> > the total cost up needlessly.
> 
> Well, if you start crossing off at p^2 and feed your PQ from a separate prime
> generator, the memory blow-up is far away. E.g., when the main sieve is at
> 10^12, the PQ contains less than 80000 multiples. No space problem in sight.
> At 10^14, the PQ's size is still less than 700000.  That's noticeable, but
> there's still plenty of space.

again, what I mean is, not _where_ I start crossing them off in a PQ, but 
_when_. The article's code starts crossing them off _at_ p^2 - by adding p^2+2p 
into the PQ - _as_ _soon_ as p itself is reached. It won't surface until p^2 
will be considered for a prime; it'll lay dormant deep inside the queue's guts. 
When reaching 7919, the thousand (i.e. pi(7919) ) entries will hang out inside 
the PQ - instead of just 24. A memory blowup. (this is of course fixed in 
Melissa's ZIP package). Of course due to the nature of PQ it might actually not 
hurt the performance for a while, depending on partcular PQ implementation. 
Complexity _and_ constant factors.

> If, on the other hand, you start crossung off at 2*p, when the main sieve is
> at 10^7, the size of the PQ is > 650000, at 10^8, the size is more than 5.5
> million. That starts to become a memory problem rather soon.

here you don't have a choice or when to add it - you have to add it at p 
itself - so the problem is clear. But even when you cross at p^2, the question 
remains, of when you add the p^2 entry into the PQ. That was my point.

Postponed Filters code makes this clear, and thus hard to err on. 
Unfortunately, it wasn't present the article.

> >
> > > > I was under impression that the first shows O(n^2) approx., and the
> > > > second one O(n^1.5) (for n primes produced).
> > >
> > > In Turner/postponed filters, things are really different. Actually, Ms.
> > > O'Neill is right, it is a different algorithm. In the above, we match
> > > each
> >
> > what _is_ different is divisibility testing vs composites removal, which
> > follows from her in-depth analysis although is never quite formulated in
> > such words in her article. But nothing matters until the premature starting
> > up is eliminated, and that key observation is missing for the article
> > either - worse, it is brushed off with the casual remark that it is "a
> > pleasing but minor optimization". Which remark, as you show here, is true
> > in the imperative, mutable-storage setting, but is made in an article abut
> > functional code, in regard to the functional code of Turner's sieve.
> 
> I think that remark was meant to apply to composite removal, not Turner's
> sieve.

It is right there on page 2, right when the Turner's sieve is presented and 
discussed. The only explanation that I see is that she thought of it in regards 
to the imperative code, just as her analysis concentrates only on calculation 
aspects of the imperative code itself. 

> But even then, the 'minor' isn't adequate considering her PQ approach where,
> although it doesn't change time complexity (in theory), it changes space
> complexity - and that may affect running time drastically. Probably she was
> thinking of the typical array sieve when she made that remark.

In short, that remark was not referring to what it seemingly refers to, in the 
article, and as such was very confusing. It was a big STOP sign on the way to 
Postponed Filters - Euler's - Bird's merged multiples - tree-merging (with 
wheel) road of little steps, and used as a justification for her to make a big 
leap across the chasm towards the PQ code. 

So to speak. :) 

> > So the key understanding is overlooked.
> >
> > And what we're after here is the insight anyway. Skipping steps of natural
> > development does not contribute to garnering an insight. 

.. and that is of course a matter of personal preference. A genius is perfectly 
capable of making big leaps across chasms. Heck they might even be able to 
fly :)

> > > In the postponed filters, we match each prime p with all primes <= sqrt p
> > > (again, the composites contribute less). I think that gives a complexity
> > > of O(pi(bound)^1.5*log (log bound)) or O(n^1.5*log (log n)) for n primes
> > > produced.
> >
> > Empirically, it was always _below_ 1.5, and above 1.4 , and PQ/merged
> > multiples removal around 1.25..1.17 .
> 
> I should really stop doing those calculations in my head, it seems I'm
> getting too old for that. Doing it properly, on paper, the cost for
> identifying p_k is pi(sqrt p_k) [trial division by all primes less than 
> sqrt p_k].
> p_k is approximately k*log k, so pi(sqrt p_k) is approximately 
> (sqrt $ k* log k)/log (sqrt $ k*log k) ~ 2*sqrt (k/log k).
> Summing that, the cost for identifying the first n primes is 
> Theta(n^1.5/log n).

that would correspond to what I've seen much better.

> Using a feeder, i.e. primes = 2:3.sieve feederprimes [5, 7 .. ], where
> feederprimes are generated as above, to reduce memory pressure and GC 
> impact, that fits pretty well with my measurements of the time to find 
> p_k for several k between 500,000 and 20,000,000.

The quesion of a memory blowup with the treefolding merge still remains. For 
some reason using its second copy for a feeder doesn't reduce the memory (as 
reported by standalone compiled program, GHCi reported values are useless) - it 
causes it to increase twice.....

> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>