[Haskell-cafe] Re: FASTER primes

Mon Jan 4 18:33:03 EST 2010

Am Montag 04 Januar 2010 19:16:32 schrieb Will Ness:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > Am Montag 04 Januar 2010 13:25:47 schrieb Will Ness:
> > > Euler's sieve is
> > >
> > >  sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
> > >                       where (h,t) = span (< p*p) xs
> >
> > Not quite. That's a variant of the postponed filters, it crosses off e.g.
> > 45 twice, once as 3*15 and once as 5*9 (okay, it has already been removed
> > by the first, so let's say 45 appears in two lists of numbers to be
> > removed if present).
>
> there won't be any such. whatever is removed first, is just skipped second
> (and third, etc). 45 does appear twice on the two multiples ists (3- and
> 5-). But it is "removed" by first, and skipped by second. And there's no
> removal here in the first place. There is no pre-filled storage. All there
> is, is some lists comparison, and lists are just generators (I so keep
> hoping).

((45:(offer 47 when demanded)) `minus` (45:(next will be 51 when demanded))) `minus` (45:
(next will be 55 when demanded))

So there are two attempts to tell the generator to not output 45. To the second, it 
answers "I already knew that", but the request is made nevertheless.

Lists sometimes are data structures occupying physical storage (cons cell, point to value, 
pointer to next...), sometimes generators (when asked for the next value, I'll answer 
this). I say a value is removed from a list regardless of whether it takes relinking 
pointers in a list of cons cells or it involves telling the generator not to give out that 
value. I could also say 'eliminated'. There are two attempts to eliminate 45.

>
> I don't see any problem here. As Melissa (and yourself, I think) have
> shown, double hits on multiples are few and far between.

It's not a problem, it just means it's not Euler's sieve, because that attempts to 
eliminate each composite exactly once.

>
> Also, it uses no filters, i.e. no individual number divisibility testing.
> The "filters" refers first of all to testing an individual number to decide
> whether to keep it in or not.

Umm, the postponed filters says keep everything until p^2, then eliminate (filter out, 
remove) multiples of p in the remainder, after that, pick next prime.
That's precisely what the above does. It doesn't do the filtering out by divisibility 
testing but by minus (hence more efficiently). I would say that's a variant of the 
postponed filters.

> Euler's sieve removes multiples in advance,
> so there's no testing and no filtering, only comparison. It follows the
> _Postponed_ Filter's framework in postponing the action until the right
> moment; the action itself is two lists comparison and skipping of the
> equals (i.e. the "minus" action).
>
> > Euler's sieve is never attempting to remove a number more than once,
> > that's
>
> How's that possible?

http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Euler.27s_Sieve

"Euler in his Proof of the Euler product formula for the Riemann zeta function came up 
with a version of the sieve of Eratosthenes, *better in the sense that each number was 
eliminated exactly once*."

C) The number after the previous prime is also a prime. *Multiply each number in the list 
starting from this prime by this prime and discard the products*.

> On wikipedia is says, it removes multiples of 3; then
> multiples of 5; etc. So it just looks for each number it is removing, if it
> is there, and if so, removes it. I would argue that looking whether to
> remove or not, is part of attempting to remove. It can't have foresight,
> right?
>

But it has :) By only trying to eliminates products of the prime p currently under 
consideration *with numbers (>= p) which have not yet been eliminated from the list*, it 
is known in advance that all these products are still in the list.

When p is under consideration, the list contains (apart from the primes < p) precisely the 
numbers whose smallest prime factor is >= p.

> > the outstanding feature of it. Unfortunately, that also makes it hard to
> > implement efficiently. The problem is that you can't forget the primes
> > between p and p^2, you must remember them for the removal of multiples of
> > p later on.
>
> you're right, every formulation of these algorithms is always done in the
> imperative, mutable-storage setting. They always speak of "removing"
> numbers etc.

sed s/remove/eliminate/ ?

>
> The more pressing problem with that code is its linear structure of course
> which gets addressed by the tree-folding merge etc.
>

Which unfortunately introduces its own space problem :(

> >  An (inefficient but faithful) implementation would be
> >
> > euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
>
> I think it does exactly the same thing, computationally,

No, in map (*p) ks, there are no numbers with a smaller prime factor than p, while in 
map (*p) [p, p+2 .. ] there are (for p > 3)

> except it does so,
> again,  _prematurely_  exactly in Turner's sieve's fashion - for each prime
> found, _as_ _soon_ as it is found. If it is faithful, so is my code. :)
>

It is faithful in that it tries to eliminate each composite exactly once.

Try a different minus:

xxs@(x:xs) `minus` yys@(y:ys)
   = case compare x y of
       LT -> x : xs `minus` yys
       EQ -> xs `minus` ys
       GT -> error ("trying to remove " ++ show y ++ " a second time")

Your code is not. It is, however, much faster.

> > Its performance is really horrible though.
>
> makes sense, as for Turner's.

It's far worse than Turner's. The "map (*p) ks" holds on to things really really long.
Measure it, just for teh lulz.

> See, the _when_ thing really helps to see
> what's going on here.
>
> > A much better implementation is
> >
> > primes = 2:3:euler hprs [] (scanl (+) 5 (cycle [2,4]))
> >   where
> >     hprs = 5:drop 3 primes
> >     euler (p:ps) acc cs = h ++ euler ps (tail (acc ++ h)) (t `minus`
> > comps) where
> >         (h,t) = span (< p*p) cs
> >         comps = map (*p) (acc ++ cs)
>
> this look like {2,3} wheel reimplemented and inlined.

The {2,3} wheel thing isn't important. It could equally well (well, a bit slower) be

primes = euler hprs [] [2 .. ]
   where
      hprs = 2:tail primes

It just occured to me that the accumulation list is just (takeWhile (< head cs) (p:ps)), 
so we could improve it as

euler (p:ps) cs = h ++ euler ps (t `minus` comps)
      where
        (h,t) = span (< p*p) cs
        comps = map (*p) (takeWhile (< head cs) (p:ps) ++ cs)

Still Bad.

> No point in improving
> anything until the linear structure isn't turned into a tree.
>
You're a tree hugger? :D
No, the point isn't linear vs. tree, it's that Euler's sieve is a) great for theoretical 
reasoning (look at a proof of Euler's product formula for the (Riemann) Zeta-function to 
appreciate it) b) reasonably good for sieving with a list of numbers written on paper, but 
c) not good to implement on a computer. For the computer, keeping track of which numbers 
are left is so much more complicated than just ticking off numbers a couple of times that 
it isn't even worth attempting it.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100104/3538bf99/attachment.html