[Haskell-cafe] Re: Genuine Eratosthenes sieve [Was: Optimization fun]

Mon Feb 19 13:33:36 EST 2007

You took my quote entirely out of context. In particular, you omitted
the rest of the sentence "but I'm sure that day will come." My
statement was no excuse by any stretch of the imagination--I was
initially confused when I saw it in your post and then a bit offended.

The original intent of my statement was as a preface to the fact that
I enjoyed reading the discussion and appreciated everyone's
involvement. On that note, thanks for participating.

I recognize the particular "excuse" you intended my quote to
represent, but I think it was inappropriate to chop my quote to suit
your needs--it makes me seem short-sighted in the eyes of the
community when, ironically enough, it was actually part of a
prediction that I will need performance from my Haskell someday.

I do feel a little defamed, but I don't want to go off topic on the
list, especially for personal issues. Just try to consider the affect
it will have on others when looking for quotes in the future.

Thanks and no worries,
Nick

On 2/19/07, Melissa O'Neill <oneill at cs.hmc.edu> wrote:
> Sorry, I'm a little late to this thread, but the topic of
> > sieve [] = []
> > sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]
> (as posted by Rafael Almeida) is somewhat dear to my heart, as I
> wrote a paper about it last summer and sent it in to JFP.  Still
> waiting for a reply though.
>
> Let's go back to the beginning with the classic complaint and the
> excuses...
>
> Creighton Hogg wrote:
> > So a friend and I were thinking about making code faster in
> > Haskell, and I was wondering if there was a way to improve the
> > following method of generating the list of all prime numbers.  It
> > takes about 13 seconds to run, meanwhile my friend's C version took
> > 0.1.
>
> Here come the excuses, like this one from apfelmus,
> > While Haskell makes it possible to express very complicated
> > algorithms in simple and elegant ways, you have to expect to pay a
> > constant factor (roughly 2x-10x) when competing against the same
> > algorithm in low-level C.
> and this one from Nicolas Frisby,
> > I have yet to need my Haskell to perform well
>
> Matthew Brecknell came up with something much faster, namely
> > primes :: [Int]
> > primes = 2 : filter isPrime [3,5..] where
> >   f x p r = x < p*p || mod x p /= 0 && r
> >   isPrime x = foldr (f x) True primes
>
> FWIW, another way to write this code (without needing to think about
> how "fold bails early") is
>
> primes = 2: [x | x <−[3,5..], all (\p −> x `mod` p > 0)
> (factorsToTry x)]
>    where
>      factorsToTry x = takeWhile (\p −> p*p <= x) primes
>
> Both of these algorithms best the "sieve" we began with and run
> quickly, but as you can see (possibly more clearly from my
> rephrasing), this algorithm is not actually the Sieve of Eratosthenes
> -- it's actually a classic naive primes algorithm which checks a
> number for primality by trying to divide it by every prime up to its
> square root.
>
> But that's okay, because our initial algorithm ISN'T THE REAL SIEVE
> EITHER.  Markus Fischmann hits the nail on the head when he says
> > The characteristics of a sieve is, that it uses the already found
> > primes to generate a list of non-primes that is then removed from a
> > list of candiates for primeness.
>
> But then we get distracted by a discussion about avoiding division.
> It's true that the real sieve avoids division, but it is NOT true
> that every algorithm that avoids division is the sieve.  The thread
> ends with this algorithm from Yitzchak Gale:
> > -- Delete the elements of the first list from the second list,
> > -- where both lists are assumed sorted and without repetition.
> > deleteOrd :: Ord a => [a] -> [a] -> [a]
> > deleteOrd xs@(x:xs') ys@(y:ys')
> >   | x > y       = y : deleteOrd xs  ys'
> >   | x < y       =     deleteOrd xs' ys
> >   | otherwise   =     deleteOrd xs' ys'
> > deleteOrd _ ys = ys
> >
> > sieve (x:xs) = x : sieve (deleteOrd [x+x,x+x+x..] xs)
> > sieve _      = []
> >
> > primes = sieve [2..]
>
> Which seems reasonable, until you realize that every prime p we come
> up with is still "considered" by k deleteOrd filters, where k is the
> number of primes that preceeded it.
>
> So, let's recap: the original algorithm is beautiful and simple, but
> it is NOT the actual Sieve of Eratosthenes, NOT because it uses
> modulus, but because fundamentally, at the highest level, it is a
> different algorithm.   At the risk of beating a dead horse, let's see
> why it's not the real sieve.
>
> What makes the sieve an efficient algorithm are the details of *what*
> gets "crossed off", *when*, and *how many times*. Suppose that we are
> finding the first 100 primes (i.e., 2 through 541), and have just
> discovered that 17 is prime. We will begin crossing off at 289 (i.e.,
> 17 * 17) and cross off the multiples 289, 306, 323, ... , 510, 527,
> making fifteen crossings off in total. Notice that we cross off 306
> (17 * 18), even though it is a multiple of both 2 and 3 and has thus
> already been crossed off twice.  (The starting point of 17 * 17 is a
> pleasing, but actually *minor*, optimization for the *genuine* sieve.)
>
> The algorithm is efficient because each composite number, c, gets
> crossed off f times, where f is the number of unique factors of c
> less than sqrt(c). The average value for f increases slowly, being
> less than 3 for the first 10^12 composites, and less than 4 for the
> first 10^34.
>
> None of the algorithms we've discussed correspond to the above
> algorithm. It is not merely that they don't perform "optimizations",
> such as starting at the square of the prime, or that some of then use
> a divisibility check rather than using a simple increment. Rather, at
> a fundamental level, they all work differently than the real sieve.
> Following our earlier example, after finding that 17 is prime, the
> "phony" algorithm will check to see if 19 is divisible by 17 (in the
> case of Yitzchak's algorithm, divisibility is checked by comparing
> against 17*2), followed by 23, 29, 31, ... , 523, 527, checking a
> total of ninety-nine numbers for divisibility by 17. In fact, even if
> it did (somehow) begin at 289, it would examine all forty-five
> numbers that are not multiples of the primes prior to 17 (289, 293,
> 307, ... , 523, 527). It is also worth realizing that the set of
> numbers examined by the phony algorithm is not a superset of the ones
> examined by the real sieve (as we saw, the real algorithm crossed off
> 306, which the phony algorithm skips). In general, the speed of the
> phony algorithm depends on the number of primes that are *not*
> factors of each composite, whereas the speed of the real algorithm
> depends on the number of (unique) primes that *are*.
>
> Hopefully at this point I've piqued your interest.  Maybe you're
> wondering what our original sieve algorithm is, if it isn't the real
> sieve and the mod issue is just a red herring...?  Maybe you're
> wondering what the the real sieve looks like, coded in Haskell.   Or
> maybe you're still not convinced about any of it.  If so, I invite
> you to read a draft of my JFP paper, available at:
>
>      http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
>
> Best Regards,
>
>      Melissa.
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>