[Haskell-cafe] Re: Genuine Eratosthenes sieve [Was: Optimization fun]

apfelmus at quantentunnel.de apfelmus at quantentunnel.de
Mon Feb 19 16:13:24 EST 2007


Melissa O'Neill wrote:
> FWIW, another way to write this code (without needing to think about how
> "fold bails early") is
> 
> primes = 2: [x | x <−[3,5..], all (\p −> x `mod` p > 0) (factorsToTry x)]
>   where
>     factorsToTry x = takeWhile (\p −> p*p <= x) primes

Well, you certainly thought of the fact that 'all' bails early :)

> [...]
> But that's okay, because our initial algorithm ISN'T THE REAL SIEVE
> EITHER.
> [...]
> The thread ends with this algorithm from Yitzchak Gale:
>> [...]
>> sieve (x:xs) = x : sieve (deleteOrd [x+x,x+x+x..] xs)
>> sieve _      = []
>
> Which seems reasonable, until you realize that every prime p
> we come up with is still "considered" by k deleteOrd filters,
> where k is the number of primes that preceeded it.
> [...]
> What makes the sieve an efficient algorithm are the details of *what*
> gets "crossed off", *when*, and *how many times*.

Ah, thank you for that fresh slap in the face. Somehow, the name 'sieve'
clouded the fact that the point of a sieve is how to represent it as
data structure in order to avoid too many multiple crosses. Iterating
'deleteOrd' organizes the "cross off" in linear time whereas logarithmic
time is clearly conceivable. Though I think that the `mod`-algorithm
counts as sieve, it just does a bad job at organizing the "crossing off".

> The algorithm is efficient because each composite number, c, gets
> crossed off f times, where f is the number of unique factors of c less
> than sqrt(c). The average value for f increases slowly, being less than
> 3 for the first 10^12 composites, and less than 4 for the first 10^34.

This analysis misses the fact that managing the "cross off table"
invariably needs log(#primes <= c) ~ log(c) extra time for every c. I'm
not quite sure, but I think that this has to be multiplied (not added)
with f. The other algorithms don't have an extra factor, although f is
clearly larger for them.


Regards,
apfelmus



More information about the Haskell-Cafe mailing list