[Haskell-cafe] Simple FAST lazy functional primes

Mon Nov 2 02:41:15 EST 2009

First, here it is:

primes   = 2: 3: sieve 0 primes' 5
primes'  = tail primes
sieve k (p:ps) x
         = [x | x<-[x,x+2..p*p-2], and [(x`rem`p)/=0 | p<-take k primes']]
           ++ sieve (k+1) ps (p*p+2)

(thanks to Leon P.Smith for his brilliant idea of directly generating
the spans of odds instead of calling 'span' on the infinite odds list).

This code is faster than PriorityQueue-based sieve (granted, using my own
ad-hoc implementation, so YMMV) in producing the first million primes
(testing  was done running the ghc -O3 compiled code inside GHCi).
The relative performance vs the PQ-version is:

  100,000   300,000   1,000,000  primes produced
   0.6       0.75      0.97

I've recently came to the Melissa O'Neill's article (I know, I know) and
she makes the incredible claim there that the square-of-prime optimization
doesn't count if we want to improve the old and "lazy"

uprimes = 2: sieve [3,5..] where
  sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]

Her article gave me the strong impression that she claims that the only way
to fix this code is by using Priority Queue based optimization, and then
goes on to present astronomical gains in speed by implementing it.

Well, I find this claim incredible. First of all, the "naive" code fires up
its nested filters much too early, when in fact each must be started only
when the prime's square is reached (not only have its work started there -
be started there itself!), so that filters are delayed:

dprimes  = 2: 3: sieve (tail dprimes) [5,7..]  where
  sieve (p:ps) xs
         = h ++ sieve ps (filter ((/=0).(`rem`p)) (tail t))
           where (h,t)=span (< p*p) xs

This code right there is on par with the PQ-base code, only x3-4 slower at
generating the first million primes.

Second, the implicit control structure of linearly nested filters, piling up
in front of the supply stream, each with its prime-number-it-filters-by
hidden inside it, needs to be explicated into a data structure of
primes-to-filter-by (which is in fact the prefix of the primes list
we're building itself), so the filtering could be done by one function call
instead of many nested calls:

xprimes  = 2: 3: sieve 0 primes' [5,7..]  where
  primes' = tail xprimes
  sieve k (p:ps) xs
           = noDivs k h ++ sieve (k+1) ps (tail t)
             where (h,t)=span (< p*p) xs
  noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))

>From here, using the brilliant idea of Leon P. Smith's of fusing the span
and the infinite odds supply, we arrive at the final version,

kprimes  = 2: 3: sieve 0 primes' 5  where
  primes' = tail kprimes
  sieve k (p:ps) x
           = noDivs k h ++ sieve (k+1) ps (t+2)
             where (h,t)=([x,x+2..t-2], p*p)
  noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))

Using the list comprehension syntax, it turns out, when compiled with
ghc -O3, gives it another 5-10% speedup itself.

So I take it to disprove the central premise of the article, and to show
that simple lazy functional FAST primes code does in fact exist, and
that the PQ optimization - which value of course no-one can dispute - is
a far-end optimization.