[Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)

Sun Feb 25 12:18:49 EST 2007

Here's another program you can add.  It's fairly short and efficient.

	-- Lennart

import System (getArgs)

infixr :>

data StreamInt = !Int :> StreamInt

(!>) :: StreamInt -> Int -> Int
(x :> _)  !> 0 = x
(_ :> xs) !> n = xs !> (n-1)

-- By replacing lprimes on the next line by '5 :> gen 7 4 2' this  
algorithm
-- runs in very little space, but is somewhat slower.
primes = 2 :> 3 :> lprimes
   where isPrime (p:>ps) n = n `rem` p /= 0 && (p*p > n || isPrime ps n)
         lprimes = 5 :> gen 7 4 2
         gen n a b = if isPrime lprimes n then n :> gen (n+a) b a  
else gen (n+a) b a

printNthPrime n = print (n, primes !> (n-1))

main = do
     args <- getArgs
     printNthPrime $ read $ head args

On Feb 25, 2007, at 12:51 , Melissa O'Neill wrote:

> For those enjoying the fun with prime finding, I've updated the  
> source at
>
>     http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip
>
> I've tweaked my code a little to improve its space behavior when  
> finding primes up to some limit, added an up-to-limit version of  
> the Naive Primes algorithm, and added Oleg's prime finding code too.
>
> I also got a chance to look at space usage more generally.  I won't  
> reproduce a table here, but the conclusions were more-or-less what  
> you'd expect.  The "unlimited list" algorithms used O(n) space to  
> find n primes (except for Runciman's algorithm, which appeared to  
> be much worse), and the "primes up to a limit" algorithms used O 
> (sqrt(n)) space to find the nth prime.
>
> Both of these are better than the classic C algorithm, which uses O 
> (n log n) space to find the nth prime.  For example, heap profiling  
> shows that my own O(sqrt(n)) algorithm uses only 91200 bytes to  
> find the 10^7th prime, whereas the classic C algorithm needs at  
> least 11214043 bytes for its array -- a factor of more than 100  
> different, and one that gets worse for larger n.
>
> Lennart Augustsson wrote:
>> Another weird thing is that much of the Haskell code seems to work  
>> with Integer whereas the C code uses int.
>
> Originally, I was comparing Haskell with Haskell, and for that  
> purpose I wanted to have a level playing field, so going with  
> Integer everywhere made sense.
>
>> That doesn't seem fair.
>
> Actually, to the extent that any of the comparisons are "fair", I  
> think this one is too.  After all, typical Haskell code uses  
> Integer and typical C code uses int.  I could use arrays in my  
> Haskell code and never use laziness, but when I program in Haskell,  
> I'm not trying to exactly recreate C programs, but rather write  
> their Haskell equivalents.  For example, to me, producing a lazy  
> list was essential for a true Haskell feel.  For some people, the  
> "Haskell feel" also includes treating the language as a declarative  
> specification language where brevity is everything -- but for me,  
> other things (like fundamental algorithmic efficiency and  
> faithfulness to the core ideas that make the Sieve of Eratosthenes  
> an *efficient* algorithm) are universal and ought to be common to  
> both C and Haskell versions.
>
> But to allow a better comparison with C, I've added a run for an  
> Int version of my algorithm.  With that change, my code is closer  
> to the speed of the C code.  More interestingly, for larger n, I  
> seem to be narrowing the gap.  At 10^6, my code runs nearly 30  
> times slower than the classic C version, but at 10^8, I'm only  
> about 20 times slower.  This is especially interesting to me there  
> was some (reasonable looking) speculation from apfelmus several  
> days ago, that suggested that my use of a priority queue incurred  
> an extra log(n) overhead, from which you would expect a worse  
> asymptotic complexity, not equivalent or better.
>
>     Melissa.
>
> Enc. (best viewed with a fixed-width font)
>
>    ------------------------------------------------------------------
>                  Time (in seconds) for Number of Primes
>                  ----------------------------------------------------
>    Algorithm     10^3    10^4     10^5     10^6     10^7     10^8
>    ------------------------------------------------------------------
>    C-Sieve       0.00      0.00     0.01     0.29      5.12    88.24
>    O'Neill (#3)  0.01      0.04     0.55     8.34    122.62  1779.18
>    O'Neill (#2)  0.01      0.06     0.95    13.85    194.96  2699.61
>    O'Neill (#1)  0.01      0.07     1.07    15.95    230.11     -
>    Bromage       0.02      0.39     6.50   142.85     -         -
>    "sieve" (#3)  0.01      0.25     7.28   213.19     -         -
>    Naive (#2)    0.02      0.59    14.70   386.40     -         -
>    Naive (#1)    0.32      0.66    16.04   419.22     -         -
>    Runciman      0.02      0.74    29.25    -         -         -
>    Reinke        0.04      1.21    41.00    -         -         -
>    Zilibowitz    0.02      2.50   368.33    -         -         -
>    Gale (#1)     0.12     17.99    -        -         -         -
>    "sieve" (#1)  0.16     32.59    -        -         -         -
>    "sieve" (#2)  0.01     32.76    -        -         -         -
>    Oleg          0.18     68.40    -        -         -         -
>    Gale (#2)     1.36    268.65    -        -         -         -
>    ------------------------------------------------------------------
>
> - The dashes in the table mean "I gave up waiting" (i.e., > 500  
> seconds)
> - "sieve" (#1) is the classic example we're all familiar with
> - "sieve" (#2) is the classic example, but sieving a list without  
> multiples of 2,3,5, or 7 -- notice how it makes no real difference
> - "sieve" (#3) is the classic example, but generating a lazy-but- 
> finite list (see below)
> - O'Neill (#1) is basically the algorithm of mine discussed in  
> http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf, with a few  
> minor tweaks
> - O'Neill (#2) is a variant of that algorithm that generates a lazy- 
> but-finite list of primes.
> - O'Neill (#3) is a variant of that algoritm that uses Ints when it  
> can get away with it.
> - Naive (#1) is the the non-sieve-based "divide by every prime up  
> to the square root" algorithm for finding primes (called  
> SimplePrimes in the source)
> - Naive (#2) is the same algorithm, with a limit on the number of  
> primes
> - Runciman is Colin Runciman's algorithm, from his _Lazy Wheel  
> Sieves and Spirals of Primes_ paper
> - Reinke is the ``applyAt'' algorithm Claus Reinke posted here
> - Gale (#1) is Yitz Gale's deleteOrd algorithm
> - Gale (#2) is Yitz Gale's crossOff algorithm
> - Oleg is oleg at pobox.com's algoirthm, as posted to Haskell Cafe
> - Zilibowitz is Ruben Zilibowitz's GCD-based primes generator, as  
> posted on Haskell-Cafe
> - Bromage is Andrew Bromage's implementation of the Atkin-Bernstein  
> sieve.  Like O'Neill (#2) and "sieve" (#3), asks for some upper  
> limit on the number of primes it generates.  Unlike O'Neill (#2)  
> and "sieve" (#3), it uses arrays, and the upper limit causes a  
> large initial array allocation.  Also, unlike the other Haskell  
> algorithms, it does not produce a lazy list; no output is produced  
> until sieving is complete
> - C-Sieve is a "typical" simple implementation of the sieve in C  
> found with Google; it skips multiples of 2 and uses a bit array.   
> Also, obviously, it doesn't produce incremental output.
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe