[Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)

Melissa O'Neill oneill at cs.hmc.edu
Sun Feb 25 07:51:42 EST 2007


For those enjoying the fun with prime finding, I've updated the  
source at

     http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip

I've tweaked my code a little to improve its space behavior when  
finding primes up to some limit, added an up-to-limit version of the  
Naive Primes algorithm, and added Oleg's prime finding code too.

I also got a chance to look at space usage more generally.  I won't  
reproduce a table here, but the conclusions were more-or-less what  
you'd expect.  The "unlimited list" algorithms used O(n) space to  
find n primes (except for Runciman's algorithm, which appeared to be  
much worse), and the "primes up to a limit" algorithms used O(sqrt(n))
space to find the nth prime.

Both of these are better than the classic C algorithm, which uses
O(n log n) space to find the nth prime.  For example, heap profiling
shows that my own O(sqrt(n))-space algorithm uses only 91200 bytes to
find the 10^7th prime, whereas the classic C algorithm needs at least
11214043 bytes for its array -- a difference of more than a factor of
100, and one that gets worse for larger n.
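
For anyone wondering where that C figure comes from, here is my own
back-of-the-envelope arithmetic (not something taken from the zip
file): the 10^7th prime is 179,424,673, and an odds-only bit array
reaching that far needs one bit per two candidates:

    -- ceiling(179424673 / 16) bytes for an odds-only bit array
    cSieveBytes :: Integer
    cSieveBytes = (179424673 + 15) `div` 16   -- = 11214043, as quoted above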

Lennart Augustsson wrote:
> Another weird thing is that much of the Haskell code seems to work  
> with Integer whereas the C code uses int.

Originally, I was comparing Haskell with Haskell, and for that  
purpose I wanted to have a level playing field, so going with Integer  
everywhere made sense.

> That doesn't seem fair.

Actually, to the extent that any of the comparisons are "fair", I  
think this one is too.  After all, typical Haskell code uses Integer  
and typical C code uses int.  I could use arrays in my Haskell code  
and never use laziness, but when I program in Haskell, I'm not trying  
to exactly recreate C programs, but rather write their Haskell  
equivalents.  For example, to me, producing a lazy list was essential  
for a true Haskell feel.  For some people, the "Haskell feel" also  
includes treating the language as a declarative specification  
language where brevity is everything -- but for me, other things  
(like fundamental algorithmic efficiency and faithfulness to the core  
ideas that make the Sieve of Eratosthenes an *efficient* algorithm)  
are universal and ought to be common to both C and Haskell versions.
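
To make the Integer-versus-Int point concrete, here is a toy
illustration (mine, not code from the zip file): with a naive
trial-division generator written polymorphically, Integer versus Int
is nothing more than a choice of type signature, whereas the C code
has to commit to a fixed-width integer up front.

    -- Toy trial-division generator, written once over any Integral type
    naivePrimes :: Integral a => [a]
    naivePrimes = 2 : [ n | n <- [3,5..]
                          , all (\p -> n `mod` p /= 0)
                                (takeWhile (\p -> p*p <= n) naivePrimes) ]

    primesInteger :: [Integer]   -- the "typical Haskell" choice
    primesInteger = naivePrimes

    primesInt :: [Int]           -- the "typical C" choice
    primesInt = naivePrimes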

But to allow a better comparison with C, I've added a run for an Int  
version of my algorithm.  With that change, my code is closer to the  
speed of the C code.  More interestingly, for larger n, I seem to be  
narrowing the gap.  At 10^6, my code runs nearly 30 times slower than  
the classic C version, but at 10^8, I'm only about 20 times slower.   
This is especially interesting to me because there was some
(reasonable-looking) speculation from apfelmus several days ago
suggesting that my use of a priority queue incurs an extra log(n)
overhead, from which you would expect worse asymptotic behavior, not
equivalent or better.
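
For anyone who hasn't read the paper, here is a rough sketch of the
general technique under discussion: an incremental sieve that
schedules each prime's next composite instead of testing every
candidate against every prime.  This is only a sketch -- Data.Map
stands in for the priority queue, and there is no wheel -- so it is
not the code being timed above, just the flavor of it.

    import qualified Data.Map as Map

    -- Each emitted prime p gets a table entry keyed by its next
    -- uncrossed odd multiple; composites are recognized by lookup.
    sievePrimes :: [Int]
    sievePrimes = 2 : sieve [3, 5 ..]
      where
        sieve []       = []
        sieve (x : xs) = x : go xs (Map.singleton (x * x) [x])

        go [] _ = []
        go (x : xs) table =
          case Map.lookup x table of
            Nothing -> x : go xs (Map.insert (x * x) [x] table)     -- x is prime
            Just ps -> go xs (foldl bump (Map.delete x table) ps)   -- x is composite
          where
            bump t p = Map.insertWith (++) (x + 2 * p) [p] t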

     Melissa.

Enc. (best viewed with a fixed-width font)

    ------------------------------------------------------------------
                  Time (in seconds) for Number of Primes
                  ----------------------------------------------------
    Algorithm         10^3    10^4    10^5    10^6    10^7    10^8
    ------------------------------------------------------------------
    C-Sieve           0.00    0.00    0.01    0.29    5.12   88.24
    O'Neill (#3)      0.01    0.04    0.55    8.34  122.62 1779.18
    O'Neill (#2)      0.01    0.06    0.95   13.85  194.96 2699.61
    O'Neill (#1)      0.01    0.07    1.07   15.95  230.11       -
    Bromage           0.02    0.39    6.50  142.85       -       -
    "sieve" (#3)      0.01    0.25    7.28  213.19       -       -
    Naive (#2)        0.02    0.59   14.70  386.40       -       -
    Naive (#1)        0.32    0.66   16.04  419.22       -       -
    Runciman          0.02    0.74   29.25       -       -       -
    Reinke            0.04    1.21   41.00       -       -       -
    Zilibowitz        0.02    2.50  368.33       -       -       -
    Gale (#1)         0.12   17.99       -       -       -       -
    "sieve" (#1)      0.16   32.59       -       -       -       -
    "sieve" (#2)      0.01   32.76       -       -       -       -
    Oleg              0.18   68.40       -       -       -       -
    Gale (#2)         1.36  268.65       -       -       -       -
    ------------------------------------------------------------------

- The dashes in the table mean "I gave up waiting" (i.e., > 500 seconds)
- "sieve" (#1) is the classic example we're all familiar with
- "sieve" (#2) is the classic example, but sieving a list without  
multiples of 2,3,5, or 7 -- notice how it makes no real difference
- "sieve" (#3) is the classic example, but generating a lazy-but- 
finite list (see below)
- O'Neill (#1) is basically the algorithm of mine discussed in http:// 
www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf, with a few minor tweaks
- O'Neill (#2) is a variant of that algorithm that generates a
lazy-but-finite list of primes.
- O'Neill (#3) is a variant of that algorithm that uses Ints when it
can get away with it.
- Naive (#1) is the non-sieve-based "divide by every prime up to the
square root" algorithm for finding primes (called SimplePrimes in the
source)
- Naive (#2) is the same algorithm, with a limit on the number of primes
- Runciman is Colin Runciman's algorithm, from his _Lazy Wheel Sieves  
and Spirals of Primes_ paper
- Reinke is the ``applyAt'' algorithm Claus Reinke posted here
- Gale (#1) is Yitz Gale's deleteOrd algorithm
- Gale (#2) is Yitz Gale's crossOff algorithm
- Oleg is oleg at pobox.com's algorithm, as posted to Haskell Cafe
- Zilibowitz is Ruben Zilibowitz's GCD-based primes generator, as  
posted on Haskell-Cafe
- Bromage is Andrew Bromage's implementation of the Atkin-Bernstein
sieve.  Like O'Neill (#2) and "sieve" (#3), it asks for some upper
limit on the number of primes it generates.  Unlike O'Neill (#2) and
"sieve" (#3), it uses arrays, and the upper limit causes a large
initial array allocation.  Also, unlike the other Haskell algorithms,
it does not produce a lazy list; no output is produced until sieving
is complete
- C-Sieve is a "typical" simple implementation of the sieve in C  
found with Google; it skips multiples of 2 and uses a bit array.   
Also, obviously, it doesn't produce incremental output.
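
For anyone who hasn't seen it, the "classic example" behind the
"sieve" rows is essentially the following (my paraphrase here, not
necessarily character-for-character what's in the zip file); each
surviving candidate gets filtered through every earlier prime, which
is why it behaves so much worse than a genuine Sieve of Eratosthenes:

    classicSieve :: [Integer] -> [Integer]
    classicSieve (p:xs) = p : classicSieve [x | x <- xs, x `mod` p /= 0]
    classicSieve []     = []

    classicPrimes :: [Integer]                  -- "sieve" (#1)
    classicPrimes = classicSieve [2..]

    -- one way to get a lazy-but-finite list, as in "sieve" (#3)
    classicPrimesUpTo :: Integer -> [Integer]
    classicPrimesUpTo limit = classicSieve [2..limit]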


