[Haskell-cafe] Re: Code and Perf. Data for
PrimeFinders(was:Genuine Eratosthenes sieve)
Thorkil Naur
naur at post11.tele.dk
Sat Mar 3 13:59:57 EST 2007
Hello all felllow primefinders,
I have followed this discussion of Haskell programs to generate small primes
with the greatest interest. The thing is, I have been running my Haskell
implementation of the so-called Elliptic Curve Method (ECM) of factoring for
a number of years. And that method uses a sequence of small primes, precisely
like the one discussed here.
I have been using the method of trial division to generate primes for the ECM.
I have been aware that sieving would very likely be more efficient, but since
the amount of time spent generating small primes is, after all, not an
extremely large part of running the ECM program, I have postponed the task of
looking into this. So this discussion of Eratostenes' sieve has provided a
most wonderful opportunity to do something about this.
Initially, I wanted to establish some sort of base line for generating small
primes. This I did by writing my own C program for generating small primes,
both using sieving and trial division.
The result was most sobering: Finding primes with sieving with a C program is
much faster than finding them with trial division. The speed advantage factor
of sieving over trial division for small numbers (say, finding the first
couple of hundred thousand primes) is around 5 and it gets larger with larger
numbers. (I will not quote specific measurements here, only rough tendencies.
Details clutter and Melissa is much better at this anyway. I would like to
state, however, that I am, in fact, measuring performance accurately, but I
use ancient machinery that runs slowly: Measuring performance on fast
equipment often tends to cloud significant issues, in my experience.)
So there is my goal now: Find a Haskell program that generates small primes
that is around 5 times faster than the one I have already that uses trial
division. And I mean a Haskell program, not a C program that gets called from
Haskell. Sure, if I wanted ultimate performance, I could perhaps just call
some C code from Haskell. But the insistance on pure Haskell widens the
perspective and makes it more likely that the programmer or the language
designer and implementer learn something valuable.
The C sieving program uses an array to attain its impressive performance, so
let us try that in Haskell as well. Something like this has been on my mind
for a long time:
accumArray (\x ->\y ->False) True (m,n) [(q,()) | q<-ps ]
Haskell arrays, like every other value, cannot be changed, but accumArray
provides a facility that we just might manage to use for our purpose. The
particular array computed by the above expression using accumArray requires a
low and high limit (m,n) of the numbers to be sieved as well as a list of
sieving values - ps - numbers that need to be crossed out as composite in the
interval (m,n) given.
[The overhead involved in evaluating this expression with its unused () value
and an indicator that is changed from True to False through throwing away two
dummy values seems excessive. Any suggestion to write this up in a way that
is more in agreement with the small amount of work that is actually required
to do this would be most welcome. Or perhaps GHC (which is the compiler I use
when I want things to run fast) takes care of things for me? In any case,
this is my inner loop and even small improvements will matter.]
When I tried this initially, I was actually not particularly surprised to find
that it performed rather worse than trial division. Although I am unable to
give a precise explanation, the GHC profiling runs indicated that, somehow,
too much memory was accumulating before it was used, leading to thrashing and
perhaps also a significant amount of overhead spent administering this
excessive amount of memory.
The initial sieve that I used had a fixed size that covered all the numbers
that I wanted to test. An improvement might come about by splitting this
sieve into smaller pieces that were thrown away when no longer needed.
OK, what pieces? Well, a possibility would be to use the splitting of all
integers into subsequences that matched the interval between squares of
consecutive primes, because that would match the entry of a new prime into
the set of primes that needs to be taken into consideration.
And this actually worked rather well. In addition, the problem solved was now
the one of expressing the generation of an infinite sequence of primes, not
just the primes within some predetermined interval.
Further (yes, you will get the code eventually, but be patient, I don't want
to have to present more than the final version), there was the wheel idea
also used by Melissa of somehow eliminating from consideration all the
numbers divisible by a few of the smallest primes: If we, for example, could
get away with considering only the odd numbers as candidates for being
primes, this would potentially save half the work.
This I didn't find to be an easy idea to implement. Essentially what I did
was, both for the sequence of candidates and for the sequence of potential
divisors, changing the representation into one in which consecutive numbers
represented only the numbers that didn't have some basic set of primes as
divisors.
An example may help here. Let's say that we have selected 2 and 3 as factors
to be eliminated from consideration. Then we have the numbers 1, 5, 7, 11,
13, ... that are neither divisible by 2 nor 3. And we establish a
correspondence 0<->1, 1<->5, 2<->7, 3<->11, 4<->13, ... so that every number
not divisible by 2 or 3 has precisely one representaton in [0..]. I call this
the wheel representation. So, once we have emitted the special primes 2 and
3, we can limit our attention to the numbers whose wheel represention is the
list [1..].
The situation becomes more complicated when considering the effective
generation of the sequence of numbers that we use to cross off the numbers in
this basic list. Taking the next prime 5, for example, we have to generate
the list of divisors [25,30,..] in the wheel representation. This task is
made more complicated by the fact that some of these numbers (like 30) should
first be eliminated, because they are divisible by 2 or 3. The end result is
a finite list of differences which can be repeated endlessly to generate the
actual divisors in the wheel representation.
Then there is the use of specialization, something that GHC can use to
generate more efficient code for special instances of generic functions. The
way I understand this, GHC can generate a special Int version of code that
otherwise handles the general Integral class. But in order for this code to
be selected for use by GHC, it must be called under circumstances where it is
clear that the involved type is really Int. To capture the full potential of
this seems to require some special handling in general.
My trial division code was speeded up by a factor of about 5 using this
technique. For the present sieving code, the factor was only about 2. But I
believe that additional opportunities for speeding up using this technique
remain.
A final complication concerns the need to be able to specify a lower limit on
the list of primes generated. This is for possible use of this code to
generate primes for the ECM.
Enough talk, the code can be found at
thorkilnaur.dk/~tn/T64_20070303_1819.tar.gz
and includes both the C code that I have used (ErathoS.c), a Haskell module
ErathoS (ErathoS.hs), a simple driver for ErathoS (T64.hs), and a Shell
script that demonstrates the basic possibilities (T64.sh).
Overall, the speed compares well with the fastest of the sieves that we have
seen in this discussion. And I am able to use sieving in Haskell, as in C, to
gain a speed advantage over trial division. The Haskell code is still
considerably slower than the C code in spite of the fact that the C code has
been written mostly with an attention to correctness and could probably be
improved rather easily.
Best regards
Thorkil
More information about the Haskell-Cafe
mailing list