[Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)
claus.reinke at talk21.com
Sun Feb 25 20:49:28 EST 2007
> This isn't a variant of the "folklore sieve", it's actually the real one!
well, one thing i was trying for (and what one might perhaps like to see in a
paper) is a way of relating the various code alternatives to each other, through
suitably similar intermediate stages. not quite as detailed as deriving one from
the other by program transformation, and not always meaning-preserving, but
steps small enough that it becomes easier to understand the differences, and
how one version might grow out of another, driven by particular performance
concerns and implementation decisions. your paper does some of that, but
the steps feel more like jumps to me.
we don't usually have compiler optimisations that could effectively transform
the folklore version of the sieve into any of the other prime enumerators we've
seen. and it is nice to see the sieve added to the infamous quicksort as an example
of a specification well known to be beautiful, but little known to be inefficiently
realised (Lennart provided a convincing alternative for that other one, too;)
- they are advertised and accepted unquestioned all too often. but something in
me balks at the idea that the folklore version of the sieve is not "the" sieve
*algorithm*, in spite of your detailed analysis of cross-off footprints and performance.
i'm not saying you're wrong, and as one of those who prefer operational semantics
over denotational semantics in many contexts, i realise that it may be seen as odd
if i prefer to see the high-level versions as specifications of algorithms, when the
implementation behaviour might differ substantially from the intended algorithm.
but we are in the grey area between "permute until sorted" and "sort like this",
and thinking of the modulus of a number as a static property (that might be
derived incrementally from that of its predecessor) rather than an explicit check
for divisibility looks very similar to "this is quicksort, but i didn't really mean
(++) to do appends". if one has to use quicksort on lists, one had better
use Augustsson's append-free qsort instead of the folklore version, but is that
a better implementation, or a different algorithm? does it depend on how we
look at the question?
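to make the quicksort analogy concrete, here is a small sketch of both versions
(the names are mine, and the second only mimics the style of Augustsson's
append-free qsort, it is not his actual code):

```haskell
-- the folklore quicksort: two-way partition, gluing results with (++)
qsortFolklore :: Ord a => [a] -> [a]
qsortFolklore []     = []
qsortFolklore (x:xs) =
  qsortFolklore [y | y <- xs, y < x] ++ x : qsortFolklore [y | y <- xs, y >= x]

-- an append-free variant: thread an accumulator through the recursion,
-- so the sorted suffix is built by (:) alone, never by (++)
qsortAcc :: Ord a => [a] -> [a]
qsortAcc xs = go xs []
  where
    go []     acc = acc
    go (x:ys) acc = go [y | y <- ys, y < x]
                       (x : go [y | y <- ys, y >= x] acc)
```

same results, same recursion structure; whether avoiding (++) makes it a
different algorithm is exactly the question above.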
if you could make that part of your argument in a less controversial style, without
loaded words like "real", "phony", "fooling", focussing more on observations than
on any agenda (however innocent), this reader at least would find it easier to focus
on the really interesting issues you raise for discussion. also, the present thread
(one of many incarnations) demonstrates that there are other issues to be
exposed and explored when discussing styles of prime sieves. not to mention
the modern-again issue of transforming executable specifications into efficient
implementations. the sequence

mod sieves (folklore)
->start (sieve p) at p*p
->incremental sieves (folklore2)
->merged incremental sieves (folklore3)
->more efficient representation of merged sieves (your priority queue version)
->other useful optimisations and tweaks that further hide the ideas (your fastest versions)

makes it easier for me to see how the initial program relates to the others, and the
closeness of folklore3 to one of your versions is intentional, as are the differences.
the versions are not quite equivalent (eg folklore2/3 go wrong earlier than
folklore when using Int instead of Integer), but if there is any difference wrt
the cross-out footprint (apart from the p*p thing), it is likely to be in the precise
organisation of sieve merging after folklore2. otherwise, they seem to embody
fairly natural improvement steps of the original specification-level code.
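for illustration, the first two steps of that sequence might be sketched like
this (my own toy code, not the thread's actual definitions):

```haskell
-- the folklore sieve: each prime filters the rest by divisibility
folklore :: [Integer]
folklore = sieve [2..]
  where
    sieve []     = []
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

-- same shape, but each new filter only needs to act from p*p onwards:
-- any composite below p*p already has a smaller prime factor
folkloreSquared :: [Integer]
folkloreSquared = sieve [2..]
  where
    sieve []     = []
    sieve (p:xs) = p : sieve (low ++ [x | x <- high, x `mod` p /= 0])
      where (low, high) = span (< p*p) xs
```

the second version passes the segment below p*p through untouched, which is
precisely the small, meaning-preserving step from one stage to the next.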
> Let's take a look at ``pns'' at the 20th prime, having just
> found that 67 is prime (we're about to discover that 71 is prime):
> As you can see, 70 will be crossed off twice, once by the 5 iterator
> and once by the iterator for 7. And we won't think about 11, 13,
> etc. until they're needed.
as it happens, even if we run folklore3 over [2..], 70 will be crossed off
exactly once, by the sieve for 2. the sieves for 5 and 7 run over the gap
left behind, as they do in folklore and folklore2 (one could move that
gap jumping from sieve3 to insert, though..). the sieves for 5 and 7
"know" about 70 the same way that (`mod`5) and (`mod`7) know
about 70, but that knowledge isn't called upon for sieving, only to find
greater numbers with modulus 0.
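for concreteness, a hedged sketch of what i mean by merged incremental
sieves (my own toy version, not the thread's folklore3): the composite
streams of all primes are merged into a single increasing stream, so each
composite position is crossed off at most once overall:

```haskell
-- primes as the numbers left over after subtracting all composites
primesMerged :: [Integer]
primesMerged = 2 : minus [3..] composites
  where
    composites = mergeAll [map (p *) [p ..] | p <- primesMerged]

-- difference of two increasing streams
minus :: Ord a => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case compare x y of
  LT -> x : minus xs (y:ys)
  EQ -> minus xs ys
  GT -> minus (x:xs) ys
minus xs _ = xs

-- merge two increasing streams, dropping duplicates
merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys) = case compare x y of
  LT -> x : merge xs (y:ys)
  EQ -> x : merge xs ys
  GT -> y : merge (x:xs) ys
merge xs ys = xs ++ ys

-- lazily merge an infinite list of streams: emit the head of the first
-- stream before touching the rest, or the definition is not productive
mergeAll :: Ord a => [[a]] -> [a]
mergeAll ((x:xs):rest) = x : merge xs (mergeAll rest)
mergeAll _             = []
```

here the merge deduplicates, so 70 reaches the output of `composites` only
once, even though the streams for 2, 5 and 7 all contain it.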
in contrast, sieves in genuine will replace non-primes by zero, so later
sieves can still run into the same position. whether we count the zero in
that position as a gap, to be left intact, or as a ghost of the non-prime, to
be zeroed again, would seem to be a matter of viewpoint?
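one way to make that zero-marking concrete (a toy sketch of the idea,
not the actual code called genuine in this thread):

```haskell
-- cross off every step-th slot by overwriting it with 0;
-- the slot stays in the list, so a later sieve can zero it again
mark :: Int -> [Int] -> [Int]
mark step = go step
  where
    go _ []     = []
    go 1 (_:ys) = 0 : go step ys      -- cross off: keep the slot, drop the value
    go k (y:ys) = y : go (k - 1) ys

primesMarked :: [Int]
primesMarked = sieve [2..]
  where
    sieve []     = []
    sieve (0:xs) = sieve xs               -- a crossed-off slot: skip it
    sieve (p:xs) = p : sieve (mark p xs)  -- next multiple, 2*p, lies p slots ahead
```

because the slots survive as zeros, the counting stays aligned, and the
sieves for 5 and 7 do revisit the slot where 70 used to be.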
for me, the question is more one of vertical vs horizontal, or from-scratch
vs incremental, or level of abstraction. if i have a list of numbers
list = [0..20]
do i compute their moduli from scratch, starting recursions orthogonal
to the list direction
map (\x -> (x, x `mod` 3)) list
or do i compute them incrementally, aligning my recursion with the list
zip list (cycle [0..2])
do i see the former as associating the list elements with their moduli, or
as testing them for divisibility? and is the latter the same as the former,
or something different? it is the same thing, computed differently, of course.
but if i use that thing as a component of describing a more complex
algorithm, am i talking about different algorithms, or about the same
algorithm, computed differently? does it depend on our point of view,
or must we talk in absolutes?
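to check that intuition, the two readings side by side:

```haskell
list :: [Int]
list = [0 .. 20]

-- from scratch: a recursion (`mod`) orthogonal to the list direction
fromScratch :: [(Int, Int)]
fromScratch = map (\x -> (x, x `mod` 3)) list

-- incremental: the recursion aligned with the list, no division at all
incremental :: [(Int, Int)]
incremental = zip list (cycle [0 .. 2])
```

both denote the same association of numbers with their moduli; only the
way it is computed differs.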