[Haskell-cafe] Re: FASTER primes
Heinrich Apfelmus
apfelmus at quantentunnel.de
Mon Jan 4 08:29:19 EST 2010
Will Ness wrote:
>
> I keep thinking that storage duplication with span, filter etc. is not really
> necessary, and can be replaced with some pointer chasing - especially when
> there's only one consumer present for the generated values.
>
> What I mean is thinking of lists in terms of produce/consumer paradigm, as
> objects supporting the { pull, peek } interface, keeping the generator inside
> that would produce the next value on 'pull' request and keep it ready for
> any 'peek's.
>
> Euler's sieve is
>
> sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
> where (h,t) = span (< p*p) xs
>
> [...]
>
> The real difference here is between those producers whose values will only be
> consumed once, by one specific consumer, and those which values may be needed
> more than once, so need really to be maintained in some storage. If not - span,
> filter, map, whatever - they all are just little modifiers on top of the real
> producers, which may or may not also have an actual storage maintained by them.
(I haven't followed the whole thread, but hopefully I have enough grasp
of it to make a useful remark. :))
Concerning lists as producer/consumer, I think that's exactly what lazy
evaluation is doing. Neither filter , map or span evaluate and
store more list elements that strictly necessary.
Sure, creating a list head only to immediately consume it is somewhat
inefficient -- and the target of stream fusion[1] -- but this is an
overhead of how list elements are stored, not how many.
You can try to implement the Euler sieve with producers by using a type like
data Producer a = forall s. Producer {
state :: !s, next :: s -> s, value :: s -> a }
but I think this will be quite difficult; it's not clear what and thus
how big the state will be. (See [1] for choosing a good type.)
Concerning the sieves, there is a fundamental difference between the
imperative sieve and the functional sieves, regardless of whether the
latter start at p or p^2 or use a priority queue. Namely, the imperative
sieve makes essential use of *pointer arithmetic*. The key point is that
in order to cross off the multiples
p, 2*p, 3*p, ...
of a prime, the algorithm can directly jump from the (k*p)-th to the
(k*p+p)-th array element by adding p to the index. The functional
versions can never beat that because they can't just jump over p
constructors of a data structure in O(1) time.
[1]: http://www.cse.unsw.edu.au/~dons/papers/CLS07.html
Regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Haskell-Cafe
mailing list