[Haskell-cafe] Re: FASTER primes

Mon Jan 4 10:30:18 EST 2010

Heinrich Apfelmus <apfelmus <at> quantentunnel.de> writes:

> 
> Will Ness wrote:
> >
> > I keep thinking that storage duplication with span, filter etc. is not
> > really 
> > necessary, and can be replaced with some pointer chasing - especially when 
> > there's only one consumer present for the generated values. 
> > 
> > What I mean is thinking of lists in terms of produce/consumer paradigm, as 
> > objects supporting the { pull, peek } interface, keeping the generator
> > inside that would produce the next value on 'pull' request and keep it
> > ready for any 'peek's.
> > 
> > Euler's sieve is
> > 
> >  sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
> >                       where (h,t) = span (< p*p) xs
> >
> > [...]
> >
> > The real difference here is between those producers whose values will
> > only be consumed once, by one specific consumer, and those which values
> > may be needed more than once, so need really to be maintained in some
> > storage. If not - span, filter, map, whatever - they all are just little
> > modifiers on top of the real producers, which may or may not also have 
> > an actual storage maintained by them.
> 
> (I haven't followed the whole thread, but hopefully I have enough grasp
> of it to make a useful remark. :))
> 
> Concerning lists as producer/consumer, I think that's exactly what lazy
> evaluation is doing. Neither  filter ,  map  or  span  evaluate and
> store more list elements that strictly necessary.

I laways suspected as much, but was once told that Chris Okasaki has shown that 
any filter etc must allocate its own storage. With the peek/pull they don't 
have to, if they are nested, and the deepest one from the real storage gets 
pulled through some pointer chasing eventually. Span isn't so easily compiled 
out too or is it? But that might be a minor point.

For me, a real smart compiler is one that would take in e.g. (sum $ take n $ 
cycle $ [1..m]) and spill out a straight up math formula, inside a few ifs 
maybe (just an aside). 

Such a smart compiler might even be able to derive a well performing code right 
from the Turner's sieve. :)

> Sure, creating a list head only to immediately consume it is somewhat
> inefficient -- and the target of stream fusion[1] -- but this is an
> overhead of how list elements are stored, not how many.

it might be equivalent to the (imagined) producer's storing its 'current' value 
inside its frame.

How much can we rely on the run-time to actually destroy all the passed-over 
elements and not hang on to them for some time? Is this that compiler switch 
that Daniel mentioned? Is it reliable?

> 
> You can try to implement the Euler sieve with producers by using a type like
> 
>    data Producer a = forall s. Producer {
>                       state :: !s, next :: s -> s, value :: s -> a }
> 
> but I think this will be quite difficult; it's not clear what and thus
> how big the state will be. (See [1] for choosing a good type.)

I did that once in Scheme, as per SICP, with 'next' hanging in a stream's tail. 
Put filters and maps on top of that (inside the new 'next' actually). But that 
used the Scheme's lists as sorage. Another one was actual producers/modifiers 
with {pull,peek} interface. It even produced some primes, and some Hamming 
numbers. Then I saw Haskell, and thought I'd get all that for free with its 
equational reasoning.

But I get the impression that GHC isn't working through equational reasoning?.. 
I see all this talk about thunks etc.

> Concerning the sieves, there is a fundamental difference between the
> imperative sieve and the functional sieves, regardless of whether the
> latter start at p or p^2 or use a priority queue. Namely, the imperative
> sieve makes essential use of *pointer arithmetic*. The key point is that
> in order to cross off the multiples
> 
>     p, 2*p, 3*p, ...
> 
> of a prime, the algorithm can directly jump from the (k*p)-th to the
> (k*p+p)-th array element by adding  p  to the index. The functional
> versions can never beat that because they can't just jump over  p
> constructors of a data structure in O(1) time.

We can directy jump to the next multiple too, it is called (+). :) :)

But seriously, the real issue is that we have to merge the produced streams of 
multiples, while the mutable-storage code works on same one, so its "merging 
cost" is zero. And even if we are smart to merge them in a tree-like fashion, 
we still have no (or little) control over the compiler's representation of 
lists and retention of their content and whether it performs stream fusion or 
not (if we use lists).

If you could take a look at the tree-merging primes and why it uses too much 
memory, it would be great. The code is in Daniel's post to which I replied, or 
on haselwiki Prime_numbers page (there in its rudimentary form). It's a tangent 
to your VIP code, where instead of People structure an ordered list is just 
maintained as a split pair, of its known (so far, finite) prefix and the rest 
of it. Then under merging these split pairs form a monoid, s can be rearranged 
in a tree. If you have'nt seen it yet, it uses a different folding structure to 
your code, with a lower total cost of multiples production (estimated as Sum 
(1/p)*depth):

 tfold f (x:y:z:xs) = (x `f` (y `f` z)) `f` pairwise f xs
 comps = tfold $ pairwise mergeSP multips

But aside from the memory problem (about 50M vs Melissa's 2M), for the first 
few million primes produced it has almost exactly the same asymptotics as her 
PQ code and runs on par with it, compiled (and 2.5x faster when interpreted, in 
GHCi). It is also clear and concise. :) :) 

I think I'll have to try out your code (amended with a new folding structure)
and see if it has less memory problems maybe. I assume it is your code on the 
haskellwiki page. (?)

Cheers,

> 
> [1]: http://www.cse.unsw.edu.au/~dons/papers/CLS07.html
> 
> Regards,
> Heinrich Apfelmus
> 
> --
> http://apfelmus.nfshost.com
>