[Haskell-cafe] Re: FASTER primes
Will Ness
will_n48 at yahoo.com
Mon Jan 4 10:30:18 EST 2010
Heinrich Apfelmus <apfelmus <at> quantentunnel.de> writes:
>
> Will Ness wrote:
> >
> > I keep thinking that storage duplication with span, filter etc. is not
> > really
> > necessary, and can be replaced with some pointer chasing - especially when
> > there's only one consumer present for the generated values.
> >
> > What I mean is thinking of lists in terms of a producer/consumer
> > paradigm, as objects supporting the { pull, peek } interface, keeping
> > the generator inside, which would produce the next value on a 'pull'
> > request and keep it ready for any 'peek's.
> >
> > Euler's sieve is
> >
> > sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
> > where (h,t) = span (< p*p) xs
> >
> > [...]
> >
> > The real difference here is between those producers whose values will
> > only be consumed once, by one specific consumer, and those whose values
> > may be needed more than once, and so really need to be maintained in
> > some storage. If not - span, filter, map, whatever - they are all just
> > little modifiers on top of the real producers, which may or may not
> > also have actual storage maintained by them.
>
> (I haven't followed the whole thread, but hopefully I have enough grasp
> of it to make a useful remark. :))
>
> Concerning lists as producer/consumer, I think that's exactly what lazy
> evaluation is doing. Neither filter, map nor span evaluates and stores
> more list elements than strictly necessary.
I always suspected as much, but was once told that Chris Okasaki had shown
that any filter etc. must allocate its own storage. With the peek/pull
interface they don't have to: if they are nested, the deepest one eventually
gets pulled from the real storage through some pointer chasing. Span isn't
so easily compiled away either, or is it? But that might be a minor point.
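To illustrate what I mean by nested peek/pull with no intermediate storage,
here is a minimal sketch in Haskell (Producer, unfoldP, mapP, filterP are
made-up names, just for this illustration):

  -- the current value is kept ready for any 'peek's;
  -- 'pull' advances to the next one
  data Producer a = Producer
    { peek :: a
    , pull :: Producer a
    }

  -- build a producer from a step function and a seed
  unfoldP :: (s -> (a, s)) -> s -> Producer a
  unfoldP step s = Producer a (unfoldP step s')
    where (a, s') = step s

  -- modifiers keep no storage of their own, only a reference
  -- to the producer beneath them
  mapP :: (a -> b) -> Producer a -> Producer b
  mapP f p = Producer (f (peek p)) (mapP f (pull p))

  filterP :: (a -> Bool) -> Producer a -> Producer a
  filterP q p
    | q (peek p) = Producer (peek p) (filterP q (pull p))
    | otherwise  = filterP q (pull p)

  -- e.g. peek (pull (filterP even (unfoldP (\n -> (n, n+1)) 1))) == 4

Of course, in Haskell such a Producer is just an infinite list in disguise,
which I suppose is exactly your point.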
For me, a real smart compiler is one that would take in e.g. (sum $ take n $
cycle [1..m]) and spit out a straight-up math formula, inside a few ifs maybe
(just an aside).
Such a smart compiler might even be able to derive well-performing code right
from Turner's sieve. :)
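The closed form it would have to find is simple enough; here is a sketch of
what I'd want it to derive (sumTakeCycle is a made-up name):

  -- take n (cycle [1..m]) is q full runs of 1..m plus a partial
  -- run of length r, where (q,r) = n `divMod` m
  sumTakeCycle :: Integer -> Integer -> Integer
  sumTakeCycle n m = q * tri m + tri r
    where
      (q, r) = n `divMod` m
      tri k  = k * (k + 1) `div` 2   -- 1+2+...+k

  -- e.g. sumTakeCycle 7 3 == 13 == sum (take 7 (cycle [1,2,3]))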
> Sure, creating a list head only to immediately consume it is somewhat
> inefficient -- and the target of stream fusion[1] -- but this is an
> overhead of how list elements are stored, not how many.
It might be equivalent to the (imagined) producer's storing its 'current'
value inside its own frame.
How much can we rely on the run-time to actually destroy all the passed-over
elements and not hang on to them for some time? Is this that compiler switch
that Daniel mentioned? Is it reliable?
>
> You can try to implement the Euler sieve with producers by using a type like
>
> data Producer a = forall s. Producer
>   { state :: !s, next :: s -> s, value :: s -> a }
>
> but I think this will be quite difficult; it's not clear what - and
> thus how big - the state will be. (See [1] for choosing a good type.)
I did that once in Scheme, as per SICP, with 'next' hanging in a stream's
tail. I put filters and maps on top of that (inside the new 'next',
actually), but that used Scheme's lists as storage. Another one was actual
producers/modifiers with a {pull,peek} interface. It even produced some
primes, and some Hamming numbers. Then I saw Haskell, and thought I'd get
all that for free with its equational reasoning.
But I get the impression that GHC isn't working through equational reasoning?..
I see all this talk about thunks etc.
> Concerning the sieves, there is a fundamental difference between the
> imperative sieve and the functional sieves, regardless of whether the
> latter start at p or p^2 or use a priority queue. Namely, the imperative
> sieve makes essential use of *pointer arithmetic*. The key point is that
> in order to cross off the multiples
>
> p, 2*p, 3*p, ...
>
> of a prime, the algorithm can directly jump from the (k*p)-th to the
> (k*p+p)-th array element by adding p to the index. The functional
> versions can never beat that because they can't just jump over p
> constructors of a data structure in O(1) time.
We can directly jump to the next multiple too; it is called (+). :) :)
But seriously, the real issue is that we have to merge the produced streams
of multiples, while the mutable-storage code works on the same one, so its
"merging cost" is zero. And even if we are smart enough to merge them in a
tree-like fashion, we still have no (or little) control over the compiler's
representation of lists and its retention of their contents, nor over
whether it performs stream fusion (if we use lists).
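The merge in question is just the ordered union of increasing streams; a
standard definition, for concreteness:

  union :: Ord a => [a] -> [a] -> [a]
  union xs [] = xs
  union [] ys = ys
  union xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : union xt ys
    EQ -> x : union xt yt    -- duplicates collapsed
    GT -> y : union xs yt

  -- e.g. take 10 (union [3,6..] [5,10..]) == [3,5,6,9,10,12,15,18,20,21]

Every produced element pays for those comparisons; the mutable array pays
nothing.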
If you could take a look at the tree-merging primes and why it uses too much
memory, that would be great. The code is in Daniel's post to which I replied,
or on the haskellwiki Prime_numbers page (there in its rudimentary form).
It's a tangent to your VIP code: instead of the People structure, an ordered
list is just maintained as a split pair of its known (so far finite) prefix
and the rest of it. Under merging, these split pairs form a monoid, so they
can be rearranged into a tree. If you haven't seen it yet: it uses a
different folding structure from your code, with a lower total cost of
multiples production (estimated as Sum (1/p)*depth):
tfold f (x:y:z:xs)  = (x `f` (y `f` z)) `f` tfold f (pairwise f xs)
pairwise f (x:y:ys) = f x y : pairwise f ys

comps = tfold mergeSP (pairwise mergeSP multips)
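For completeness, a sketch of mergeSP over such split pairs, along the lines
of the haskellwiki version (the second pair is matched lazily, which is what
lets the fold run over an infinite list of them):

  -- a split pair: (known finite prefix, the rest), both increasing
  type SP a = ([a], [a])

  mergeSP :: Ord a => SP a -> SP a -> SP a
  mergeSP (a, b) ~(c, d) = (a ++ bc, merge b' d)
    where
      -- merge b against the finite prefix c; whatever overlaps c
      -- becomes known, the leftover of b rejoins the unbounded rest
      (bc, b') = spMerge b c
      spMerge xs [] = ([], xs)
      spMerge [] ys = ([], ys)
      spMerge xs@(x:xt) ys@(y:yt) = case compare x y of
        LT -> let (u, v) = spMerge xt ys in (x:u, v)
        EQ -> let (u, v) = spMerge xt yt in (x:u, v)
        GT -> let (u, v) = spMerge xs yt in (y:u, v)

  merge :: Ord a => [a] -> [a] -> [a]
  merge xs [] = xs
  merge [] ys = ys
  merge xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : merge xt ys
    EQ -> x : merge xt yt
    GT -> y : merge xs yt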
But aside from the memory problem (about 50MB, vs Melissa's 2MB), for the
first few million primes produced it has almost exactly the same asymptotics
as her PQ code and runs on par with it when compiled (and is 2.5x faster when
interpreted, in GHCi). It is also clear and concise. :) :)
I think I'll have to try out your code (amended with a new folding structure)
and see whether it has fewer memory problems. I assume the code on the
haskellwiki page is yours. (?)
Cheers,
>
> [1]: http://www.cse.unsw.edu.au/~dons/papers/CLS07.html
>
> Regards,
> Heinrich Apfelmus
>
> --
> http://apfelmus.nfshost.com
>