[Haskell-cafe] Re: FASTER primes
daniel.is.fischer at web.de
Fri Jan 8 22:52:48 EST 2010
Am Freitag 08 Januar 2010 19:45:47 schrieb Will Ness:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > mergeSP :: Integral a => People a -> People a -> People a
> > mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd)
> > where
> > mrgd = spMerge (dorks p1) (vips p2) (dorks p2)
> > spMerge l1  l3 = P  (merge l1 l3)
> > spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of
> > LT -> celebrate x (spMerge xs l2 l3)
> > EQ -> celebrate x (spMerge xs ys l3)
> > GT -> celebrate y (spMerge l1 ys l3)
> > ----------------------------------------------------------------------
> Hi Daniel,
> Is it so that you went back to my fold structure?
> Was it better for really big numbers of primes?
Yes, it is slower for <= 20 million primes (and some beyond), but faster
for >= 50 million (and some before), according to the few tests I made.
> I had the following for ages (well, at least two weeks) but I thought it
> was slower and took more memory (that was _before_ the 'no-share' and
> 'feeder' stuff). I can see the only difference in that you've re-written
> spMerge in a tail-recursive style with explicitly deconstructed parts;
It's not tail-recursive, the recursive call is inside a celebrate.
As it turns out, the important things are
1. a feeder and separate lists of multiples for the feeder and the runner,
for the reasons detailed earlier (two-step feeding and larger wheel are
pleasing but minor optimisations).
2. a strict pattern in tfold
3. moving the merge inside spMerge (you can have
mergeSP (a,b) ~(c,d) = (a ++ bc, m)
(bc,m) = spMerge b c
spMerge u  = (,merge u d)
without exploding memory, but it's *much* slower than letting spMerge take
Now, I have to admit, the only point I _really_ understand is 1. (and why
the three-argument spMerge is faster than the two-argument one taking the
merge-partner from mergeSP's binding :).
mergeSP (a,b) ~(c,d)
= let (bc,b') = spMerge b c in (a ++ bc, merge b' d)
a memory leak, but
mergeSP (a,b) ~(c,d)
= let (bc,m) = spMerge' b c d in (a ++ bc, m)
Well, looking at the core for mergeSP, the fog clears somewhat. The former
is translated roughly to
mergeSP (a,b) pair
= let sm = spMerge b (fst pair)
in (a ++ fst sm, merge (snd sm) (snd pair))
It holds on to the pair until the result of the merge is demanded, that is
until the entire (a ++ fst sm) is consumed. Only then is the pair released
and can be collected. On top of that, as soon as a is consumed and (fst sm)
[or bc] is demanded, spMerge forces the evaluation of (fst pair) [c]. After
a few levels, the evaluated list will take more space than the thunk. It
cannot yet be collected, because pair is still alive. The elements have to
be duplicated for (fst sm), because they're intertwined with those of b.
On the next level of merging, they have to be duplicated again.
The latter is translated roughly to
mergeSP (a,b) pair
= let sm = spMerge' b (fst pair) (snd pair)
in (a ++ fst sm, snd sm)
The pair is released as soon as the result of the spMerge' is demanded,
that is, when a is consumed. Then the elements of (fst pair) need not be
duplicated and they can be discarded almost immediately after they have
been produced [for small primes, multiples of larger primes live a little
longer, but those are fewer].
So, no duplication, shorter lifespan => no leak.
Having seen that, the question is, why can't the compiler see that
deconstructing the pair when the first component is needed is better? The
first component of the pair is used in no other place, so keeping the pair
can't have any advantage, can it?
And why does
tfold f (a: ~(b: ~(c:xs))) = ...
leak, but not
tfold f (a:b:c:xs) = ...
I guess it's similar.
tfold f (a: ~(b: ~(c:xs)))
= (a `f` (b `f` c)) `f` tfold f ([pairwise f] xs)
tfold f (a:zs)
= (a `f` (head zs `f` (head $ tail zs)))
`f` tfold f (pairwise f $ drop 2 zs)
the latter part holds on to the beginning of zs, leading again to data
duplication and too long lifespans.
> mine was relying on the compiler (8-L) to de-couple the two pipes and
> recognize that the second just passes along the final result, unchanged.
> The two versions seem to me to be _exactly_ operationally equivalent.
Well, they're not. The main difference is what we told the compiler _when_
to deconstruct the pairs. You said "at the latest possible moment", I said
"as soon as we need the first component".
It's not entirely obvious, but it is frequently said that understanding the
space (and time) behaviour of lazy evaluation isn't always easy.
> All this searching for the code better understood by the compiler is
> _*very*_ frustrating, as it doesn't reflect on the semantics of the
> code, or even the operational semantics of the code. :-[
> Weren't the P's supposed to disappear completely in the compiled code?
No. Constructors for data types can't disappear completely, only newtype
constructors can (and do).
> Aren't types just a _behavioral_ definitions??? Aren't we supposed to be
> able to substitute equals for equals dammit??
We can. But substituting equals for equals can alter space and time
fromInteger 0 ==
fromInteger (let xs = [1 .. 10^8] in (product xs + sum xs) `mod` 10)
> Is this the state of our _best_ Haskell compiler????
Yes. It's still a "do what I tell you to" compiler, even if a pretty slick
one, not a "do what I mean" compiler. Sometimes, what you tell the compiler
isn't what you wanted.
It's easier to predict when you give detailed step by step instructions.
More information about the Haskell-Cafe