[Haskell-cafe] Re: FASTER primes
daniel.is.fischer at web.de
Sat Jan 9 08:59:01 EST 2010
Am Samstag 09 Januar 2010 08:04:20 schrieb Will Ness:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > Am Freitag 08 Januar 2010 19:45:47 schrieb Will Ness:
> > > Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > It's not tail-recursive, the recursive call is inside a celebrate.
> It is (spMerge that is).
"In computer science, tail recursion (or tail-end recursion) is a special
case of recursion in which the last operation of the function, the tail
call, is a recursive call."
The last operation of spMerge is a call to celebrate or the pair
constructor (be that P or (,)). Doesn't matter, though, as for lazy
languages, tail recursion isn't very important.
> It calls tail-recursive celebrate in a tail
> position. What you've done, is to eliminate the outstanding context, by
> moving it inward. Your detailed explanation is more clear than that. :)
> BTW when I run VIP code it is consistently slower than using just pairs,
I can't reproduce that. Ceteris paribus, I get the exact same allocation
and GC figures whether I use People or (,), running times identical enough
(difference between People and (,) is smaller than the difference between
runs of the same; the difference between the fastest and the slowest run of
the two is less than 0.5%). I think it must be the other changes you made.
> modified with wheel and feeder and all. So what's needed is to
> re-implement your approach for pairs:
> mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
> in (a ++ bc, bd)
> spMerge u  d = (, merge u d)
> spMerge u@(x:xs) w@(y:ys) d = case compare x y of
> LT -> consSP x $ spMerge xs w d
> EQ -> consSP x $ spMerge xs ys d
> GT -> consSP y $ spMerge u ys d
> consSP x ~(a,b) = (x:a,b) -- don't forget that magic `~` !!!
I called that (<:).
> BTW I'm able to eliminate sharing without a compiler switch by using
Yes, I can too. But it's easy to make a false step and trigger sharing.
I can get a nice speedup (~15%, mostly due to much less garbage collecting)
by doing the final merge in a function without unnecessarily wrapping the
result in a pair (whose secondcomponent is ignored):
-- Doesn't need -fno-cse anymore,
-- but it needs -XScopedTypeVariables for the local type signatures
primes :: forall a. Integral a => () -> [a]
primes () = 2:3:5:7:11:13:calcPrimes 17 primes''
calcPrimes s cs = rollFrom s `minus` compos cs
bootstrap = 17:19:23:29:31:37:calcPrimes 41 bootstrap
primes' = calcPrimes 17 bootstrap
primes'' = calcPrimes 17 primes'
pmults :: a -> ([a],[a])
pmults p = case map (*p) (rollFrom p) of
(x:xs) -> ([x],xs)
multip :: [a] -> [([a],[a])]
multip ps = map pmults ps
compos :: [a] -> [a]
compos ps = case pairwise mergeSP (multip ps) of
((a,b):cs) -> a ++ funMerge b (pairwise mergeSP cs)
funMerge b (x:y:zs) = let (c,d) = mergeSP x y
in mfun b c d (pairwise mergeSP zs)
mfun u@(x:xs) w@(y:ys) d l = case compare x y of
LT -> x:mfun xs w d l
EQ -> x:mfun xs ys d l
GT -> y:mfun u ys d l
mfun u  d l = funMerge (merge u d) l
This uses a different folding structure again, which seems to give slightly
better performance than the original tree-fold structure. In contrast to
the VippyPrimes, it profits much from a larger allocation area, running
with +RTS -A2M gives a >10% speedup for prime # 10M/20M, +RTS -A8M nearly
20%. -A16M and -A32M buy a little more, but in that range at least, it's
not much (may be significant for larger targets).
Still way slower than PQ, but the gap is narrowing.
> mtwprimes () = 2:3:5:7:primes
> primes = doPrimes 121 primes
> doPrimes n prs = let (h,t) = span (< n) $ rollFrom 11
> in h ++ t `diff` comps prs
> doPrimes2 n prs = let (h,t) = span (< n) $ rollFrom (12-1)
> in h ++ t `diff` comps prs
> mtw2primes () = 2:3:5:7:primes
> primes = doPrimes 26 primes2
> primes2 = doPrimes2 121 primes2
> Using 'splitAt 26' in place of 'span (< 121)' didn't work though.
> How about them wheels? :)
Well, what about them?
More information about the Haskell-Cafe