Daniel Fischer daniel.is.fischer at web.de
Sat Jan 9 08:59:01 EST 2010

```Am Samstag 09 Januar 2010 08:04:20 schrieb Will Ness:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > Am Freitag 08 Januar 2010 19:45:47 schrieb Will Ness:
> > > Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> >
> > It's not tail-recursive, the recursive call is inside a celebrate.
>
> It is (spMerge that is).

No.
"In computer science, tail recursion (or tail-end recursion) is a special
case of recursion in which the last operation of the function, the tail
call, is a recursive call."

The last operation of spMerge is a call to celebrate or the pair
constructor (be that P or (,)). Doesn't matter, though, as for lazy
languages, tail recursion isn't very important.

> It calls tail-recursive celebrate in a tail
> position. What you've done, is to eliminate the outstanding context, by
> moving it inward. Your detailed explanation is more clear than that. :)
>
> BTW when I run VIP code it is consistently slower than using just pairs,

I can't reproduce that. Ceteris paribus, I get the exact same allocation
and GC figures whether I use People or (,), running times identical enough
(difference between People and (,) is smaller than the difference between
runs of the same; the difference between the fastest and the slowest run of
the two is less than 0.5%). I think it must be the other changes you made.

> modified with wheel and feeder and all. So what's needed is to
> re-implement your approach for pairs:
>
>  mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
>                            in (a ++ bc, bd)
>      where
>       spMerge u [] d = ([], merge u d)
>       spMerge u@(x:xs) w@(y:ys) d = case compare x y of
>                LT -> consSP x \$ spMerge xs w  d
>                EQ -> consSP x \$ spMerge xs ys d
>                GT -> consSP y \$ spMerge u  ys d
>
>  consSP x ~(a,b) = (x:a,b)   -- don't forget that magic `~` !!!

I called that (<:).

>
>
> BTW I'm able to eliminate sharing without a compiler switch by using
>

Yes, I can too. But it's easy to make a false step and trigger sharing.

I can get a nice speedup (~15%, mostly due to much less garbage collecting)
by doing the final merge in a function without unnecessarily wrapping the
result in a pair (whose secondcomponent is ignored):

-- Doesn't need -fno-cse anymore,
-- but it needs -XScopedTypeVariables for the local type signatures

primes :: forall a. Integral a => () -> [a]
primes () = 2:3:5:7:11:13:calcPrimes 17 primes''
where
calcPrimes s cs = rollFrom s `minus` compos cs
bootstrap   = 17:19:23:29:31:37:calcPrimes 41 bootstrap
primes'     = calcPrimes 17 bootstrap
primes''    = calcPrimes 17 primes'

pmults :: a -> ([a],[a])
pmults p = case map (*p) (rollFrom p) of
(x:xs) -> ([x],xs)

multip :: [a] -> [([a],[a])]
multip ps = map pmults ps

compos :: [a] -> [a]
compos ps = case pairwise mergeSP (multip ps) of
((a,b):cs) -> a ++ funMerge b (pairwise mergeSP cs)

funMerge b (x:y:zs) = let (c,d) = mergeSP x y
in mfun b c d (pairwise mergeSP zs)

mfun u@(x:xs) w@(y:ys) d  l = case compare x y of
LT -> x:mfun xs w d l
EQ -> x:mfun xs ys d l
GT -> y:mfun u ys d l
mfun u [] d l = funMerge (merge u d) l

This uses a different folding structure again, which seems to give slightly
better performance than the original tree-fold structure. In contrast to
the VippyPrimes, it profits much from a larger allocation area, running
with +RTS -A2M gives a >10% speedup for prime # 10M/20M, +RTS -A8M nearly
20%. -A16M and -A32M buy a little more, but in that range at least, it's
not much (may be significant for larger targets).
Still way slower than PQ, but the gap is narrowing.

>
>  mtwprimes () = 2:3:5:7:primes
>    where
>     primes = doPrimes 121 primes
>
>  doPrimes n prs = let (h,t) = span (< n) \$ rollFrom 11
>                   in h ++ t `diff` comps prs
>  doPrimes2 n prs = let (h,t) = span (< n) \$ rollFrom (12-1)
>                    in h ++ t `diff` comps prs
>
>  mtw2primes () = 2:3:5:7:primes
>    where
>     primes  = doPrimes 26 primes2
>     primes2 = doPrimes2 121 primes2
>
>
> Using 'splitAt 26' in place of 'span (< 121)' didn't work though.
>
>
> How about them wheels? :)
>