[Haskell-cafe] Re: FASTER primes

Sat Jan 9 08:59:01 EST 2010

On Saturday, 09 January 2010 08:04:20, Will Ness wrote:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > On Friday, 08 January 2010 19:45:47, Will Ness wrote:
> > > Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> >
> > It's not tail-recursive; the recursive call is inside a call to celebrate.
>
> It is (spMerge that is).

No.
"In computer science, tail recursion (or tail-end recursion) is a special 
case of recursion in which the last operation of the function, the tail 
call, is a recursive call."

The last operation of spMerge is a call to celebrate or to the pair
constructor (be that P or (,)). It doesn't matter, though: in a lazy
language, tail recursion isn't very important.
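A minimal sketch of why laziness makes this unimportant (doubleAll and doubleAllAcc are hypothetical examples, not code from the thread): a function whose recursive call sits under a constructor, the way spMerge's call sits under celebrate or (<:), is not tail-recursive, yet it produces its result incrementally, even from an infinite list. The accumulator version is tail-recursive but cannot return anything before its input ends.

```haskell
-- Hypothetical illustration, not code from the thread.

-- Guarded recursion: the recursive call is an argument of (:), so this
-- is NOT tail-recursive, but each element is available immediately.
doubleAll :: [Integer] -> [Integer]
doubleAll []     = []
doubleAll (x:xs) = 2 * x : doubleAll xs

-- Tail-recursive with an accumulator: the last operation is the
-- recursive call, but no element is available until the input ends.
doubleAllAcc :: [Integer] -> [Integer]
doubleAllAcc = go []
  where
    go acc []     = reverse acc
    go acc (x:xs) = go (2 * x : acc) xs

main :: IO ()
main = do
  print (take 5 (doubleAll [1 ..]))  -- [2,4,6,8,10]
  print (doubleAllAcc [1 .. 5])      -- [2,4,6,8,10]
  -- take 5 (doubleAllAcc [1 ..]) would never return.
```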

> It calls tail-recursive celebrate in a tail
> position. What you've done is to eliminate the outstanding context by
> moving it inward. Your detailed explanation is clearer than that. :)
>
> BTW, when I run the VIP code, it is consistently slower than using just pairs,

I can't reproduce that. Ceteris paribus, I get exactly the same allocation 
and GC figures whether I use People or (,), and the running times are 
practically identical (the difference between People and (,) is smaller 
than the variation between runs of the same program; the fastest and the 
slowest run of the two differ by less than 0.5%). I think it must be the 
other changes you made.

> modified with wheel and feeder and all. So what's needed is to
> re-implement your approach for pairs:
>
>  mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
>                            in (a ++ bc, bd)
>      where
>       spMerge u [] d = ([], merge u d)
>       spMerge u@(x:xs) w@(y:ys) d = case compare x y of
>                LT -> consSP x $ spMerge xs w  d
>                EQ -> consSP x $ spMerge xs ys d
>                GT -> consSP y $ spMerge u  ys d
>
>  consSP x ~(a,b) = (x:a,b)   -- don't forget that magic `~` !!!

I called that (<:).
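For reference, a minimal self-contained sketch of Will's mergeSP with (<:) in place of consSP. merge (ordered union without duplicates) is an assumed helper, since it is not defined in the post; the pairs below stand for "p*p, then the larger multiples of p" for p = 2 and p = 3.

```haskell
-- Minimal sketch: Will's mergeSP with Daniel's (<:) in place of consSP.
-- merge (ordered union without duplicates) is an assumed helper; it is
-- not defined in the post.

infixr 5 <:

-- The lazy pattern ~(a, b) lets (x : a, b) be returned before the
-- recursive call that produces (a, b) has been evaluated.
(<:) :: a -> ([a], b) -> ([a], b)
(<:) x ~(a, b) = (x : a, b)

merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : merge xt ys
  EQ -> x : merge xt yt
  GT -> y : merge xs yt
merge xs [] = xs
merge [] ys = ys

-- A pair holds a finite sorted prefix and the (larger) sorted rest.
mergeSP :: Ord a => ([a], [a]) -> ([a], [a]) -> ([a], [a])
mergeSP (a, b) ~(c, d) = let (bc, bd) = spMerge b c d in (a ++ bc, bd)
  where
    spMerge u [] rest = ([], merge u rest)
    spMerge [] w rest = (w, rest)
    spMerge u@(x:xs) w@(y:ys) rest = case compare x y of
      LT -> x <: spMerge xs w rest
      EQ -> x <: spMerge xs ys rest
      GT -> y <: spMerge u ys rest

main :: IO ()
main = do
  let twos   = ([4], [6, 8 ..])    -- 2*2, then the larger multiples of 2
      threes = ([9], [12, 15 ..])  -- 3*3, then the larger multiples of 3
      (pre, rest) = mergeSP twos threes
  print pre                        -- [4,6,8,9]
  print (take 6 rest)              -- [10,12,14,15,16,18]
  -- The prefix never needs the second pair's rest:
  print (fst (mergeSP twos ([9], undefined)))  -- [4,6,8,9]
```

The last line shows what the lazy patterns buy: the prefix can be consumed while the rest of the merge is still an unevaluated thunk.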

>
>
> BTW I'm able to eliminate sharing without a compiler switch by using
>

Yes, I can too. But it's easy to make a false step and trigger sharing.

I can get a nice speedup (~15%, mostly due to much less garbage collection) 
by doing the final merge in a function that doesn't unnecessarily wrap the 
result in a pair (whose second component is ignored):

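{-# LANGUAGE ScopedTypeVariables #-}
-- Doesn't need -fno-cse anymore,
-- but it needs ScopedTypeVariables for the local type signatures.
-- rollFrom, minus, merge, pairwise and mergeSP are not defined here;
-- they are the helpers from earlier in this thread.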

primes :: forall a. Integral a => () -> [a]
primes () = 2:3:5:7:11:13:calcPrimes 17 primes''
   where
    calcPrimes s cs = rollFrom s `minus` compos cs
    bootstrap   = 17:19:23:29:31:37:calcPrimes 41 bootstrap
    primes'     = calcPrimes 17 bootstrap
    primes''    = calcPrimes 17 primes'

    pmults :: a -> ([a],[a])
    pmults p = case map (*p) (rollFrom p) of
                (x:xs) -> ([x],xs)

    multip :: [a] -> [([a],[a])]
    multip ps = map pmults ps

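    compos :: [a] -> [a]
    -- The first group yields its prefix a directly; its rest b is then
    -- merged with ever larger groups by funMerge.
    compos ps = case pairwise mergeSP (multip ps) of
                    ((a,b):cs) -> a ++ funMerge b (pairwise mergeSP cs)

    -- Combine the next two groups into one (prefix c, rest d).
    funMerge b (x:y:zs) = let (c,d) = mergeSP x y
                            in mfun b c d (pairwise mergeSP zs)

    -- Emit the merge of u and the prefix w; once w is used up, fold the
    -- group's rest d into u and continue with the larger groups in l.
    mfun u@(x:xs) w@(y:ys) d  l = case compare x y of
                LT -> x:mfun xs w d l
                EQ -> x:mfun xs ys d l
                GT -> y:mfun u ys d l
    mfun u [] d l = funMerge (merge u d) l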

This uses yet another folding structure, which seems to perform slightly 
better than the original tree-fold structure. In contrast to the 
VippyPrimes, it benefits a lot from a larger allocation area: running with 
+RTS -A2M gives a >10% speedup for the 10-millionth/20-millionth prime, and 
+RTS -A8M nearly 20%. -A16M and -A32M buy a little more, but in that range 
at least, it's not much (it may be significant for larger targets).
Still way slower than the priority-queue (PQ) version, but the gap is 
narrowing.
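To run primes () on its own, here is a self-contained sketch. The primes () function is copied as posted; everything around it is assumed, because the helpers are not in this message. The wheel is assumed to be the 2*3*5*7*11*13 wheel (the explicit prefix 2:3:5:7:11:13 suggests that), rollFrom n is assumed to list the numbers >= n coprime to 30030, merge and minus are the usual ordered union and difference, pairwise combines neighbours, and mergeSP is Will's version quoted above with (<:) as consSP.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
-- Self-contained sketch around Daniel's primes () (copied from the post).
-- ASSUMED helpers, not from this message: the 2*3*5*7*11*13 wheel,
-- rollFrom, merge, minus, pairwise, (<:) and mergeSP.

wheelBasis :: [Int]
wheelBasis = [2, 3, 5, 7, 11, 13]

wheelMod :: Int
wheelMod = product wheelBasis     -- 30030

-- Residues in [1 .. 30030] that are coprime to 30030, ascending.
wheelRes :: [Int]
wheelRes = [r | r <- [1 .. wheelMod], all (\p -> r `mod` p /= 0) wheelBasis]

-- Gap from each residue to the next one, wrapping around.
wheelGaps :: [Int]
wheelGaps = zipWith (-) (tail wheelRes ++ [head wheelRes + wheelMod]) wheelRes

-- Assumed: all numbers >= n that are coprime to 30030, ascending.
rollFrom :: Integral a => a -> [a]
rollFrom n = spin (drop k (cycle wheelGaps)) (n + fromIntegral (wheelRes !! k - m))
  where
    m = fromIntegral (n `mod` fromIntegral wheelMod) :: Int
    k = length (takeWhile (< m) wheelRes)
    spin (g:gs) x = x : spin gs (x + fromIntegral g)
    spin []     x = [x]

merge :: Ord a => [a] -> [a] -> [a]     -- ordered union, no duplicates
merge xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : merge xt ys
  EQ -> x : merge xt yt
  GT -> y : merge xs yt
merge xs [] = xs
merge [] ys = ys

minus :: Ord a => [a] -> [a] -> [a]     -- ordered difference
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ -> minus xt yt
  GT -> minus xs yt
minus xs _ = xs

pairwise :: (b -> b -> b) -> [b] -> [b] -- combine neighbours
pairwise f (x:y:zs) = f x y : pairwise f zs
pairwise _ xs       = xs

infixr 5 <:
(<:) :: a -> ([a], b) -> ([a], b)
(<:) x ~(a, b) = (x : a, b)

mergeSP :: Ord a => ([a], [a]) -> ([a], [a]) -> ([a], [a])
mergeSP (a, b) ~(c, d) = let (bc, bd) = spMerge b c d in (a ++ bc, bd)
  where
    spMerge u [] rest = ([], merge u rest)
    spMerge [] w rest = (w, rest)
    spMerge u@(x:xs) w@(y:ys) rest = case compare x y of
      LT -> x <: spMerge xs w rest
      EQ -> x <: spMerge xs ys rest
      GT -> y <: spMerge u ys rest

-- Daniel's code from the post, unchanged apart from layout.
primes :: forall a. Integral a => () -> [a]
primes () = 2:3:5:7:11:13:calcPrimes 17 primes''
  where
    calcPrimes s cs = rollFrom s `minus` compos cs
    bootstrap   = 17:19:23:29:31:37:calcPrimes 41 bootstrap
    primes'     = calcPrimes 17 bootstrap
    primes''    = calcPrimes 17 primes'

    pmults :: a -> ([a],[a])
    pmults p = case map (*p) (rollFrom p) of
                (x:xs) -> ([x],xs)

    multip :: [a] -> [([a],[a])]
    multip ps = map pmults ps

    compos :: [a] -> [a]
    compos ps = case pairwise mergeSP (multip ps) of
                    ((a,b):cs) -> a ++ funMerge b (pairwise mergeSP cs)

    funMerge b (x:y:zs) = let (c,d) = mergeSP x y
                            in mfun b c d (pairwise mergeSP zs)

    mfun u@(x:xs) w@(y:ys) d  l = case compare x y of
                LT -> x:mfun xs w d l
                EQ -> x:mfun xs ys d l
                GT -> y:mfun u ys d l
    mfun u [] d l = funMerge (merge u d) l

main :: IO ()
main = print (primes () !! 9999 :: Int)   -- 104729, the 10,000th prime
```

A bigger wheel only pays off if rollFrom is cheap per sieving prime; the linear lookup in wheelRes here is the simplest choice, not a tuned one.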

>
>  mtwprimes () = 2:3:5:7:primes
>    where
>     primes = doPrimes 121 primes
>
>  doPrimes n prs = let (h,t) = span (< n) $ rollFrom 11
>                   in h ++ t `diff` comps prs
>  doPrimes2 n prs = let (h,t) = span (< n) $ rollFrom (12-1)
>                    in h ++ t `diff` comps prs
>
>  mtw2primes () = 2:3:5:7:primes
>    where
>     primes  = doPrimes 26 primes2
>     primes2 = doPrimes2 121 primes2
>
>
> Using 'splitAt 26' in place of 'span (< 121)' didn't work though.
>
>
> How about them wheels? :)
>

Well, what about them?
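On the wheels in the quoted code: doPrimes relies on every wheel number below 11^2 = 121 being prime, so that prefix can be emitted without sieving. A hypothetical sketch with a 2-3-5-7 wheel (wheelRes, wheelGaps and this rollFrom are assumptions, not the thread's definitions) checks that the span (< 121) prefix of rollFrom 11 is exactly the 26 primes from 11 to 113.

```haskell
-- Hypothetical 2-3-5-7 wheel (48 spokes); rollFrom here is an assumption,
-- not the definition used in the thread.
wheelBasis :: [Int]
wheelBasis = [2, 3, 5, 7]

wheelMod :: Int
wheelMod = product wheelBasis     -- 210

wheelRes :: [Int]                 -- 1, 11, 13, 17, ..., 209
wheelRes = [r | r <- [1 .. wheelMod], all (\p -> r `mod` p /= 0) wheelBasis]

wheelGaps :: [Int]                -- 10, 2, 4, 2, ..., 2 (sums to 210)
wheelGaps = zipWith (-) (tail wheelRes ++ [head wheelRes + wheelMod]) wheelRes

-- All numbers >= n that are coprime to 210, ascending.
rollFrom :: Integral a => a -> [a]
rollFrom n = spin (drop k (cycle wheelGaps)) (n + fromIntegral (wheelRes !! k - m))
  where
    m = fromIntegral (n `mod` fromIntegral wheelMod) :: Int
    k = length (takeWhile (< m) wheelRes)
    spin (g:gs) x = x : spin gs (x + fromIntegral g)
    spin []     x = [x]

main :: IO ()
main = do
  let (h, _) = span (< 121) (rollFrom 11 :: [Int])
  print h                             -- the 26 primes from 11 to 113
  print (length h)                    -- 26
  print (take 26 (rollFrom 11) == h)  -- True
```

With this particular wheel, splitAt 26 and span (< 121) select the same 26 elements; any composite coprime to 210 has a smallest factor of at least 11 and is therefore at least 121.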