[Haskell-cafe] Re: FASTER primes

Daniel Fischer daniel.is.fischer at web.de
Thu Jan 7 11:13:38 EST 2010


On Thursday 07 January 2010 11:43:44, Heinrich Apfelmus wrote:
> Will Ness wrote:
> > Heinrich Apfelmus writes:
> >> Concerning lists as producer/consumer, I think that's exactly what lazy
> >> evaluation is doing. Neither  filter ,  map  or  span  evaluate and
> >> store more list elements than strictly necessary.
> >
> > I always suspected as much, but was once told that Chris Okasaki has
> > shown that any filter etc must allocate its own storage. With the
> > peek/pull they don't have to, if they are nested, and the deepest one
> > from the real storage gets pulled through some pointer chasing
> > eventually. Span isn't so easily compiled out either, or is it? But that
> > might be a minor point.
>
> Hm? In my world view, there is only reduction to normal form and I don't
> see how "allocate its own storage" fits in there. Okasaki having shown
> something to that end would be new to me?
>
Perhaps what was meant was "storage must be allocated for each filter" (which, however, 
seems trivial).
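
For illustration (a minimal sketch, not the code under discussion), the classic
nested-filter "sieve" makes that per-filter storage explicit: every prime adds one more
filter closure that all later candidates have to be pulled through.

    -- each recursive call allocates one more filter closure
    sieveNested :: [Integer] -> [Integer]
    sieveNested (p:xs) = p : sieveNested (filter (\x -> x `mod` p /= 0) xs)

    naivePrimes :: [Integer]
    naivePrimes = sieveNested [2..]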

> >
> > We can directly jump to the next multiple too, it is called (+). :) :)
> >
> > But seriously, the real issue is that we have to merge the produced
> > streams of multiples, while the mutable-storage code works on the same one,
> > so its "merging cost" is zero. And even if we are smart to merge them in
> > a tree-like fashion, we still have no (or little) control over the
> > compiler's representation of lists and retention of their content and
> > whether it performs stream fusion or not (if we use lists).
>
> Not quite, I think there are two things going on:
>
> 1. In contrast to the standard imperative version, we are implementing
> an on-line algorithm that produces arbitrarily many primes on demand.
> Any imperative on-line version will need to think about storage for
> pending prime filters as well.

If you consider a segmented array sieve an on-line algorithm, you can omit the storage for 
prime filters, at the expense of speed. It will not be a real Eratosthenes sieve anymore, 
but it won't be too horribly slow, I think. If you're using an Atkin sieve, the performance 
penalty should be even lower.
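
Roughly along these lines (an unoptimised sketch with made-up names, a fixed segment
size and naively found base primes; not the code at the end of this post):

    import Data.Array.Unboxed (UArray)
    import Data.Array.IArray (accumArray, assocs)

    -- base primes up to the square root, found naively; only a handful are needed
    basePrimes :: [Int]
    basePrimes = sieve [2..]
      where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

    -- sieve one fixed-size segment after the other, recomputing the start
    -- offsets per segment instead of keeping per-prime state around
    segmentedPrimes :: [Int]
    segmentedPrimes = 2 : go 3
      where
        segSize = 100000
        go lo = [n | (n, False) <- assocs marks] ++ go (hi + 1)
          where
            hi = lo + segSize - 1
            marks :: UArray Int Bool
            marks = accumArray (\_ b -> b) False (lo, hi)
                      [ (m, True)
                      | p <- takeWhile (\p -> p * p <= hi) basePrimes
                      , let start = max (p * p) (((lo + p - 1) `div` p) * p)
                      , m <- [start, start + p .. hi] ]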

>
> 2. Even the imperative algorithm can be interpreted as "merging" arrays,
> just that the composites are magically merged at the right place (at

And they're merged asynchronously. Yet more magic :)

> distance p from each other) because arrays offer O(1) "jumps". In
> contrast, the functional version needs O(log something) time to
> calculate where to put the next composite.
>
> > If you could take a look at the tree-merging primes and why it uses too
> > much memory, it would be great.
>
> Fortunately, Daniel already took a detailed look. :) But I also have a
> few remarks.
>
> > The code is in Daniel's post to which I replied, or
> > on haskellwiki Prime_numbers page (there in its rudimentary form). It's a
> > tangent to your VIP code, where instead of the People structure an ordered
> > list is just maintained as a split pair, of its known (so far, finite)
> > prefix and the rest of it.
>
> Aww, why remove the cutesy name? The VIPs will be angry for being ignored!

You'll be happy to learn, then, that they return, albeit in a different form :)

>
> Jest aside, note that putting the crowd at the end of the list where it
> belongs has a reason: you have a guarantee that crowds won't take any
> memory until they actually appear after all the VIPs. If you put both
> VIPs and the crowd into a pair, it's easier to get space leaks.
>
Indeed. It's the combination of pairs and the bushy merge trees that lets the memory 
explode. I still don't completely understand the behaviour, but I managed to squash the 
space leak :D (see below). Thanks for the idea.

> In particular, I think that your  mergeSP  has a laziness bug, it should be
>
>     mergeSP ~(a,b) ~(c,d) = ...
>

That doesn't help. The pattern match on the first argument doesn't hurt.

> This is a cure for the (potential) problem that  spMerge  first performs
> a pattern match / comparison and then returns a pair constructor instead
> of the other way round. For instance, your second case
>
>     spMerge u [] = ([], u)
>
> should actually behave like
>
>     spMerge u v  = (a,b)
>         where
>         a = if null v then [] else ...
>         b = if null v then u else ...
>
> (I don't recommend rewriting  spMerge  in this fashion, the lazy pattern
> in  mergeSP  will do the trick.)

It didn't. Probably rewriting spMerge thus would do, but I think I'm too lazy to find out.
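
In case anybody wants to try: I'd expect the rewrite to look roughly like this for the
pair-based spMerge of the earlier version (untested; only the cases quoted above are
shown, and the pair constructor is returned before either list is inspected):

    spMerge u v = (a, b)
      where
        (a, b) = case (u, v) of
                   (_,    [])   -> ([], u)
                   (x:xs, y:ys) -> case compare x y of
                       LT -> let (a', b') = spMerge xs v  in (x:a', b')
                       EQ -> let (a', b') = spMerge xs ys in (x:a', b')
                       GT -> let (a', b') = spMerge u  ys in (y:a', b')
                   -- remaining case(s) of the original spMerge as before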

>
> Now, I'm not sure whether this is actually a problem or not. But the
> version shown here, with two lazy patterns instead of just one, is much
> closer to how the original  People  structure behaves.
>
> > Then under merging these split pairs form a monoid, so they can be rearranged
> > in a tree. If you haven't seen it yet, it uses a different folding
> > structure to your code, with a lower total cost of multiples production
> > (estimated as Sum (1/p)*depth):
> >
> >  tfold f (x:y:z:xs) = (x `f` (y `f` z)) `f` pairwise f xs
> >  comps = tfold mergeSP $ pairwise mergeSP multips
>
> The idea being that each prime produces  1/p  composites each "turn" and
> it takes time  depth  to trickle it to the top? Sounds reasonable, but
> remember that a prime  p  does not start to generate composites until
> the "turn count" has reached p^2, so it might occupy a place of low
> depth in the tree that is better spent on the previous primes. But your
> experiments seem to tell that it works anyway.
>
>
> By the way, I think that both  tfold  and  pairwise  need to be lazier.
>
>     tfold f ~(x: ~(y: ~(z:zs))) = (x `f` (y `f` z)) `f` pairwise f zs
>
>     pairwise f ~(x: ~(y:ys))    = f x y : pairwise f ys
>
> Otherwise, large parts of the tree might be generated too early and hog
> memory.
>
As it turns out (I don't know why), it's the other way round: each lazy pattern increases 
memory usage. Not very intuitive.
>
> Also, a small remark on style: I don't like the repetition of  mergeSP  in
>
>     compos = fst $ tfold mergeSP (pairwise mergeSP (multip ps))

Noted and acted upon.

>
> After all, the right hand side is just another tree fold and I would
> clearly mark it as such:
>
>     compos = fst . superSmartTreeFold mergeSP . multip $ ps
>
>     superSmartTreeFold f = tfold f . pairwise f
>
> ;)
>
> > I think I'll have to try out your code (amended with a new folding
> > structure) and see if it has fewer memory problems, maybe. I assume it is
> > your code on the haskellwiki page. (?)
>
> Yes, I put the code snippet on the wiki. And never tested it on large
> numbers. ;)
>
>
> Regards,
> Heinrich Apfelmus

The code below is now a well-behaved memory citizen (3MB for the 100 millionth prime, 
about the same as the PQ code), but it is still considerably slower than the PQ code.
In terms of MUT times as reported by +RTS -sstderr, as well as (MUT+GC) times (each 
measured once for prime No. 5*10^5, 10^6, 2*10^6, 4*10^6 and 10^7, to get a rough 
tendency), it seems to scale a wee bit better than any of the other tfold versions I 
created, but a little worse than the PQ versions.
The ratio of MUT times isn't too bad, but the GC times are rather abysmal (30-40% of 
the time).

----------------------------------------------------------------------
{-# LANGUAGE ScopedTypeVariables #-}
{-# OPTIONS_GHC -O2 -fno-cse #-}
module VippyPrimes (primes) where

data People a = P { vips :: [a], dorks :: [a] }

{-# SPECIALISE celebrate :: Int -> People Int -> People Int #-}
{-# SPECIALISE celebrate :: Integer -> People Integer -> People Integer #-}
celebrate :: a -> People a -> People a
celebrate x p = P (x:vips p) (dorks p)

{-# SPECIALISE primes :: () -> [Int] #-}
{-# SPECIALISE primes :: () -> [Integer] #-}
primes :: forall a. Integral a => () -> [a]
primes () = 2:3:5:7:11:13:primes'
   where
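    -- primes' is the list handed out; primes'' and primes''' are textually identical
    -- but deliberately kept as two separate definitions (hence -fno-cse above), so the
    -- internal feedback loop and the external consumer each run off their own copy
    -- and neither retains the other's primes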
    primes'   = roll 17 wheel13 `minus` compos primes'''
    primes''  = 17:19:23:29:31:37:rollFrom 41 `minus` compos primes''
    primes''' = 17:19:23:29:31:37:rollFrom 41 `minus` compos primes''

    pmults :: a -> People a
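    -- the multiples of p as a split pair: the first one, p*p, is the only known
    -- ("VIP") element, the rest stays in the lazy remainder ("dorks")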
    pmults p = case map (*p) (rollFrom p) of
                (x:xs) -> P [x] xs

    multip :: [a] -> [People a]
    multip ps = map pmults ps

    compos :: [a] -> [a]
    compos = vips . smartfold mergeSP . multip

rollFrom n = go ((n-17) `rem` 30030) wheel13
      where
        go r xxs@(x:xs)
            | r < x     = roll n xxs
            | otherwise = go (r-x) xs


smartfold f = tfold f . pairwise f

tfold f (a:b:c:xs) = (a `f` (b `f` c)) `f` smartfold f xs

pairwise f (x:y:ys)  = f x y : pairwise f ys

{-# SPECIALISE mergeSP :: People Int -> People Int -> People Int #-}
{-# SPECIALISE mergeSP :: People Integer -> People Integer -> People Integer #-}
mergeSP :: Integral a => People a -> People a -> People a
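-- merge two split pairs: the known prefix (vips) of the first pair is passed through
-- immediately; spMerge then merges the first pair's dorks with the second pair's vips,
-- celebrating the smaller heads, and once the second pair's vips run out the two
-- dork lists are merged lazily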
mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd)
      where
        mrgd = spMerge (dorks p1) (vips p2) (dorks p2)
        spMerge l1 [] l3 = P [] (merge l1 l3)
        spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of
                LT -> celebrate x (spMerge xs l2 l3)
                EQ -> celebrate x (spMerge xs ys l3)
                GT -> celebrate y (spMerge l1 ys l3)

-- do the specialisations below make a difference at all?
{-# SPECIALISE roll :: Int -> [Int] -> [Int] #-}
{-# SPECIALISE roll :: Integer -> [Integer] -> [Integer] #-}
roll :: Integral a => a -> [a] -> [a]
roll       = scanl (+)

{-# SPECIALISE wheel :: [Int] #-}
{-# SPECIALISE wheel :: [Integer] #-}
wheel :: Integral a => [a]
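-- the cycle of 48 gaps between consecutive numbers coprime to 2,3,5,7; wheel11 and
-- wheel13 below extend it stepwise to the 2..13 wheel (5760 gaps, period 30030)
-- used by roll/rollFrom above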
wheel      = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
           4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel

{-# SPECIALISE wheel11 :: [Int] #-}
{-# SPECIALISE wheel11 :: [Integer] #-}
wheel11 :: Integral a => [a]
wheel11 = res
      where
        snms = scanl (+) 11 (take 47 wheel)
        nums = tail $ scanl (+) 11 (take 529 wheel)
        cops = nums `minus` map (*11) snms
        diffs = zipWith (-) (tail cops) cops
        res = foldr (:) res diffs

{-# SPECIALISE wheel13 :: [Int] #-}
{-# SPECIALISE wheel13 :: [Integer] #-}
wheel13 :: Integral a => [a]
wheel13 = res
      where
        snms = take 480 $ scanl (+) 13 wheel11
        nums = take (480*13+1) . tail $ scanl (+) 13 wheel11
        cops = nums `minus` map (*13) snms
        diffs = zipWith (-) (tail cops) cops
        res = foldr (:) res diffs

{-# SPECIALISE minus :: [Int] -> [Int] -> [Int] #-}
{-# SPECIALISE minus :: [Integer] -> [Integer] -> [Integer] #-}
minus :: Integral a => [a] -> [a] -> [a]
minus a@(x:xs) b@(y:ys) = case compare x y of
       LT -> x: xs `minus` b
       GT ->    a  `minus` ys
       EQ ->    xs `minus` ys
minus a b  = a

merge a@(x:xs) b@(y:ys) = case compare x y of
       LT -> x: merge xs b
       EQ -> x: merge xs ys
       GT -> y: merge a  ys
merge a b  = if null b then a else b
----------------------------------------------------------------------
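
For example, a minimal driver to pull the millionth prime at the Int type would be 
something like

    import VippyPrimes (primes)

    main :: IO ()
    main = print (primes () !! 999999 :: Int)   -- prime number 10^6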
