[Haskell-cafe] Re: FASTER primes

Sat Jan 9 11:23:19 EST 2010

Daniel Fischer <daniel.is.fischer <at> web.de> writes:

> 
> Am Samstag 09 Januar 2010 08:04:20 schrieb Will Ness:
> > Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > > Am Freitag 08 Januar 2010 19:45:47 schrieb Will Ness:
> > > > Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> > >
> > > It's not tail-recursive, the recursive call is inside a celebrate.
> >
> > It is (spMerge that is).
> 
> No.
> "In computer science, tail recursion (or tail-end recursion) is a special 
> case of recursion in which the last operation of the function, the tail 
> call, is a recursive call."

As far as I understand it, when a function makes a tail call to a tail 
recursive function (be it itself or some other function) it is itself tail 
recursive. I.e. that call may be replaced with a direct jump, with no new 
context to be created. That is what your version accomplishes, too. Mine really 
held to that pair ~(c,d) when it wanted to call (merge _ d) _after_ the call to 
spMerge. By passing the pre-decomposed part of a pair, 'd', into the process 
environment of spMerge, you've made it tail recursive - it carried along all 
the context it needed. That's what've eliminated the space leak, so I'd say 
tail recursion does play a role under lazy evaluation - when a compiler isn't 
smart enough to do _that_ for us by itself. _Were_ it reliably smart, even non-
recursive functions like my initial variant would work just as well. 

> The last operation of spMerge is a call to celebrate or the pair 
> constructor (be that P or (,)). Doesn't matter, though, as for lazy 
> languages, tail recursion isn't very important.
> 
> > It calls tail-recursive celebrate in a tail
> > position. What you've done, is to eliminate the outstanding context, by
> > moving it inward. Your detailed explanation is more clear than that. :)
> >
> > BTW when I run VIP code it is consistently slower than using just pairs,
> 
> I can't reproduce that. Ceteris paribus, I get the exact same allocation 
> and GC figures whether I use People or (,), running times identical enough 
> (difference between People and (,) is smaller than the difference between 
> runs of the same; the difference between the fastest and the slowest run of 
> the two is less than 0.5%). I think it must be the other changes you made.

I just take the VIP code as it is on a web page, and my intial variant without 
the wheel, and compare. Then I add the wheel in the same fashion, and then 
feeder, and compare again. When I tested that Monid instance code I too 
compared it to the straight pairs, and it was slower. Don't know why. 

> > modified with wheel and feeder and all. So what's needed is to
> > re-implement your approach for pairs:
> >
> >  mergeSP (a,b) ~(c,d) = let (bc,bd) = spMerge b c d
> >                            in (a ++ bc, bd)
> >      where
> >       spMerge b [] d = ([], merge b d)
> >       spMerge b@(x:xs) c@(y:ys) d = case compare x y of
> >                LT -> consSP x $ spMerge xs c  d
> >                EQ -> consSP x $ spMerge xs ys d
> >                GT -> consSP y $ spMerge b  ys d
> >
> >  consSP x ~(bc,bd) = (x:bc,bd) -- don't forget that magic `~` !!!
> 
> I called that (<:).
> 
> >
> > BTW I'm able to eliminate sharing without a compiler switch by using
> >
> 
> Yes, I can too. But it's easy to make a false step and trigger sharing.

yes indeed. It's seems unpredictable. Fortunately GHC couldn't tell that (12-1) 
was 11 by the looks of it. :) Your idea certainly seems right, that there ought 
to be some control over sharing on a per-function basis somehow without these 
ridiculous code tricks.

> I can get a nice speedup (~15%, mostly due to much less garbage collecting) 
by doing the final merge in a function without unnecessarily wrapping the 
result in a pair

Will have to wrap my head around _that_. But that would be fighting with the 
compiler again. I don't like that, I much rather fight with the problem at 
hand. There shouldn't be any pairs in the compiled code in the first place. 
They just guide the staging of (++) and (merge) intertwined between the 
producer streams. At each finite step, when the second part of a pair comes 
into play, it is only after its first part was completely consumed. I guess the 
next thing to try would be to actually create a data type MergeNode and arrange 
_those_ in a tree and see if that helps. That would be the next half-step 
towards the PQ itself.

> This uses a different folding structure again,

which I am yet to decipher. :)

> > How about them wheels? :)
> 
> Well, what about them?

I dunno, it makes for a real easy wheel derivation, and coming out of our 
discussion of euler's sieve it's a nice cross-pollination. :) Having yet 
another list representation suddenly cleared up the whole issue (two of them). 
I'll repost it one last time as I've made some corrections to it:

>   euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)
>   primes = euler [2..]

>   primes  = euler $ listFrom [2] 1
>    = 2:euler ( listFrom [3] 1 `minus` map(2*) (listFrom [2] 1)) )
>                listFrom [3,4] 2 `minus` listFrom [4] 2
>                      == listFrom [3] 2 ==
>    = 2:3:euler (listFrom [5] 2 `minus` map(3*) (listFrom [3] 2))
>                 listFrom [5,7,9] 6 `minus` listFrom [9] 6
>                      == listFrom [5,7] 6 ==

  listFrom xs by = concat $ iterate (map (+ by)) xs

  rolls = unfoldr (Just . nextRoll) ([2],1)

  nextRoll r@(xs@(p:xt),b) = ( (p,r') , r')
     where
           r'  = (xs',p*b)
           xs' = (concat $ take p $ iterate (map (+ b)) $ xt ++ [p+b]) 
                 `minus` map (p*) xs

  nthWheel n = let (ps,rs) = unzip $ take n rolls
                   (x:xs,b) = last rs
               in ((ps, x), zipWith (-) (xs++[x+b]) (x:xs))

  eulerPrimes n = let (ps,rs) = unzip $ take n rolls
                      (qs@(q:_),b) = last rs
                  in ps ++ takeWhile (< q^2) qs

(BTW I've noticed that when I reply in Gmane, all the text below double hyphen, 
if present in the post to which I'm replying, just disappears. This may be an 
artefact of some specific browser.)