[Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)

Mon Jul 23 16:54:08 EDT 2007

As an exercise, trying to understand the beautiful paper

	Stream Fusion
	From Lists to Streams to Nothing at All
	Duncan Coutts, Roman Leshchinskiy and Don Stewart
	http://www.cse.unsw.edu.au/~dons/papers/CLS07.html
	http://www.cse.unsw.edu.au/~dons/streams.html

I recoded my prime sieve using a pared down version of their Stream  
datatype; this is the simplest version I could write that  achieves a  
significant speedup.

My reaction to their paper was, if streams are better internally than  
lists, why not code directly in streams? Lists enjoy a serious  
notational advantage in Haskell, but one could imagine a language  
where the list notation was reserved for stream semantics.

My sieve was spending half its time in "merge", so I made only the  
changes necessary to convert "merge" to use streams. My streams are  
infinite, and "merge" can be written to not use Skip, so Step goes away.

Even though "nextx" and "nexty" only have one case now, using case  
statements is significantly faster than using let or where clauses.  
I'm imagining that I read about this somewhere, but if I did, it  
didn't sink in until I was tuning this code. I don't know if this is  
related to fusion optimization, or a general effect.

The timings are

> [Integer] -O2  1*10^6   2*10^6   3*10^6   4*10^6   5*10^6
> ---------------------------------------------------------
> ONeillPrimes |  3.338 |  7.320 | 11.911 | 18.225 | 21.785
> StreamPrimes |  3.867 |  8.405 | 13.656 | 21.542 | 37.640
> BayerPrimes  |  3.960 |  8.940 | 18.528 | 33.221 | 38.568

Here is the code:

> {-# OPTIONS_GHC -fglasgow-exts #-}
>
> module StreamPrimes (primes) where
>
> -- stream code
>
> data Stream a = forall s. Stream (s -> (a,s)) s
> data AStream a = A a (AStream a) | B (Stream a)
>
> stream :: [a] -> Stream a
> stream xs = Stream next xs
>   where
>     next []     = undefined
>     next (x:xt) = (x,xt)
>
> astream :: [a] -> AStream a
> astream []     = undefined
> astream (x:xt) = A x $ B $ stream xt
>
> merge :: Ord a => Stream a -> Stream a -> Stream a
> merge (Stream nextx vs) (Stream nexty ws) =
>     Stream next (vt,ws,Left v)
>     where
>         (v,vt) = nextx vs
>         next (xs,ys,Left x) =
>             case nexty ys of
>                 (y,yt) ->
>                     if   x < y
>                     then (x,(xs,yt,Right y))
>                     else (y,(xs,yt,Left x))
>         next (xs,ys,Right y) =
>             case nextx xs of
>                 (x,xt) ->
>                     if   x < y
>                     then (x,(xt,ys,Right y))
>                     else (y,(xt,ys,Left x))
>
> mergeA :: Ord a => AStream a -> AStream a -> AStream a
> mergeA (A x xt) ys = A x (mergeA xt ys)
> mergeA (B xs)   ys = mergeB xs ys
>
> mergeB :: Ord a => Stream a -> AStream a -> AStream a
> mergeB s@(Stream next xs) ys@(A y yt) =
>     case next xs of
>         (x,xt) ->
>             if x < y
>             then A x (mergeB (Stream next xt) ys)
>             else A y (mergeB s yt)
> mergeB xs (B ys) = B $ merge xs ys
>
> -- Code for venturi :: Ord a => [[a]] -> [a]
>
> root :: Ord a => AStream a -> [AStream a] -> [a]
> root (A x xt) yss    = x : (root xt yss)
> root (B xs) (ys:yst) = root (mergeB xs ys) yst
> root _ _ = undefined
>
> pair :: Ord a => [AStream a] -> [AStream a]
> pair (x:y:xt) = mergeA x y : (pair xt)
> pair _ = undefined
>
> group :: Ord a => [AStream a] -> [AStream a]
> group (x:xt) = x : (group $ pair xt)
> group _ = undefined
>
> venturi :: Ord a => [[a]] -> [a]
> venturi (x:xt) = root (astream x) $ group $ map astream xt
> venturi _ = undefined
>
> -- Code for primes :: Integral a => [a]
>
> diff  :: Ord a => [a] -> [a] -> [a]
> diff xs@(x:xt) ys@(y:yt) = case compare x y of
>     LT -> x : (diff  xt ys)
>     EQ ->     (diff  xt yt)
>     GT ->     (diff  xs yt)
> diff _ _ = undefined
>
> trim :: Integral a => a -> [a] -> [a]
> trim p = let f m x = mod x m /= 0 in filter (f p)
>
> seed :: Integral a => [a]
> seed = [2,3,5,7,11,13,17]
>
> wheel :: Integral a => [a]
> wheel = drop 1 [ m*j + k | j <- [0..], k <- ws ]
>     where m  = foldr1 (*) seed
>           ws = foldr trim [1..m] seed
>
> multiples :: Integral a => [a] -> [[a]]
> multiples ws = map fst $ tail $ iterate g ([], ws)
>     where g (_,ps@(p:pt)) = ([ m*p | m <- ps ], trim p pt)
>           g _ = undefined
>
> primes :: Integral a => [a]
> primes = seed ++ (diff wheel $ venturi $ multiples wheel)