[Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was:
Genuine Eratosthenes sieve)
Dave Bayer
bayer at cpw.math.columbia.edu
Sun Jul 22 19:53:49 EDT 2007
Here is another prime sieve. It is about half the length of the
fastest contributed code (ONeillPrimes) and nearly as fast until it
blows up on garbage collection:
> % cat ONeillPrimes.hs | grep -v "^--" | wc
> 185 1105 6306
> % cat BayerPrimes.hs | grep -v "^--" | wc
> 85 566 2418
>
> [Integer] -O 1*10^6 2*10^6 3*10^6 4*10^6 5*10^6
> ---------------------------------------------------------
> ONeillPrimes | 3.555 | 7.798 | 12.622 | 18.927 | 23.529
> BayerPrimes | 3.999 | 8.895 | 18.003 | 22.977 | 38.053
I wrote this as an experiment in coding data structures in Haskell's
lazy evaluation model, rather than as explicit data. The majority of
the work done by this code is done by "merge"; the multiples of each
prime percolate up through a tournament consisting of a balanced tree
of suspended "merge" function calls. In order to create an infinite
lazy balanced tree of lists, the datatype
> data List a = A a (List a) | B [a]
is used as scaffolding. One thinks of the root of the infinite tree
as starting at the leftmost child, and crawling up the left spine as
necessary.
What I'm calling a "venturi"
> venturi :: Ord a => [[a]] -> [a]
merges an infinite list of infinite lists into one list, under the
assumption that each list, and the heads of the lists, are in
increasing order. This could be a generally useful function. If one
can think of a better way to write "venturi", swapping in your code
would in particular yield a faster prime sieve.
I found that a tertiary merge tree was faster than a binary merge
tree, because this leads to fewer suspensions. One can speed this
code up a bit by interleaving strict and lazy calls, but I prefer to
leave the code short and readable.
It is bizarre that
> trim p = let f m x = mod x m /= 0 in filter (f p)
lurks in the prime sieve code, but it is only used with primes of
size up to the square root of the largest output prime. I tried more
thoughtful alternatives, and they all slowed down the sieve.
Sometimes dumb is beautiful.
Thanks to apfelmus for various helpful remarks that lead me to think
of this approach.
Here is the code:
> module BayerPrimes (venturi,primes) where
>
> -- Code for venturi :: Ord a => [[a]] -> [a]
>
> merge :: Ord a => [a] -> [a] -> [a] -> [a]
> merge xs@(x:xt) ys@(y:yt) zs@(z:zt)
> | x <= y = if x <= z
> then x : (merge xt ys zs)
> else z : (merge xs ys zt)
> | otherwise = if y <= z
> then y : (merge xs yt zs)
> else z : (merge xs ys zt)
> merge _ _ _ = undefined
>
> data List a = A a (List a) | B [a]
>
> mergeA :: Ord a => List a -> List a -> List a -> List a
> mergeA (A x xt) ys zs = A x (mergeA xt ys zs)
> mergeA (B xs) ys zs = mergeB xs ys zs
>
> mergeB :: Ord a => [a] -> List a -> List a -> List a
> mergeB xs@(x:xt) ys@(A y yt) zs = case compare x y of
> LT -> A x (mergeB xt ys zs)
> EQ -> A x (mergeB xt yt zs)
> GT -> A y (mergeB xs yt zs)
> mergeB xs (B ys) zs = mergeC xs ys zs
> mergeB _ _ _ = undefined
>
> mergeC :: Ord a => [a] -> [a] -> List a -> List a
> mergeC xs@(x:xt) ys@(y:yt) zs@(A z zt)
> | x < y = if x < z
> then A x (mergeC xt ys zs)
> else A z (mergeC xs ys zt)
> | otherwise = if y < z
> then A y (mergeC xs yt zs)
> else A z (mergeC xs ys zt)
> mergeC xs ys (B zs) = B $ merge xs ys zs
> mergeC _ _ _ = undefined
>
> root :: Ord a => List a -> [List a] -> [a]
> root (A x xt) yss = x : (root xt yss)
> root (B xs) (ys:zs:yst) = root (mergeB xs ys zs) yst
> root _ _ = undefined
>
> wrap :: [a] -> List a
> wrap [] = B []
> wrap (x:xt) = A x $ B xt
>
> triple :: Ord a => [List a] -> [List a]
> triple (x:y:z:xs) = mergeA x y z : (triple xs)
> triple _ = undefined
>
> group :: Ord a => [List a] -> [List a]
> group (x:y:xt) = x : y : (group $ triple xt)
> group _ = undefined
>
> venturi :: Ord a => [[a]] -> [a]
> venturi (x:xt) = root (wrap x) $ group $ map wrap xt
> venturi _ = undefined
>
> -- Code for primes :: Integral a => [a]
>
> diff :: Ord a => [a] -> [a] -> [a]
> diff xs@(x:xt) ys@(y:yt) = case compare x y of
> LT -> x : (diff xt ys)
> EQ -> (diff xt yt)
> GT -> (diff xs yt)
> diff _ _ = undefined
>
> trim :: Integral a => a -> [a] -> [a]
> trim p = let f m x = mod x m /= 0 in filter (f p)
>
> seed :: Integral a => [a]
> seed = [2,3,5,7,11,13,17]
>
> wheel :: Integral a => [a]
> wheel = drop 1 [ m*j + k | j <- [0..], k <- ws ]
> where m = foldr1 (*) seed
> ws = foldr trim [1..m] seed
>
> multiples :: Integral a => [a] -> [[a]]
> multiples ws = map fst $ tail $ iterate g ([], ws)
> where g (_,ps@(p:pt)) = ([ m*p | m <- ps ], trim p pt)
> g _ = undefined
>
> primes :: Integral a => [a]
> primes = seed ++ (diff wheel $ venturi $ multiples wheel)
More information about the Haskell-Cafe
mailing list