[Haskell-cafe] Re: "no-coding" functional data structures via
lazyness
apfelmus
apfelmus at quantentunnel.de
Wed Jul 18 04:59:39 EDT 2007
Dave Bayer wrote:
> Here is a prime sieve that can hang within a factor of two of the fastest
> code in that thread, until it blows up on garbage collection:
>
> -----------------------------------------------------------------
>
> diff :: Ord a => [a] -> [a] -> [a]
> diff xs@(x:xt) ys@(y:yt) = case compare x y of
> LT -> x : (diff xt ys)
> EQ -> (diff xt yt)
> GT -> (diff xs yt)
> diff _ _ = undefined
>
> union :: Ord a => [a] -> [a] -> [a]
> union xs@(x:xt) ys@(y:yt) = case compare x y of
> LT -> x : (union xt ys)
> EQ -> x : (union xt yt)
> GT -> y : (union xs yt)
> union _ _ = undefined
>
> twig :: Ord a => [a] -> [a] -> [a]
> twig (x:xt) ys = x : (union xt ys)
> twig _ _ = undefined
>
> pair :: Ord a => [[a]] -> [[a]]
> pair (x:y:xs) = twig x y : (pair xs)
> pair _ = undefined
>
> tree :: Ord a => [[a]] -> [a]
> tree xs =
> let g (x:xt) = x : (g $ pair xt)
> g _ = undefined
> in foldr1 twig $ g xs
>
> This differs from your code in that it works with infinite lists, so
> it can't build a balanced tree; the best it can do is to build a vine
> of subtrees that double in size.
Yes, the shape of the implicit tree has to be known in advance, there's
no way to change it dynamically. But there's no need to balance it
perfectly as long as access to a leaf takes only logarithmic time. So,
the function tree is fine. I'd even turn it into a higher-order function
foldInfTree1 :: (a -> a -> a) -> [a] -> a
foldInfTree1 f xs = foldr1 f $ deepen xs
where
pairs [] = []
pairs [x] = [x]
pairs (x:x':xs) = f x x' : pairs xs
deepen [] = []
deepen (x:xs) = x : deepen (pairs xs)
In case of an infinite list, the resulting tree of `f`s has an infinite
right spine but every other path is finite. Moreover, the length of a
path to the n-th list element is bounded by something like 2*log n. With
this higher-order function, your tree becomes
tree = foldInfTree1 twig
But I'm not sure whether this tree structure really works well for
infinite lists, see my remark below.
> seed :: Integral a => [a]
> seed = [2,3,5,7,11,13]
>
> wheel :: Integral a => [a]
> wheel = drop 1 [ 30*j+k | j <- [0..], k <- [1,7,11,13,17,19,23,29] ]
>
> primes :: Integral a => [a]
> primes = seed ++ (diff (drop 3 wheel) multiples)
>
> multiples :: Integral a => [a]
> multiples = tree ps
> where f p n = mod n p /= 0
> g (_,ns) p = ([ n*p | n <- ns ], filter (f p) ns)
> ps = map fst $ tail $ scanl g ([], wheel) $ drop 3 primes
Hm, this looks very suspicious, I guess there's something wrong with
using scanl g . You filter out multiples that are divisible by prior
primes, but that should be the job of the heap. In other words, the
filter (f p) is the core of the algorithm here, making it almost
equivalent to the simple
sieve xs p = filter (\n -> n `mod` p /= 0) xs
primes = map head $ scanl sieve [2..] primes
The heap is not needed at all. In fact, it may even be the second reason
for the memory consumption here. To see why, lets draw the structure of
the tree with parentheses
1 (2 3) ((4 5) (6 7)) (((8 9) (10 11)) ((12 13) (14 15))) ...
Every pair inside a parenthesis is meant to be merged with twig , it's
just too noisy to write every twig explicitly. Also, I left out the
outermost chain of parenthesis implied by the foldr . Now, as soon as
the twig on ((8 9) (10 11)) and ((12 13) (14 15)) changes into a
union , the twig between (12 13) and (14 15) will be calculated and
compared against the remaining (9 `union` (10 `union` 11)). But
evaluating the 12-th is to soon at this stage since 9,10 and 11 are
surely smaller, the sequence of primes is monotone. Unfortunately, this
gap widens, so that you need to evaluate the (2^k+2^(k-1))-th prime when
the (2^k+1)-th prime would be next.
In the end, it seems that this tree structure doesn't work well on stuff
that is somewhat monotone. I guess that you'll run into problems with
termination as soon as you remove the filter (f p) .
Besides perhaps termination, I guess that your reason for applying
filter (f p) repeatedly was to start the wheel at the right position.
Normally, the multiples would just be
multiples = tree $ map multiple primes
multiple p = map (p*) [p..]
But given that we could start roll the wheel starting from p, the list
of factors can be reduced dramatically
multiple p = map (p*) $ wheel `rollFrom` p
This can be done by representing the wheel differently:
-- Wheel (modulus) (list of remainders)
data Wheel = Wheel Int [Int]
wheel30 = Wheel 30 [1,7,11,13,17,19,23,29]
(Wheel n rs) `rollFrom` k = map (k+) $ differences
$ until (\rs -> k `mod` n == head rs `mod` n) tail (cycle rs)
where
differences xs = zipWith subtract' xs (tail xs)
subtract' x y = (y - x) `mod` n
> I can imagine a lazy functional language that would support reification
> of suspended closures, so one could incrementally balance the suspended
> computation above. As far as I can tell, Haskell is not such a language.
> I'd love it, however, if someone could surprise me by showing me the
> idiom I'm missing here.
Well, you can "reify" things by using constructors in the first place
data Heap a = One a | Merge (Heap a) (Heap a)
foldHeap = foldTree Merge . map Leaf
and operating on the resulting tree afterwards. But otherwise inspecting
the term structure of a closure is not possible since that would destroy
referential transparency.
Regards,
apfelmus
More information about the Haskell-Cafe
mailing list