[Haskell-cafe] Re: "no-coding" functional data structures via lazyness

Wed Jul 18 04:59:39 EDT 2007

Dave Bayer wrote:
> Here is a prime sieve that can hang within a factor of two of the fastest
> code in that thread, until it blows up on garbage collection:
>
> -----------------------------------------------------------------
> 
> diff  :: Ord a => [a] -> [a] -> [a]
> diff xs@(x:xt) ys@(y:yt) = case compare x y of
>     LT -> x : (diff  xt ys)
>     EQ ->     (diff  xt yt)
>     GT ->     (diff  xs yt)
> diff _ _ = undefined
> 
> union :: Ord a => [a] -> [a] -> [a]
> union xs@(x:xt) ys@(y:yt) = case compare x y of
>     LT -> x : (union xt ys)
>     EQ -> x : (union xt yt)
>     GT -> y : (union xs yt)
> union _ _ = undefined
> 
> twig :: Ord a => [a] -> [a] -> [a]
> twig (x:xt) ys = x : (union xt ys)
> twig _ _ = undefined
> 
> pair :: Ord a => [[a]] -> [[a]]
> pair (x:y:xs) = twig x y : (pair xs)
> pair _ = undefined
> 
> tree :: Ord a => [[a]] -> [a]
> tree xs  = 
>     let g (x:xt) = x : (g $ pair xt)
>         g _ = undefined
>     in  foldr1 twig $ g xs
>
> This differs from your code in that it works with infinite lists, so
> it can't build a balanced tree; the best it can do is to build a vine
> of subtrees that double in size.

Yes, the shape of the implicit tree has to be known in advance, there's
no way to change it dynamically. But there's no need to balance it
perfectly as long as access to a leaf takes only logarithmic time. So,
the function  tree  is fine. I'd even turn it into a higher-order function

  foldInfTree1 :: (a -> a -> a) -> [a] -> a
  foldInfTree1 f xs  = foldr1 f $ deepen xs
     where
     pairs []        = []
     pairs [x]       = [x]
     pairs (x:x':xs) = f x x' : pairs xs

     deepen []       = []
     deepen (x:xs)   = x : deepen (pairs xs)

In case of an infinite list, the resulting tree of `f`s has an infinite
right spine but every other path is finite. Moreover, the length of a
path to the n-th list element is bounded by something like 2*log n. With
this higher-order function, your  tree  becomes

  tree = foldInfTree1 twig

But I'm not sure whether this tree structure really works well for
infinite lists, see my remark below.

> seed :: Integral a => [a]
> seed = [2,3,5,7,11,13]
>
> wheel :: Integral a => [a]
> wheel  = drop 1 [ 30*j+k | j <- [0..], k <- [1,7,11,13,17,19,23,29] ]
>
> primes :: Integral a => [a]
> primes = seed ++ (diff (drop 3 wheel) multiples)
>
> multiples :: Integral a => [a]
> multiples = tree ps
>     where f p n = mod n p /= 0
>           g (_,ns) p = ([ n*p | n <- ns ], filter (f p) ns)
>           ps = map fst $ tail $ scanl g ([], wheel) $ drop 3 primes

Hm, this looks very suspicious, I guess there's something wrong with
using  scanl g . You filter out multiples that are divisible by prior
primes, but that should be the job of the heap. In other words, the
filter (f p)  is the core of the algorithm here, making it almost
equivalent to the simple

  sieve xs p = filter (\n -> n `mod` p /= 0) xs
  primes = map head $ scanl sieve [2..] primes

The heap is not needed at all. In fact, it may even be the second reason
for the memory consumption here. To see why, lets draw the structure of
the tree with parentheses

  1 (2 3) ((4 5) (6 7)) (((8 9) (10 11)) ((12 13) (14 15))) ...

Every pair inside a parenthesis is meant to be merged with  twig , it's
just too noisy to write every  twig  explicitly. Also, I left out the
outermost chain of parenthesis implied by the  foldr . Now, as soon as
the  twig  on ((8 9) (10 11)) and ((12 13) (14 15)) changes into a
union , the  twig  between (12 13) and (14 15) will be calculated and
compared against the remaining (9 `union` (10 `union` 11)). But
evaluating the 12-th is to soon at this stage since 9,10 and 11 are
surely smaller, the sequence of primes is monotone. Unfortunately, this
gap widens, so that you need to evaluate the (2^k+2^(k-1))-th prime when
the (2^k+1)-th prime would be next.

In the end, it seems that this tree structure doesn't work well on stuff
that is somewhat monotone. I guess that you'll run into problems with
termination as soon as you remove the  filter (f p) .

Besides perhaps termination, I guess that your reason for applying
filter (f p)  repeatedly was to start the wheel at the right position.
Normally, the multiples would just be

  multiples  = tree $ map multiple primes
  multiple p = map (p*) [p..]

But given that we could start roll the wheel starting from p, the list
of factors can be reduced dramatically

  multiple p = map (p*) $ wheel `rollFrom` p

This can be done by representing the wheel differently:

    -- Wheel (modulus) (list of remainders)
  data Wheel   = Wheel Int [Int]

  wheel30 = Wheel 30 [1,7,11,13,17,19,23,29]

  (Wheel n rs) `rollFrom` k = map (k+) $ differences
     $ until (\rs -> k `mod` n == head rs `mod` n) tail (cycle rs)
     where
     differences xs = zipWith subtract' xs (tail xs)
     subtract' x y  = (y - x) `mod` n

> I can imagine a lazy functional language that would support reification
> of suspended closures, so one could incrementally balance the suspended
> computation above. As far as I can tell, Haskell is not such a language.
> I'd love it, however, if someone could surprise me by showing me the
> idiom I'm missing here.

Well, you can "reify" things by using constructors in the first place

  data Heap a = One a | Merge (Heap a) (Heap a)

  foldHeap = foldTree Merge . map Leaf

and operating on the resulting tree afterwards. But otherwise inspecting
the term structure of a closure is not possible since that would destroy
referential transparency.

Regards,
apfelmus