[Haskell-cafe] Re: "no-coding" functional data structures via lazyness

Mon Jul 16 02:55:51 EDT 2007

apfelmus <apfelmus <at> quantentunnel.de> writes:

> While your observation that merge may create an implicit heap is true,
> it doesn't happen in your code :) When unfolding the foldr1, we get
> something like
> 
>   2:.. `merge'` (3:.. `merge'` (5:.. `merge1` (...)))
> 
> i.e. just a linear chain of merges. Retrieving the least element is
> linear time in the worst case. This shape will not change with
> subsequent reductions of  merge. In other words, it's the responsibility
> of  fold  to build a heap. Mergesort shows how a fold can build a heap:
> 
>   http://thread.gmane.org/gmane.comp.lang.haskell.general/15007
> 
> For  primes , the heap shape has to be chosen carefully in order to
> ensure termination. It's the same problem that forces you to use  foldr1
> merge'  instead of  foldr1 merge .
> 
> There's also a long thread about prime sieves
> 
>   http://thread.gmane.org/gmane.comp.lang.haskell.cafe/19699

Indeed. Your answer sent my head spinning, giving me something to think about
on a flight AMS to SFO. Thanks!

Here is a prime sieve that can hang within a factor of two of the fastest
code in that thread, until it blows up on garbage collection:

-----------------------------------------------------------------

diff  :: Ord a => [a] -> [a] -> [a]
diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (diff  xt ys)
    EQ ->     (diff  xt yt)
    GT ->     (diff  xs yt)
diff _ _ = undefined

union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (union xt ys)
    EQ -> x : (union xt yt)
    GT -> y : (union xs yt)
union _ _ = undefined

twig :: Ord a => [a] -> [a] -> [a]
twig (x:xt) ys = x : (union xt ys)
twig _ _ = undefined

pair :: Ord a => [[a]] -> [[a]]
pair (x:y:xs) = twig x y : (pair xs)
pair _ = undefined

tree :: Ord a => [[a]] -> [a]
tree xs  = 
    let g (x:xt) = x : (g $ pair xt)
        g _ = undefined
    in  foldr1 twig $ g xs

seed :: Integral a => [a]
seed = [2,3,5,7,11,13]

wheel :: Integral a => [a]
wheel  = drop 1 [ 30*j+k | j <- [0..], k <- [1,7,11,13,17,19,23,29] ]

multiples :: Integral a => [a]
multiples = tree ps
    where f p n = mod n p /= 0
          g (_,ns) p = ([ n*p | n <- ns ], filter (f p) ns)
          ps = map fst $ tail $ scanl g ([], wheel) $ drop 3 primes

primes :: Integral a => [a]
primes = seed ++ (diff (drop 3 wheel) multiples)

-----------------------------------------------------------------

Here are some timings:

[Integer] -O       10^4        10^5        10^6        10^7
-----------------------------------------------------------------
ONeillPrimes  |  0m0.023s |  0m0.278s |  0m3.682s | 0m53.920s
      primes  |  0m0.022s |  0m0.341s |  0m5.664s | 8m12.239s

This differs from your code in that it works with infinite lists, so
it can't build a balanced tree; the best it can do is to build a vine
of subtrees that double in size.

My conclusion so far from this and other experiments is that pushing
data structures into the lazy evaluation model does make them run faster,
but at the expense of space, which eventually leads to the code's untimely
demise.

I can imagine a lazy functional language that would support reification
of suspended closures, so one could incrementally balance the suspended
computation above. As far as I can tell, Haskell is not such a language.
I'd love it, however, if someone could surprise me by showing me the
idiom I'm missing here.

I will post a version of this code (I have faster but less readable
versions) to the prime sieve thread. First, I'm waiting for
the other shoe to drop, I still feel like I'm missing something.