[Haskell-cafe] Re: "no-coding" functional data structures
via lazyness
Dave Bayer
bayer at math.columbia.edu
Mon Jul 16 02:55:51 EDT 2007
apfelmus <apfelmus <at> quantentunnel.de> writes:
> While your observation that merge may create an implicit heap is true,
> it doesn't happen in your code :) When unfolding the foldr1, we get
> something like
>
> 2:.. `merge'` (3:.. `merge'` (5:.. `merge1` (...)))
>
> i.e. just a linear chain of merges. Retrieving the least element is
> linear time in the worst case. This shape will not change with
> subsequent reductions of merge. In other words, it's the responsibility
> of fold to build a heap. Mergesort shows how a fold can build a heap:
>
> http://thread.gmane.org/gmane.comp.lang.haskell.general/15007
>
> For primes , the heap shape has to be chosen carefully in order to
> ensure termination. It's the same problem that forces you to use foldr1
> merge' instead of foldr1 merge .
>
> There's also a long thread about prime sieves
>
> http://thread.gmane.org/gmane.comp.lang.haskell.cafe/19699
Indeed. Your answer sent my head spinning, giving me something to think about
on a flight AMS to SFO. Thanks!
Here is a prime sieve that can hang within a factor of two of the fastest
code in that thread, until it blows up on garbage collection:
-----------------------------------------------------------------
diff :: Ord a => [a] -> [a] -> [a]
diff xs@(x:xt) ys@(y:yt) = case compare x y of
LT -> x : (diff xt ys)
EQ -> (diff xt yt)
GT -> (diff xs yt)
diff _ _ = undefined
union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
LT -> x : (union xt ys)
EQ -> x : (union xt yt)
GT -> y : (union xs yt)
union _ _ = undefined
twig :: Ord a => [a] -> [a] -> [a]
twig (x:xt) ys = x : (union xt ys)
twig _ _ = undefined
pair :: Ord a => [[a]] -> [[a]]
pair (x:y:xs) = twig x y : (pair xs)
pair _ = undefined
tree :: Ord a => [[a]] -> [a]
tree xs =
let g (x:xt) = x : (g $ pair xt)
g _ = undefined
in foldr1 twig $ g xs
seed :: Integral a => [a]
seed = [2,3,5,7,11,13]
wheel :: Integral a => [a]
wheel = drop 1 [ 30*j+k | j <- [0..], k <- [1,7,11,13,17,19,23,29] ]
multiples :: Integral a => [a]
multiples = tree ps
where f p n = mod n p /= 0
g (_,ns) p = ([ n*p | n <- ns ], filter (f p) ns)
ps = map fst $ tail $ scanl g ([], wheel) $ drop 3 primes
primes :: Integral a => [a]
primes = seed ++ (diff (drop 3 wheel) multiples)
-----------------------------------------------------------------
Here are some timings:
[Integer] -O 10^4 10^5 10^6 10^7
-----------------------------------------------------------------
ONeillPrimes | 0m0.023s | 0m0.278s | 0m3.682s | 0m53.920s
primes | 0m0.022s | 0m0.341s | 0m5.664s | 8m12.239s
This differs from your code in that it works with infinite lists, so
it can't build a balanced tree; the best it can do is to build a vine
of subtrees that double in size.
My conclusion so far from this and other experiments is that pushing
data structures into the lazy evaluation model does make them run faster,
but at the expense of space, which eventually leads to the code's untimely
demise.
I can imagine a lazy functional language that would support reification
of suspended closures, so one could incrementally balance the suspended
computation above. As far as I can tell, Haskell is not such a language.
I'd love it, however, if someone could surprise me by showing me the
idiom I'm missing here.
I will post a version of this code (I have faster but less readable
versions) to the prime sieve thread. First, I'm waiting for
the other shoe to drop, I still feel like I'm missing something.
More information about the Haskell-Cafe
mailing list