[Haskell] Re: Quicksearch vs. lazyness
apfelmus
apfelmus at quantentunnel.de
Mon Mar 19 10:19:34 EDT 2007
Steffen Mazanek wrote:
>
> say, we want to find the k'th element in a sorted list.
I assume that you want to find the k'th smallest element of an unsorted
list by sorting it?
> [...]
>
> However, I was wondering whether it might be possible
> to implement quicksort in a way that quicksearch is
> done out of the box by lazy evaluation. Is there any way
> to do this?
Yes, that can be done. It's a small extension of the folklore (?) example
minimum = head . mergesort
that extracts the smallest element of a list. Given a proper mergesort,
laziness will ensure that this function actually runs in O(n) time
instead of the expected O(n log n). Apparently, the first k smallest
elements now can be obtained by
take k . mergesort
which may run in O(n + k*log n) time with a laziness-friendly mergesort.
> From my understanding for small k's lazy
> evaluation already does the trick for the naive quicksort
> algorithm (quicksort smaller ++ [x] ++ quicksort greater),
> doesn't it? Is there a search algorithm that makes better
> use of lazy evaluation out of the box?
No, as far as I thought about quicksort, it needs O(n log n) to produce
the first element. But even the mergesort implementation has to be
chosen carefully as I've seen many that still need O(n log n) just to
return the smallest element.
Alas, her is a laziness-friendly mergesort:
mergesort [] = []
mergesort xs = foldtree1 merge $ map return xs
foldtree1 f [x] = x
foldtree1 f xs = foldtree1 f $ pairs xs
where
pairs [] = []
pairs [x] = [x]
pairs (x:x':xs) = f x x' : pairs xs
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) =
if x <= y then x:merge xs (y:ys) else y:merge (x:xs) ys
Why does this work? The function 'foldtree1' folds the elements of a
list as if they where in a binary tree:
foldrtree1 f [1,2,3,4,5,6,7,8]
==>
((1 `f` 2) `f` (3 `f` 4)) `f` ((5 `f` 6) `f` (7 `f` 8))
The important point is that this expression is constructed in O(n + n/2
+ n/4 + n/8 + ..) = O(n) time. Now, given such a binary tree of `merge`
operations, getting the next list element of the result takes O(log n)
time. Why? Because the merge in the middle gets the next element either
from its left or its right subexpression which in turn gets its element
from its left or its right subexpression and so on. Clearly, we only
need logarithmically many steps to hit the bottom. Putting both
together, we see that we have to pay O(n + k*log n) steps to build the
expression and to fetch the first k elements.
Making this argument rigorous with a suitable debit invariant is left to
the attentive reader :)
Regards,
apfelmus
More information about the Haskell
mailing list