[Haskell-beginners] sorting almost sorted list
Daniel Fischer
daniel.is.fischer at googlemail.com
Tue Sep 27 20:16:18 CEST 2011
On Tuesday 27 September 2011, 18:34:01, Yitzchak Gale wrote:
> Hi Daniel,
>
> Thanks for clearing up the mystery
> about the dependence on prime factorization!
> Very interesting.
>
> > at first glance, merging c
> > chunks of length m each is O(c*m); so, since sorting each chunk is
> > O(m*log m), it'd overall be
> >
> > O(c*m*log m) = O(n*log m) = O(n*log (n/c))
>
> No. In the given use case m is a small constant and
> c is proportional to n, so the overall complexity is O(n).
Not a contradiction. If m is a constant (> 1), O(n*log m) = O(n)
(and sorting chunks of length m is O(m*log m) = O(1) then).
>
> > It has larger overhead than sortBy,
>
> That's true. But it's asymptotically better.
>
> Given Dennis' use case, it sounds like
> m will probably always be < 10.
> Input size will typically be a few hundred,
> but could be a few thousand. sortBy
> would indeed be just fine in those cases.
>
> > so if n is small or m is large, sortBy
> > is likely to be better, but if n is large and m relatively small,
> > sortAlmostBy can be significantly (but not dramatically,
> > excluding extreme cases where you can do much better) faster.
>
> At most m will be on the order of 100 if we
> need to process all the parts of a large
> symphony orchestra combined together
> (a very unlikely use case admittedly),
> and in that case the input size could be as
> much as 10^6, or possibly even more for
> something like the Mahler.
Yeah, but log (10^6) is still a fairly small number, so counting reduction
steps, it wouldn't be dramatically better. However, I forgot to take memory
usage into account, so in terms of wall-clock (or CPU) time, it could be
dramatically better even for not too large n and no too small m.
>
> (Come to think of it, it's pretty amazing that
> Mahler wrote so many notes and actually got
> people to play them.)
People do stranger things. People listen to Wagner, for example.
>
> There I believe sortAlmostBy could be quite
> an improvement.
Certainly. (Just included Data.List.sort in the benchmark, for these short
lists, it takes less than twice as long as sortAlmostBy)
> I'd be interested to hear ideas to make it even better though.
mergeAABy :: (a -> a -> Ordering) -> [[a]] -> [a]
mergeAABy cmp ((x:xs):(y:ys):zss) = foo x xs y ys zss
where
foo u us v vs wss =
case cmp u v of
GT -> v : case vs of
(b:bs) -> foo u us b bs wss
[] -> u : us ++ mergeAABy cmp wss
_ -> u : case us of
(c:cs) -> foo c cs v vs wss
[] -> case wss of
((w:ws):kss) -> foo v vs w ws kss
_ -> v : vs
mergeAABy _ [xs] = xs
mergeAABy _ _ = []
seems to be a bit faster (I have made the lists longer, substituted 2000 by
5000):
dafis at schwartz:~/Haskell/BeginnersTesting> ./benchAlmost -nis 20 -kis 20
(20,20) -- print parameters to make sure they're evaluated
284090563 -- print sum of all random Ints to make sure they're evaluated
warming up
estimating clock resolution...
mean is 2.183860 us (320001 iterations)
found 41404 outliers among 319999 samples (12.9%)
2443 (0.8%) low severe
38961 (12.2%) high severe
estimating cost of a clock call...
mean is 50.65112 ns (14 iterations)
found 2 outliers among 14 samples (14.3%)
2 (14.3%) high mild
benchmarking Yitz
collecting 100 samples, 1 iterations each, in estimated 6.537795 s
mean: 67.05025 ms, lb 66.59363 ms, ub 67.62786 ms, ci 0.950
std dev: 2.616184 ms, lb 2.179318 ms, ub 3.100722 ms, ci 0.950
benchmarking Mine
collecting 100 samples, 1 iterations each, in estimated 5.566311 s
mean: 56.91069 ms, lb 56.54846 ms, ub 57.37602 ms, ci 0.950
std dev: 2.100948 ms, lb 1.722233 ms, ub 2.585056 ms, ci 0.950
With n = k = 50:
benchmarking Yitz
collecting 100 samples, 1 iterations each, in estimated 7.366776 s
mean: 75.63603 ms, lb 75.17202 ms, ub 76.20199 ms, ci 0.950
std dev: 2.617970 ms, lb 2.227238 ms, ub 3.040563 ms, ci 0.950
benchmarking Mine
collecting 100 samples, 1 iterations each, in estimated 6.517696 s
mean: 68.00708 ms, lb 67.26089 ms, ub 69.04105 ms, ci 0.950
std dev: 4.463628 ms, lb 3.484169 ms, ub 5.950722 ms, ci 0.950
n = k = 100:
benchmarking Yitz
collecting 100 samples, 1 iterations each, in estimated 9.756303 s
mean: 87.39906 ms, lb 86.90397 ms, ub 87.99232 ms, ci 0.950
std dev: 2.763955 ms, lb 2.382008 ms, ub 3.168909 ms, ci 0.950
benchmarking Mine
collecting 100 samples, 1 iterations each, in estimated 7.505393 s
mean: 77.05801 ms, lb 76.51954 ms, ub 77.74727 ms, ci 0.950
std dev: 3.117756 ms, lb 2.594126 ms, ub 3.769922 ms, ci 0.950
More information about the Beginners
mailing list