[Haskell-cafe] hstats median algorithm

timothyhobbs at seznam.cz timothyhobbs at seznam.cz
Mon Sep 3 03:07:35 CEST 2012


Sorry, I am horribly mistaken.  Hash table is O(n)+O(numbuckets)+O
(middlebucketsize log(middlebucketsize))...  It's too late for this stuff...

Tim


---------- Původní zpráva ----------
Od: timothyhobbs at seznam.cz
Datum: 3. 9. 2012
Předmět: Re: [Haskell-cafe] hstats median algorithm
"

It really depends on how you are reading in the data and what you plan to do
with it besides taking the median.  Obviously, if you read in your data as 
an ordered list things can be done O(n) without any trouble.




In another case, if you already know the range, you can make a hash table 
and start at the middle bucket and move outwards.  That will be O(n) + O
(bucketsize log(bucketsize)) given that the middle bucket is non empty and 
I'm not horribly mistaken.  





Tim





---------- Původní zpráva ----------
Od: David Feuer <david.feuer at gmail.com>
Datum: 3. 9. 2012
Předmět: Re: [Haskell-cafe] hstats median algorithm
"

I was thinking it should offer a randomized version (taking a generator), 
since randomized median algorithms provide the best expected performance. It
could also offer a deterministic version using some variant of median-of-
medians, intended for long lists. I guess it probably should retain the 
naive version for short lists. Some benchmarking would suggest a good 
cutoff. Has anyone come up with a better practical deterministic O(n) 
algorithm since median-of-medians? I saw a paper by Dorit Dor on reducing 
the number of comparisons to a bit under 3n, which also showed a lower bound
of a bit over 2n, but the algorithm she gives strikes me as far too complex 
to be practical.

On Sep 1, 2012 9:17 PM, "Gershom Bazerman" <gershomb at gmail.com
(mailto:gershomb at gmail.com)> wrote:
" In my experience, doing much better than the naive algorithm for median is
surprisingly hard, and involves a choice from a range of trade-offs. Did you
have a particular better algorithm in mind?

If you did, you could write it, and contact the package author with a patch.

You also may be able to find something of use in Edward Kmett's order-
statistics package: http://hackage.haskell.org/package/order-statistics
(http://hackage.haskell.org/package/order-statistics)

Cheers,
Gershom

On 9/1/12 3:26 PM, David Feuer wrote:
" The median function in the hstats package uses a naive O(n log n)
algorithm. Is there another package providing an O(n) option? If not,
what would it take to get the package upgraded?

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe at haskell.org(mailto:Haskell-Cafe at haskell.org)
http://www.haskell.org/mailman/listinfo/haskell-cafe
(http://www.haskell.org/mailman/listinfo/haskell-cafe)
" 
"

"
"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120903/9d70443f/attachment.htm>


More information about the Haskell-Cafe mailing list