[Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
wren ng thornton
wren at freegeek.org
Fri Feb 25 00:40:57 CET 2011
On 2/23/11 8:06 PM, wren ng thornton wrote:
> On 2/23/11 4:42 PM, Sterling Clover wrote:
>> A quick grep of some of my own source reveals that I've used M.size
>> and S.size only to test for sizes equal to 1. So, for my purposes at
>> least, an O(1) isSingleton operation would be just as useful as an
>> O(1) size.
> I agree, a fast isSingleton function would cover a very common use
> case--- especially for set-like containers.
On reflection, in order to handle all the comparisons of size to some
small constant, it would be better to have two functions: one for
comparing the size of a collection against a boundary condition, and one
for comparing the size of two collections. That is, you should add
analogues of the functions in Data.List.Extras.LazyLength.
The lazy boundary condition function is O(min(m,n)) where m is the size
of the boundary and n is the size of the collection. For detecting
singletons, doubletons, etc, this reduces to O(1) though there may be a
constant factor hit vs dedicated isSingleton/isDoubleton/et functions.
It would be a slight difference that diminishes as m increases, but it
could be worth benchmarking...
Similarly, the lazy comparative size function is O(min(m,n)) where m and
n are the sizes of the two collections. This is always less than the
O(m)+O(n) of taking the sizes individually, but it's remarkably less
when one of the collections is known to be much smaller than the other
(even if you don't know which is which).
 The O(m)+O(n) can be parallelized to yield O(max(m,n)) but that's
still greater than O(min(m,n)). With some form of chunked-lazy natural
numbers you may be able to get the parallel version to approach
O(max(m,n)) and then get into a battle of constant factors.
More information about the Haskell-Cafe