[Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

Fri Feb 25 00:40:57 CET 2011

On 2/23/11 8:06 PM, wren ng thornton wrote:
> On 2/23/11 4:42 PM, Sterling Clover wrote:
>> A quick grep of some of my own source reveals that I've used M.size
>> and S.size only to test for sizes equal to 1. So, for my purposes at
>> least, an O(1) isSingleton operation would be just as useful as an
>> O(1) size.
>
> I agree, a fast isSingleton function would cover a very common use
> case--- especially for set-like containers.

On reflection, in order to handle all the comparisons of size to some 
small constant, it would be better to have two functions: one for 
comparing the size of a collection against a boundary condition, and one 
for comparing the size of two collections. That is, you should add 
analogues of the functions in Data.List.Extras.LazyLength[1].

The lazy boundary condition function is O(min(m,n)) where m is the size 
of the boundary and n is the size of the collection. For detecting 
singletons, doubletons, etc, this reduces to O(1) though there may be a 
constant factor hit vs dedicated isSingleton/isDoubleton/et functions. 
It would be a slight difference that diminishes as m increases, but it 
could be worth benchmarking...

Similarly, the lazy comparative size function is O(min(m,n)) where m and 
n are the sizes of the two collections. This is always less than the 
O(m)+O(n) of taking the sizes individually[2], but it's remarkably less 
when one of the collections is known to be much smaller than the other 
(even if you don't know which is which).

[1] 
http://hackage.haskell.org/packages/archive/list-extras/0.4.0.1/doc/html/Data-List-Extras-LazyLength.html

[2] The O(m)+O(n) can be parallelized to yield O(max(m,n)) but that's 
still greater than O(min(m,n)). With some form of chunked-lazy natural 
numbers you may be able to get the parallel version to approach 
O(max(m,n)) and then get into a battle of constant factors.

-- 
Live well,
~wren