Proposal: add Int indexing functions to Data.Set

Balazs Komuves bkomuves at gmail.com
Fri Apr 29 17:30:16 CEST 2011


However, indexing precludes the use of implementations that *don't*
> cache subtree sizes on a large fraction of interior node
>

The Data.Map API also precludes hash-map implementations, should we then
remove 90% of the functionality because of that?


> I'd support removing these functions absent a compelling use case (ie
> one that would not be better expressed as indexing by key).  Does
> anyone have one?
>


Does splitting at the middle for parallel processing qualify as a compelling
use case?
(though for that purpose maybe it would be better to expose splitting the
tree structure,
since it is balanced anyway, and also unsafely joining two trees).

My experience with the Map/Set API is that there are lots of use cases one
doesn't
think of in advance.

Balazs


On Fri, Apr 29, 2011 at 5:05 PM, Jan-Willem Maessen
<jmaessen at alum.mit.edu>wrote:

> On Fri, Apr 29, 2011 at 1:42 AM, Ivan Lazar Miljenovic
> <ivan.miljenovic at gmail.com> wrote:
> > On 29 April 2011 15:33, Luis Casillas <luis at casillas.org> wrote:
> >>
> >> El abr 28, 2011, a las 9:16 p.m., Ivan Lazar Miljenovic escribió:
> >>
> >>> On 29 April 2011 14:08, Luis Casillas <luis at casillas.org> wrote:
> >>>>
> >>>> 2. Concern that Int-based indexing breaks the set abstraction by
> exposing a specific ordering (Trac comment by Malcolm Wallace).  My
> assumption when I decided to submit this patch was that if it's OK for
> Data.Map to have these operations, then it should also be ok for Data.Set to
> have them.  But Malcolm's concern makes me wonder if some people will find
> the original Data.Map functions objectionable and dislike the idea of
> "spreading the infection" to Data.Set.
> >>>
> >>> I think I agree with Malcolm here; the internals of Data.Set shouldn't
> >>> be revealed IMHO.
> >>
> >> Would either of the following modifications make the proposal more
> palatable?
> >>
> >> 1. Stressing in the functions' documentation that the assignment of Ints
> does not have to reflect the order of the elements.  The examples would also
> be rewritten not to make any ordering assumptions.
> >>
> >> 2. Modifying the implementation to scramble the order of the indexes
> relative to the order of the set elements.  This would discourage people
> from implicitly coding to the element order.
> >>...
> >
> > Well, before we talk about modifications, etc.: is there a _need_ for
> > this kind of indexing in a Set?
>
> I believe there is not.  Moreover, in order to realize efficient Int
> indexing, we need to commit a word of storage to hold subtree size on
> every interior tree node in Data.Map, Data.Set, Data.IntSet, and
> Data.IntMap.  Without extra interior storage, these tree indexing and
> list indexing are comparably slow (O(n)).
>
> My understanding is these functions were included in Data.Map *because
> we could*: the implementation uses size-balanced trees, so the tree
> size had to be kept at each node for balancing purposes, and therefore
> it was easy to implement indexing.  As a happy side effect, this also
> makes it easy to maintain O(1) size information.
>
> However, indexing precludes the use of implementations that *don't*
> cache subtree sizes on a large fraction of interior nodes (it is
> already a bad match for IntSet and IntMap).  There are other
> techniques for making size O(1) without imposing per-node overhead, if
> that is desired.  It would be a shame to rule out alternative
> implementations in order to support an API that nobody needs.
>
> I'd support removing these functions absent a compelling use case (ie
> one that would not be better expressed as indexing by key).  Does
> anyone have one?
>
> -Jan-Willem Maessen
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20110429/5edde7a6/attachment.htm>


More information about the Libraries mailing list