Proposal: add Int indexing functions to Data.Set

Fri Apr 29 18:30:24 CEST 2011

El abr 29, 2011, a las 8:05 a.m., Jan-Willem Maessen escribió:

> I'd support removing these functions absent a compelling use case (ie
> one that would not be better expressed as indexing by key).  Does
> anyone have one?

Well, I have a use case for these functions in the case of Set.  I'm a bit wary of arguing for the inclusion on nothing but my personal use case, because as I mentioned in another message, I just assumed that desirability of such functions functions in Data.Map would be uncontroversial.  I.e., when it comes to modifying the interface of a standard library, I think arguments like "make it uniform with Data.Map" are more persuasive than "I could use it for this."

At this point, I'm slowly being convinced that the original Data.Map functions are extraneous.  Still, I will mention my use case for the Data.Set functions, and let people make up their own minds.

I'm working on a project where the most natural implementation is to use the members of runtime-constructed sets to index the elements of an array.  The use of sets to represent the index sets is independently motivated, because I need to operate on these sets using operations like union, intersection, difference, mapping and filtering, while preserving the member uniqueness property.

The lookupIndex, findIndex and elemAt functions would give me this set member/array index translation "for free."  There are other ways to construct such a mapping, but I think they suffer in comparison:

1. I could construct a Map assigning an index to each element, but then the inverse mapping is linear time.

2. That could be solved by using a pair of maps (one from elem to Int, and an inverse map from Int to elem), but since I fundamentally just want to treat these maps as sets, I need to implement analogues of basic Set operations like union, intersection, difference, map, filter and so on, that would need to reconstruct the element <--> index mappings as needed.  I realize that this is not difficult at all, but (a) it's presumably slower, (b) again, I mistakenly assumed that the Data.Map index functions were uncontroversial to start with. 

3. An alternative would be to implement and use a data type that guaranteed uniqueness, ordering, Int indexing and the usual set operations.  You might call such a type or class OrderedSet or IndexedSet; but to implement it, you might end up just copy-pasting Data.Set + the indexing functions as the implementation, because the implementation in Data.Set sure looks like a natural way of implementing an ordered, indexed set.

I think this really distills the issue down to this.  There is a family of three potential, related abstractions:

                         plain set
                           /   \
                          /     \
                ordered set     indexed set

A plain set just provides uniqueness.  An ordered set provides uniqueness and preserves some ordering predicate on the elements.  An indexed set assigns a unique Int index in the range [0..size-1] to each element, and provides translation between members and indexes.

The Set module provides an implementation of an efficient ordered set, that can easily be extended to also be an indexed set.  But many people think that "plain set" should be the only contract that Data.Set is bound to implement.  I can't say that I disagree with that--but I also just want indexed sets...

A more ambitious proposal would be this:

1. Introduce classes to represent the three abstractions in question.

2. Have Data.Set be an instance of all three.

3. Don't expose the ordered an indexed operations directly from Data.Set; force the users to go through the appropriate class.  This allows flexibility for Data.Set to later switch to an implementation that doesn't provide order or indexes; the original implementation would have to be kept as an alternative for people who want the extra features.

This is a lot more ambitious of a proposal than I was hoping for my first time around, I will say.