Proposal: IntMap.differenceKeysSet for removing an IntSet of keys

Johan Tibell johan.tibell at gmail.com
Thu Jun 16 17:38:02 CEST 2011


Hi,

Sorry for the late reply. I've been on vacation and had some catching
up to do the last few days.

On Mon, Jun 13, 2011 at 2:40 PM, Milan Straka <fox at ucw.cz> wrote:
> My point was, that there are many combinations you can think of --
> IntMap minus IntSet, IntMap minus Map Int, IntMap minus list of Ints,
> IntMap union Map Int, IntMap intersection IntSet, IntMap intersection
> Set Int and so on.
>
> With fromSet and fromDistinctAscList, you can handle all of these easily,
> although with some time penalty. For example the union case,
>  union map (fromSet (const 1) set)
> seems fine.

The containers library doesn't separate algorithms from data
structures, like e.g. STL does. As I wrote in another email it would
be nice if we could separate traversal of elements in containers from
the algorithms working on the elements (like difference).
Unfortunately I don't know how to do this.

> I am quite unhappy about exporting differenceKeysSet and other hybrid functions
> working on different data structures.
>
> But I do not like depending on rules alone either. If we promise that
> difference map (fromSet set) works without creating intermediate map,
> then it
> - works only for GHC
> - is nontrivial to use -- which functions do really fuse with fromSet?
>
> My biggest problem is "Does difference with IntSet deserves to have
> a special function implementing it?" I do not believe so. There are many
> functions that could fuse with fromSet, you could even implement better
> implementations of e.g. union with Map Int. But how do we measure which
> combinations are worth it?

We should measure the time difference, if it's small I don't think
it's worth it.

> For me, right now no functions working with different data structures
> are worth it. The API is complicated as it is and I do not feel adding
> cross-structure functions is a good thing to do. I am happy with
> being able to convert between the structures efficiently.
>
> Johan, what is your opinion?

I'm off the same mind.

On a more general note: The containers library has several problems:

* Lack of abstraction

You cannot write a function that works on different map implementations.

You cannot use a Map in a function that expects a Set even though Map
could have a O(1) conversion to a set view (most OO languages allow
this)

We cannot easily change the implementation. Data.Map requires the
implementation to be a weight balanced tree due to the presences of
functions that work on indexes. If we manage to write a faster map
using AVL trees we cannot use it to improve the performance of
Data.Map, even though a tiny minority of the users use the index-based
functions.

* Code duplication

There's lots of code duplication in the implementation. I don't know
how many times we write the same traversal just to do something
slightly different at the lead position.

There are algorithms that should work cross data structure (like
difference) but doesn't.



Unfortunately I don't know how to address either group of problems
well. Perhaps with associated data types we can finally write a type
class for Map and Set that works well. If we could get foldr/build
fusion to fuse left folds we could use lists to implement e.g.
difference efficiently by doing:

    difference s1 s2 = fromAscList . union (toAscList s1) (toAscList s2)

which would fused to a "loop" that traverses both sets at the same
time, building a new set as we go.

Cheers,
Johan



More information about the Libraries mailing list