Libraries need a new owner

Adrian Hey ahey at iee.org
Sun Nov 25 18:52:41 EST 2007


Hello Don,

Just a note to answer a few questions about what the plan is/was for
these libs, as I see it at least. Of course anyone taking it over is
free to set their own agenda :-)

The collections API was intended to provide a way for different
data structure implementations to look the same, so we could use
common test and benchmarking infrastructure and allow users to
switch from one to another without substantial code re-writes etc..

Data.COrdering is trivial, and only really exists for Data.Tree.AVL.
It was organised as a separate package as it is not dependent of AVL
trees and might possibly be useful elsewhere, despite its triviality.

Data.Tree.AVL is just an "uncommited" AVL tree lib. It's reasonably
mature and well tested and has a comprehensive but ugly and unsafe API.
It was designed with performance in mind, rather than elegance.

There are Data.Map and Data.Set (near) clones that put a safer gloss
on the raw AVL API. Last time I tested them they seem to offer some
performance gains of the "standard" Data.Map and Data.Set, but nothing
particularly dramatic unless you fall foul of "pathological" situations
in Data.Map/Set (the tree type used in these doesn't seem to perform
well with sorted insertions). Also, the Hedge algorithm is rather
expensive in terms of comparisons. It seems to have been designed on
the assumption that comparison was cheap and tree manipulation was
expensive. But with AVL trees tree manipulation is rather cheap and
the cost of comparison is unknown, so this uses a symetric divide and
conquer algorithm. This seems to perform rather well, even if comparison
is cheap (Ints). But the real advantage of the clones is that they
contain stuff that Data.Map/Set don't at present, like the "cheap
zipper" OMap type.

But...

All of the above is essentially obsolete as far as I'm concerned. The
real way ahead for efficient Maps for arbitrarily complex key structures
has to be Generalised Tries, which is where I've been putting most of
my effort. What's there at the moment is the Data.Trie.General package,
which defines the GT class and a few hand written instances thereof for
common prelude types (notable Lists,Ints and any Ord instance). This was
really an experiment to see what a "complete" (and in reality useable)
class API would look like and what efficiency problems real
implementations might have to address.

It's not really ready for prime time yet. The class API in probably
still incomplete and none of it has even been tested or benchmarked,
and (most importantly) there's not yet any way to produce instances
automatically (doing it by hand is an awful lot of work).

So what really needs doing AFAICS (and there's a lot for just me and
Jean-Philippe) is (at least) ..
* Write test and benchmark suites for various collections classes.
* Try to establish some common ground between collections classes and
   Edison.
* Redefine the Generalised Trie (GT) class to use those new
   fangled associated type thingies.
* Try to stabilise the Generalised Trie class (or classes) and resolve
   issues like the IMO useful but contentious Ord constraint.
* Provide some way of generating efficient instances automatically
   (using Template Haskell or DrIFT or Derive or whatever).
* Other stuff that I can't think of right now.

I would also like to add a note of caution for the benefit of anyone
taking this on. In principle Generalised Tries are simple enough for
sum and product types (e.g. the Hinze papers). But in reality things
are not so straight forward or simple if you're concerned with
efficiency. The hand written GT instance for Lists looks nothing like
what you'd get from the simple sum and product rules, for example
(for good reason).

Also, the Data.Trie.General.CollectionsInstances package is probably
complete garbage and should be ignored :-(

BTW, I haven't completely lost interest in this myself, it's just that
as with Jean-Philippe, I have other stuff I want to do and it also has
to be a part time effort for me, depending on what other commitments I
may have. So if galois wanted to put some effort into this that would be
great, as decent performing collections (especially maps) are quite
important I think.

Regards
--
Adrian Hey




More information about the Libraries mailing list