collections in hierarchical libraries

Tue Jan 13 20:01:07 EST 2004

G'day all.

Quoting JP Bernardy <jyp_7 at yahoo.com>:

> Rencent posts about DData and such make me think...
> Shouldn't it be integrated in the hierarchical
> libraries?

Possibly, possibly not.

DData is a really good set of data structures.  However, it does
duplicate some stuff which is already there.

> Besides, it looks like Edison has been dropped from
> GHC.

Daan has expressed some interest in unifying DData and Edison, which
isn't a bad idea.  Edison has been dropped because:

        - the version that was there isn't actively supported,
        - the version that _is_ actively supported isn't ready for
          "prime time" yet, and
        - nobody is entirely sure that Edison is quite what people
          want anyway.

> More generally, is there a plan for a unified
> collections framework in the hierarchical libraries?

No, but I would love some concrete suggestions.  Here's my attempt to
get the ball rolling...

1. There are a lot of things which are logically collections but existing
libraries do not consider as (e.g. PackedString, Array, Graph).  In a
truly unified library, these would be collections.

The downside is that fiddling with commonly used code isn't something
which should be undertaken lightly, and you'd better be certain that
what you're replacing it with is an improvement.

2. There is a lot of merit in Edison's approach where every data structure
supports every operation that is possible, even if it's inefficient.
Chris Okasaki did this because he often found himself, say, using a stack
ADT (which only supports push, pop, top, isEmpty etc) but occasionally
needing to peer at the second or third element from the top of the stack.
You don't want to have to change data structures just because of this.

The main unintended consequences of this are that a) the interface for
a data structure looks overwhelming, and b) the support for choosing
an appropriate data structure is quite poor.  Why would you choose a
LazyPairingHeap over a LeftistHeap anyway?  As a programmer, I just want
to pick a PriorityQueue and get one with reasonable performance
characteristics across the board, only going further if I need it.  This
has led many people to believe that Edison is too heavyweight.

3. Data types with destructive update semantics (some of them thread-safe,
some of them not) are becoming increasingly important.  A truly unified
approach should take this into account, easing the transition from a
data type without destructive update to one with as much as possible,
should a programmer find this is needed.  The unified approach to Arrays
is a good example of how this might work in practice.

It seems to me that would make sense to decree that all data structures
live in a monad which is naturally its "home".  This "home" might be the
identity monad for many data structures, and for those cases, a set of
additional functions would be provided which don't require the monad to
be passed.

4. "Proxy containers" are important.  On-disk data structures, in-memory
compressed representations and fake containers which compute elements
on the fly are all good examples of this.  To support these, it should
be straightforward (as much as possible) to turn your data structure into
a unified-collection-architecture-compliant collection.  In addition,
algorithms should be data-structure-agnostic if at all possible.

[Aside: The C++ STL is often touted as a good example of a data structure
library where algorithms are data-structure-agnostic, however, this is
only partly true.  Pretty much all proxy containers are not STL-compliant.
This has led to the odd situation where std::vector<int> is a container
but std::vector<bool> is not.]

The "left-fold-like combinator", which has received some press lately,
seems like a good place to start looking.

Cheers,
Andrew Bromage