Who needs Ord for Sets and Maps anyway?
ahey at iee.org
Mon Jan 16 03:34:31 EST 2006
On Wednesday 11 Jan 2006 7:09 am, Adrian Hey wrote:
> The FGL implementation seems to do the same thing. So I guess
> lookup performance might not be too good either, at least for
> non-trivial comparisons.
Well, speaking of "non-trivial comparisons" reminds me of something
that's been worrying me about this whole approach to implementing
Sets and Maps. I.E. Requiring that element/key types are instances
of Ord and assuming that lookup etc is achieved by performing some
kind of binary search.
This seems wrong in general. E.G. A trie based implementation of
ListSet/ListMap does not require that lists are instances of Ord.
Furthermore, a trie based implementation for Lists is likely
to be very much more efficient than the proposed polymorphic
Sets/Maps based on balanced binary trees, because it avoids
a lot of repeated comparison on the same list element.
What I really want is Sets/Maps that work efficiently where the
element/keys can be arbitrarily complex data structures, so
comparison is non-trivial and typically quite expensive.
(I guess we need to restrict "arbitrarily complex" to
exclude cyclic data structures.)
So couldn't we generalise the idea of tries to work with any data type?
Suppose I wanted a Set or Map of pairs. The obvious implementation
(using current polymorphic Sets/Maps) would be..
-- Set of pairs (a,b)
newtype Set2 a b = Set2 (Map a (Set b))
-- Map of pairs (a,b) to values v
newtype Map2 a b v = Map2 (Map a (Map b v))
.. and for triples ..
-- Set of triples (a,b,c)
newtype Set3 a b c = Set3 (Map a (Set2 b c))
-- or perhaps ..
newtype Set3 a b c = Set3 (Map2 a b (Set c))
-- Map of triples (a,b,c) to values v
newtype Map3 a b c v = Map3 (Map a (Map2 b c v))
-- or perhaps ..
newtype Map3 a b c v = Map3 (Map2 a b (Map c v))
This can obviously be extended for any product type or data
constructor of arbitrary arity. For sum of product types
with N distinct constructors, what's needed is an N-tuple
of product Sets/Maps (one for each constructor), with nullary
constructors just using a simple Bool/Maybe.
So, assuming this scheme can be used to implement Sets/Maps for
any product or algebraic data type, the only real implementation
needed is something like IntSet/IntMap (which could be used for
CharSet/CharMap etc too). We'd probably want a hand generated
implementation for Lists using "proper" tries, but other than
these cases couldn't all this stuff just be derived by the
compiler? (would be nice if it could).
The only problem I see here is the structural equality problem
we've discussed at some length recently. The above scheme assumes
structural equality. If this isn't what's wanted then I guess
we'd have to rely on hand written instances (as is the case with
This also highlights another reason why IMO specifying "biasing"
is bad, because doing this implicitly assumes that Sets/Maps somehow
contain concrete structural representations of elements/keys.
This is not necessarily so. It's not true with any implementation
like that described above (or even simple tries).
Hmm.., just thought of something else. Although this scheme does
not require types to be instances of Ord, it would be nice if
derived or hand written instances of Set/Map were consistent with
any Ord instances, so we could have meaningful ordering with
functions like elemsAscending, foldAscending etc..
elemsAscending :: (Set s e, Ord e) => s e -> [e]
foldAscending :: (Set s e, Ord e) => (e -> a -> a) -> a -> s e -> a
Any comments? Or maybe suggestions about how to use exotic ghc
extensions (about which I understand little) to implement this?
(I know we shouldn't be assuming ghc, in an ideal world :-)
Now of course having written all this I decided this must surely
have been done already by someone :-) Googling for "generalised
tries" quickly took me to this paper by Ralf Hinze..
which in turn references Chris Okasakis book
So why don't we implement something like this? (somehow:-)
More information about the Libraries