[Haskell] RFC: DData in hierarchical libraries

Thu Mar 11 13:44:55 EST 2004

Hi all,

On Mon, 8 Mar 2004 16:58:13 -0000, Simon Marlow <simonmar at microsoft.com> wrote:

> Regarding the Seq type, I like the one that we use in GHC.  See:

The "Seq" and "Queue" modules are less "mature" than
the Map/Set/Bag modules -- it may be good if someone consistently
extends them with the GHC modules as examples.

> How do DData.Map and DData.IntMap compare in performance to the existing
> Data.FiniteMap?  Similarly for Sets?

In terms of worst-case bounds, DData.Map equals Data.FiniteMap.
DData.IntMap has an incomparable worst-case bound than Data.FiniteMap

I *suspect* that the absolute efficiency of DData.Map will be close to
Data.FiniteMap as they use essentially the same algorithms and representation.
On some operations DData.Map will be better as it uses "hedge" variations.
However, I have never tested this with proper benchmarks. (now that I think
about it, I might have tested it with improper benchmarks... I'll try
to find on my computer when I get home).

I have measured different DData.IntMap against DData.Map instantiated
with Int's and it strongly outperformed it on all operations. (I'll give
numbers if I can find them again). This was my motivation for including
a specialized implementation based on (big-endian) patricia trees.

> Can IntMap and IntSet be implemented by specialisation or overloading
> the Set type (ala the overloaded arrays in Data.Array)?  (The interfaces
> look very similar, but I didn't compare them in detail).

IntMap/IntSet have completely different algorithms than the corresponding
Map/Set modules. The interfaces are very similar off course :-)

I think that making them an instance of a common "Set" type should not
be goal of DData but rather a goal of a framework like Edison, for which
DData can provide concrete implementations.

> Why is there no direct translation between Seqs and Sets/Maps, but there
> is for lists?  I guess there's a consistency issue here - do we want to
> (a) use lists exclusively, (b) use Seq exclusively, or (c) have all
> operations available for both types?

I personally prefer to use lists exclusively, and to make the "seqFromList"
function really efficient, in the sense that we guarantee that
"listFromSeq . seqFromList" takes constant time.

All the best,
  Daan.