DData benchmark

Fri Mar 12 15:55:35 EST 2004

Simon Marlow wrote:
> > However, I have never tested this with proper benchmarks. 
> > (now that I think
> > about it, I might have tested it with improper  benchmarks... I'll
try
> > to find on my computer when I get home).
> 
> Great, I'd be really interested to see some numbers.

I have found the benchmark code again, but unfortunately
I have never tested against Data.FiniteMap... but I can adapt
the sources and do some testing next week.

A word of warning though, the tests are not necessarily 
honest, precise, and scientifically responsible. However, they
may give a good intuition. I ran some common operations like
"union", "member", and "fromList" on ordered and on random data
and compared their wall-clock performance. All data was explicitly
forced with performGC's in between to make an as honest as possible
comparison. For a *real* test however, one should probably use
specialized tools and specialized data sets.

Here are the unscientific results for some DData structures.

(ST) DData.Set: size balanced tree
(PT) DData.IntSet: big-endian patricia tree

					  ST(msec) PT(msec)  relative
..timing:  fromlist ordered0:    546 -    125 -     4.4
..timing:  fromlist ordered1:    531 -    125 -     4.2
(descending order)	
..timing: fromlist xordered0:     78 -    125 -     0.6           (known
to be ordered)
..timing: fromlist xordered1:    109 -    140 -     0.8           (known
to be in reverse order)
..timing:   fromlist random0:    281 -    156 -     1.8
..timing:   fromlist random1:    281 -    156 -     1.8
..timing:   fromlist random2:    609 -    359 -     1.7
..timing:    union random0  :    578 -    250 -     2.3
..timing:    union random1  :    656 -    296 -     2.2
..timing:    union random2  :    671 -    296 -     2.3
..timing:    union tangled  :    281 -    125 -     2.2
..timing:    union disjoint1:    218 -     93 -     2.3
..timing:    union disjoint2:    296 -    156 -     1.9
..timing:            member0:    546 -    468 -     1.2
..timing:            member1:    484 -    484 -     1.0
..timing:            member2:    484 -    484 -     1.0

Here we can see that an IntSet is always faster than a "Set Int" 
except for known ordered data (= xordered)

Here is an interesting bench mark for an unreleased version of
DData that included "Treaps": these are trees where each node is
associated with a random priority that determines the position of
the node. As these are random, the tree will be balanced without
doing explicit rebalancing. The functional Treap uses clever hash
functions to get semi random priorities. Here is a Treap set compared
to "Set Int".
					 ST(msec) Treap(msec) relative
..timing:  fromlist ordered0:    531 -    234 -     2.3
..timing:  fromlist ordered1:    531 -    203 -     2.6
..timing: fromlist xordered0:     93 -    234 -     0.4
..timing: fromlist xordered1:    109 -    203 -     0.5
..timing:   fromlist random0:    296 -    312 -     0.9
..timing:   fromlist random1:    281 -    312 -     0.9
..timing:   fromlist random2:    609 -    671 -     0.9
..timing:    union random0  :    578 -    593 -     1.0
..timing:    union random1  :    687 -    687 -     1.0
..timing:    union random2  :    687 -    687 -     1.0
..timing:    union tangled  :    265 -    296 -     0.9
..timing:    union disjoint1:    203 -    203 -     1.0
..timing:    union disjoint2:    296 -    296 -     1.0
..timing:            member0:    546 -    468 -     1.2
..timing:            member1:    484 -    484 -     1.0
..timing:            member2:    468 -    484 -     1.0
..timing:             unions:    140 -     78 -     1.8

>From this data, I got the suspicion that every set implementation
based on balanced trees will do about as good as any other.

I hope to do some fresh testing against Data.FiniteMap soon,
as it will be quite important to know about there absolute efficiency.

All the best,
 Daan.