AVL Trees available
simonmar at microsoft.com
Thu May 20 12:45:20 EDT 2004
On 20 May 2004 07:50, Adrian Hey wrote:
> On Wednesday 19 May 2004 10:47 am, Simon Marlow wrote:
>> I'm happy for this to be "the" Data.Trees.AVL library. I'm sure
>> others will want to comment on the details. I wonder whether it
>> would make sense to have a specialised AVLMap type, which would let
>> you optimise the representation of AVL-based finite maps?
> Do you mean a newtype wrapper for the current AVL type, or a distinct
> algebraic type which contains both the key and the associated value
> as separate fields (like the Map type folks were talking about)?
> If you mean the former, one thing I was thinking about doing was
> creating an AVL based clone of the current Data.FiniteMap library so
> folk could experiment with the effect of using both on real programs
> just by changing their imports.
Actually I meant the latter, but this is a good idea too.
> If you mean the latter, I'm not keen on this because it would mean
> duplication of a lot of code for the new type, and I'm not sure that
> it would be an optimisation at all. I considered this but decided
> against it on the grounds that I suspect a major contributor to
> the true cost of tree operations is heap burn rate. The AVL tree
> type has only 3 fields in each constructor (not 5 as with the Map
> type for example), and the code was carefully written to avoid
> unnecessary heap construction wherever possible. Doing this would
> result in 4 fields in each constructor.
The total space overhead for the tree would be smaller, because the pair
constructor is inlined: 5 words per tree node instead of 7, and one
fewer eval each time you examine a node. However, if a lot of
rebalancing happens on the tree, then being able to share the key/value
pairs might be a win.
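To make the space arithmetic concrete, here is a rough sketch of the two
representations being compared; the constructor names are illustrative and
the AVL balance information is elided, so this is not the actual library
definition.

-- Element tree: 3 fields per node, so with a (key,value) pair as the
-- element a node costs (1+3) + (1+2) = 7 words (node plus pair), and
-- examining the key/value means forcing the pair.
data AVL e = E                       -- empty
           | N (AVL e) e (AVL e)     -- left, element, right

-- Specialised map node: key and value inlined as separate fields,
-- 4 fields per node = 5 words and no pair to force -- but the
-- (key,value) pair can no longer be shared when nodes are rebuilt
-- during rebalancing.
data AVLMap k v = ME
                | MN (AVLMap k v) k v (AVLMap k v)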
Code duplication is clearly a problem, though.
Anyway, t'was just a thought...
>>> I'd also like to book "Data.COrdering" too, if that's OK.
>> Data.COrdering is less obviously the right thing. Can you give some
>> motivation for the design here?
> Because you generally do end up combining values which turn out
> to be "equal", in some manner. If you don't use a function which
> returns a COrdering, then you have to either...
> 1) Pass the comparison and combining function separately down your
>    recursive search or whatever. But this will have extra runtime
>    overhead, which is often unnecessary since the only time the
>    combining function is ever used is when the comparison returns EQ.
> 2) Use some default combining function (like pairing) and apply the
>    real combining function later. But pairing is the only reasonable
>    default combining function I can think of, and this is not always
>    the right type (e.g. for pushing elements in a tree or set union),
>    and even when it's OK type-wise you've still got the overhead of an
>    additional map operation (e.g. for set intersection). You could get
>    around the type problem using lists I suppose, and use (++) as the
>    default combining function, but this doesn't seem very attractive
>    to me.
Right, I think I see the reasoning.
In the case of writing/pushing, the COrdering allows you to pass a
composition of a comparison and a combining function around, which looks
cleaner. It's not obviously more efficient than passing the two
functions around, because the actual comparisons will be going through
an extra level of higher-order function (except that you've INLINEd
everything in sight, so the COrdering combinators probably disappear).
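As a concrete illustration, here is a minimal sketch of the idea using a
plain unbalanced tree rather than the AVL code; the COrdering definition
shown is the obvious three-constructor version, and push and withCombine
are hypothetical names written only to show where the combined value gets
used, not the library's own genPush.

-- A combining comparison: Lt/Gt as usual, but Eq carries the combined
-- value, so no separate combining function needs to be threaded
-- through the recursion.
data COrdering a = Lt | Eq a | Gt

data Tree e = Empty | Node (Tree e) e (Tree e)

-- Insert a new element, where cmp is the comparison already partially
-- applied to the new element.  Combining only happens in the Eq
-- branch, exactly where the comparison says the elements are equal.
push :: (e -> COrdering e) -> e -> Tree e -> Tree e
push cmp new = go
  where
    go Empty        = Node Empty new Empty
    go (Node l e r) = case cmp e of
      Lt    -> Node (go l) e r        -- new element goes left
      Eq e' -> Node l e' r            -- e' is the combined element
      Gt    -> Node l e (go r)        -- new element goes right

-- Fusing an ordinary comparison with a combining function for a
-- (key,value) map: keep the new key, combine the two values.
withCombine :: Ord k => (v -> v -> v) -> (k,v) -> (k,v) -> COrdering (k,v)
withCombine f (k,v) (k',v') = case compare k k' of
  LT -> Lt
  EQ -> Eq (k, f v v')
  GT -> Gt

So an insert that adds the values of colliding keys would be something
like push (withCombine (+) kv) kv t.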
In the case of lookup, it doesn't look like you need COrdering: just
apply the "combining function" to the result of genRead/genTryRead,
right? But given that you're using COrdering for writes, using it for
reads too is reasonable.
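For the lookup side the difference is just this, again sketched against the
hypothetical Tree and COrdering above rather than the real
genRead/genTryRead:

-- Plain Ordering version: return the stored element and let the caller
-- transform it afterwards (e.g. fmap f (tryRead cmp t)).
tryRead :: (e -> Ordering) -> Tree e -> Maybe e
tryRead cmp = go
  where
    go Empty        = Nothing
    go (Node l e r) = case cmp e of
      LT -> go l
      EQ -> Just e
      GT -> go r

-- COrdering version: the comparison already delivers the result you
-- want, so there is no extra map over the Maybe.
tryReadC :: (e -> COrdering a) -> Tree e -> Maybe a
tryReadC cmp = go
  where
    go Empty        = Nothing
    go (Node l e r) = case cmp e of
      Lt   -> go l
      Eq a -> Just a
      Gt   -> go r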
Ok, I guess I'm more or less convinced.