AVL Trees available

Thu May 20 12:45:20 EDT 2004

On 20 May 2004 07:50, Adrian Hey wrote:

> On Wednesday 19 May 2004 10:47 am, Simon Marlow wrote:
> 
>> I'm happy for this to be "the" Data.Trees.AVL library.  I'm sure
>> others will want to comment on the details.  I wonder whether it
>> would make sense to have a specialised AVLMap type, which would let
>> you optimise the representation of AVL-based finite maps?
> 
> Do you mean a newtype wrapper for the current AVL type or distinct
> algebraic type which contains both the key and the associated value
> as separate fields (like the Map type folks were talking about
> recently)?
> 
> If you mean the former, one thing I was thinking about doing was
> creating an AVL based clone of the current Data.FiniteMap library so
> folk could experiment with the effect of using both on real programs
> just by changing their imports.

Actually I meant the latter, but this is a good idea too.

Data.Map.AVL?

> If you mean the latter, I'm not keen on this because it would mean
> duplication of a lot of code for the new type, and I'm not sure that
> it would be an optimisation at all. I considered this but decided
> against it on the grounds that I suspect a major contributor to
> the true cost of tree operations is heap burn rate. The AVL tree
> type has only 3 fields in each constructor (not 5 as with the Map
> type for example), and the code was carefully written to avoid
> unnecessary heap construction wherever possible. Doing this would
> result in 4 fields in each constructor.

The total space overhead for the tree would be smaller, because the pair
constructor is inlined.  5 words per tree node instead of 7, and one
fewer eval each time you examine a node.  However if a lot of
rebalancing happens on the tree, then being able to share the key/value
pairs might be a win.

Code duplication is clearly a problem, though.

Anyway, t'was just a thought...

>>> I'd also like to book "Data.COrdering" too, if that's OK.
>> 
>> Data.COrdering is less obviously the right thing.  Can you give some
>> motivation for the design here?
> 
> Because you you generally do end up combining values which turn out
> to be "equal", in some manner. If you don't use a function which
> returns a COrdering, then you have to either..
> 
>  Pass the comparison and combining function separately down your
>  recursive search or whatever. But this will have extra runtime
>  overhead, which is often unnecessary since the only time the
>  combining function is ever used is when the comparison returns EQ.
> 
> or..
> 
>  Use some default combining function (like pairing) and apply the
>  real combining function later. But pairing is the only reasonable
>  default combining function I can think of, and this not always
>  the right type (e.g. for pushing elements in a tree or set union),
>  and even when it's OK type wise you've still got the overhead of an
>  additional map operation (e.g for set intersection). You could get
>  around the type problem using lists I suppose, and use (++) as the
>  default combining function, but this doesn't seem very attractive
>  to me.

Right, I think I see the reasoning.

In the case of writing/pushing, the COrdering allows you to pass a
composition of a comparison and a combining function around, which looks
cleaner.  It's not obviously more efficient than passing the two
functions around, because the actual comparisons will be going through
an extra level of higher-order function (except that you've INLINEd
everything in sight, so the COrdering combinators probably disappear
:-).

In the case of lookup, it doesn't look like you need COrdering: just
apply the "combining function" to the result of genRead/genTryRead,
right?  But given that you're using COrdering for writes, using it for
read too is reasonable.

Ok, I guess I'm more or less convinced.

Cheers,
	Simon