Proposal: add laziness to Data.Map / IntMap

ajb at spamcop.net
Mon Aug 4 20:50:14 EDT 2008


G'day all.

Quoting Scott Dillard <sedillard at ucdavis.edu>:

> I found myself wanting a lazy version of Data.Map today, and by "lazy" I
> mean in the node-subtree pointers.

Right.  Just to be clear, to start off with:

   - It makes no sense for "keys" to be lazy, because they need to be
     inspected to determine the shape of the data structure.  (Except
     in the degenerate case of a singleton map: if you know where in
     the tree some (key,value) pair goes, then you've already evaluated
     the key.)

   - It's better for "values" to be lazy in a general-purpose map-type
     data structure, because making them strict breaks Functor (a
     sketch of why follows below).
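
As a hedged illustration of that second point (untested, and using a
hypothetical one-node StrictMap type rather than the real Data.Map
representation), a strict value field breaks the Functor composition
law fmap f . fmap g = fmap (f . g):

-- WARNING: Untested sketch, not the actual Data.Map internals.
data StrictMap k v = StrictLeaf | StrictNode k !v

instance Functor (StrictMap k) where
    fmap _ StrictLeaf       = StrictLeaf
    fmap f (StrictNode k v) = StrictNode k (f v)

-- fmap (const 0) (fmap (\_ -> undefined) (StrictNode 'a' 1))
--   diverges, because the strict field forces the undefined value,
-- whereas
-- fmap (const 0 . (\_ -> undefined)) (StrictNode 'a' 1)
--   evaluates to StrictNode 'a' 0.
-- With a lazy value field, both expressions denote the same map.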

So the remaining interesting question is the internal structure pointers.

> I trust the library writers when they put
> strictness annotations in the fields of the tree nodes, so I'm wondering
> what the various implications of lazy subtrees are, beyond the obvious speed
> hit. Does this cause space leaks beyond what lazy value pointers can already
> cause? Can someone point me to some reading that discusses this?

Yes, please read Chris Okasaki's "Purely Functional Data Structures"
for a fuller discussion of the tradeoffs of putting laziness in
different places in a data structure.

Making internal pointers strict vs making them lazy doesn't necessarily
buy you much in the way of raw-cycle-counting performance.  What it buys
you is a complexity _guarantee_, which in Haskell is often more valuable.

Thinking of a binary search tree for a moment, making the internal
pointers lazy means that insertion is always O(1), but lookup may take
an arbitrary amount of time (though it will be O(log n) amortised).  It
also adds a raw-cycle-counting cost to every lookup, even if the tree is
fully evaluated.
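
To make that concrete, here is a hedged sketch (untested, a naive
unbalanced BST with made-up names, not the actual Data.Map code) of
where the cost moves when the subtree pointers are lazy:

-- WARNING: Untested sketch.  Naive unbalanced BST, not Data.Map.
data LazyTree k v = Tip | Bin k v (LazyTree k v) (LazyTree k v)

-- With lazy subtree fields, insert does O(1) work up front: it
-- allocates one Bin cell and suspends the recursive descent in a thunk.
insertLazy :: Ord k => k -> v -> LazyTree k v -> LazyTree k v
insertLazy k v Tip = Bin k v Tip Tip
insertLazy k v (Bin k' v' l r) =
    case compare k k' of
        LT -> Bin k' v' (insertLazy k v l) r   -- left child is a thunk
        GT -> Bin k' v' l (insertLazy k v r)   -- right child is a thunk
        EQ -> Bin k v l r

-- A lookup then forces whatever suspended inserts lie on its path,
-- and pays a small forcing cost per node even once the tree is fully
-- evaluated.  Strict child fields (!(LazyTree k v)) would push all of
-- that work back to insert time.
lookupLazy :: Ord k => k -> LazyTree k v -> Maybe v
lookupLazy _ Tip = Nothing
lookupLazy k (Bin k' v l r) =
    case compare k k' of
        LT -> lookupLazy k l
        GT -> lookupLazy k r
        EQ -> Just v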

This is the opposite of the usual desired performance.  Dictionary
implementations tend to assume that lookups are more common than
insertions and deletions, and correspondingly, clients tend to assume
that insertions and deletions are more expensive than lookups.

If these assumptions don't match your code, then indeed, you may be
using the wrong data structure.

> Looking at the first version, clearly you see that when constructing a new
> map you should only have to pay for the sub trees that you actually use. If
> you perform only a handful of lookups then throw the new tree away, why
> build the whole thing?

If you only perform a handful of lookups, I question whether you actually
wanted a binary search tree to begin with.  Wouldn't an association list
have done the job just as well?  Or compiling to functions?
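
For instance (untested, hypothetical helper name), a plain list plus
the Prelude's lookup costs nothing to build and O(n) per query:

-- WARNING: Untested.  A handful of queries against an association list.
fewLookups :: [(Int, String)] -> [Maybe String]
fewLookups kvs = map (`lookup` kvs) [1, 3, 5]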

> To further motivate, let me explain my use case. [...]
> So I memoize the thing into something like
> "data Node a = Node a [Node a]"

Right.  Memoising CAFs is an excellent example of one of those very few
places where writing your own dictionary data type can make a lot of
sense.

Why?  Because there are a lot of things that you don't need from, say,
AVL trees.  For example, you know all the keys in advance, which means
that your memo table won't need to be modified once it's built.  You
don't even need insertions, let alone deletions or a Functor instance.

Have you tried just using functions?  Something like this:

-- WARNING: Untested code follows.  Use at own risk.

type MyMap k v = k -> v    -- k -> Maybe v may also be appropriate.

myMapFromSortedAssocList :: (Ord k) => [(k,v)] -> MyMap k v
myMapFromSortedAssocList kvs
     = buildMap (length kvs) kvs
     where
         errorCase = error "Can't find key"

         -- Feel free to optimise additional base cases if desired.
         buildMap _ [] = \key -> errorCase
         buildMap _ [(k,v)] = \key -> if k == key then v else errorCase
         buildMap l kvs
           = let l2 = l `div` 2
                 (kvs1,(k,v):kvs2) = splitAt l2 kvs
                 mapl = buildMap l2 kvs1
                 mapr = buildMap (l - l2 - 1) kvs2
             in \key -> case compare key k of
                           LT -> mapl key
                           GT -> mapr key
                           EQ -> v

(Exercise for intermediate-level Haskellers: Why is "key" bound by an
explicit lambda?)
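
Usage would look something like this (again untested; note that the
input list must already be sorted by key):

-- myMap 2 evaluates to "two"; myMap 4 falls through to the error case.
let myMap = myMapFromSortedAssocList [(1,"one"),(2,"two"),(3,"three")]
in  myMap 2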

Cheers,
Andrew Bromage

