Proposal: add laziness to Data.Map / IntMap

Scott Dillard sedillard at ucdavis.edu
Tue Aug 5 10:56:54 EDT 2008


Hi Bertram

Scott Dillard wrote:
> > I think maybe you guys (Don and Andrew) are misunderstanding my proposal.
> > The lazy/strict tradeoff is a subtle issue, and I'll be sure to re-read
> > Okasaki's stuff with this in mind, but what I'm talking about here is not
> a
> > trade off. It's laziness "for free". Move the strictness annotations out
> of
> > the constructors and into the library functions using 'seq'. Laziness is
> > exposed through _separate_ functions.
>
> It's not for free. When the compiler does a pattern match on the Bin
> constructor, Bin sz kx x l r, it can no longer assume that l and r are
> fully evaluated, so it has to add code to evaluate them in case they are
> not. And in fact, this code will be needed if any of your proposed lazy
> functions are added later. I have not checked whether this has a
> measurable performance or code size impact.
>
> > So mapWithKey retains all semantics, including guarantees.
>
> Semantically the change is safe, agreed.


You're right of course (hadn't thought about that) but I'd like to point out
that Adrien Hey's AVL tree library is structured in the way I propose: A
node's subtree fields are not marked strict, instead seq is used to force
construction of the tree. I think it's generally faster than Data.Map, at
least it's claimed to be so in the documentation. The trees are not built
with the same algorithm, but if the use of seq instead of bangs does impose
a significant overhead, then that speaks very highly to the AVL algorithm.

This of course raises the question of why I don't just use that library. I'm
checking it out, but it's quite a bit bigger than Data.Map. The thing I like
about Data.Map is that it's relatively simple and I can see whats going on
very quickly, enough to make modifications (which I'm about to do.) Reading
Data.Tree.AVL is a more daunting task.

Also, there's something to be said for "lazy by default." I don't see much
difference between Data.List.map and Data.Map.map (for finite lists anyway)
and so if laziness is the correct default for the former then it should also
be for the latter. The current Data.Map.map is only half strict: it builds
an entire tree and fills it with thunks. Better I think to either build the
whole thing completely or build only what's needed. The current Data.Map.map
is the worst of both, no laziness + heap burn.

So I guess I'll start writing. Anyone have any good benchmarks? Which naming
scheme is less ugly, Data.Map.mapWithKeyL or Data.Map.Lazy.mapWithKey? Any
other suggestions would be appreciated.

Scott
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/libraries/attachments/20080805/5e662d3f/attachment.htm


More information about the Libraries mailing list