enummapset merge into containers

Jan-Willem Maessen jmaessen at alum.mit.edu
Tue Feb 21 22:34:44 CET 2012


On Tue, Feb 21, 2012 at 3:58 PM, Ben Gamari <bgamari.foss at gmail.com> wrote:

> On Tue, 21 Feb 2012 11:25:05 -0800, Johan Tibell <johan.tibell at gmail.com>
> wrote:
> > On Tue, Feb 21, 2012 at 11:20 AM, Ben Gamari <bgamari.foss at gmail.com>
> wrote:
> > > This is certainly an option. It's actually been suggested that I use
> > > HashMaps instead of IntMaps to get more uniform use of the
> > > key-space. I have been a little worried that the cost of hashing would
> > > outweigh the benefit of using benefits of more shallow trees, but I
> > > suppose memory accesses are expensive. That being said, I could
> > > certainly just use the Enum instance.
> >
> > In the case of Ints and newtypes thereof hashing is very cheap. A
> > no-op or a multiplication with a large prime.
> >
> Sure. In my application (machine learning) we were mapping from tokens
> (short strings, generally less than 15 bytes) to sequential newtype'd
> Ints (newtype Token = Token Int). However, the fact that these
> identifiers were sequential meant that IntMap's trees would be close to
> the worst-case log(N) depth.


Either way, the tries only contain levels on which keys differ, so tree
depths shouldn't vary by more than 1 or so.

HashMap uses lsb-first splitting in its current incarnation, so that you
end up balancing contiguous keys across the trie.  This is the main
argument in favor of low-order-first splitting.  The main argument against
is that high-order-first splitting yields a standard binary search tree for
lookup operations if you encode the pivots appropriately.

I don't recall which method is used by Johan's wide-fanout tries.  I seem
to recall that at wider fanouts you're happier (for GC reasons) churning
the right spine in response to contiguous insertions, rather than churning
the entire tree.

The takeaway: if you're seeing performance problems from contiguous
insertion, it might be worth comparing with HashMap.

-Jan-Willem Maessen
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20120221/e8a3f5d8/attachment.htm>


More information about the Libraries mailing list