[Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

Johan Tibell johan.tibell at gmail.com
Wed Feb 23 23:04:03 CET 2011


On Wed, Feb 23, 2011 at 1:55 PM, Max Bolingbroke
<batterseapower at hotmail.com> wrote:
> Thanks for bringing some data to the table. There are definitely some
> common patterns in what you sent me:
>
> 1) For defining Binary instances, you need to write set size before
> you write the elements: ~7 occurrences
> 2) Tests against small constants (typically <= 0 or 1, but also 2 and
> 3!): ~15 occurrences
> 3) A surprise to me: generating fresh names! People keep a set of all
> names generated so far, and then just take size+1 as their fresh name.
> Nice trick. ~17 occurrences
> 4) Turning sizes into strings for human consumption: ~19 occurrences
> 5) Just reexporting the functions somehow. Uninformative. ~8 occurrences
>
> There were ~38 occurrences over ~13 repos where it appeared to be
> somehow fundamental to an algorithm (I didn't look into most of these
> in detail). I've put those after my message.
>
> Frankly I am surprised how much "size" gets used. It seems that making
> it fast is more important than I thought.

Nice analysis. Does this apply to maps as well as sets or are sets use
differently than maps somehow?

IntMap (which shares data structure with HashMap) only hash O(n) size.
I wonder if people avoid using IntMap because of this.

I wonder if there's a way to implement size that doesn't mess up the
code so badly (see the commit on GitHub to see how badly it messes up
e.g. insert).

Johan



More information about the Haskell-Cafe mailing list