[Haskell-cafe] Boxed Mutable Arrays

Edward Kmett ekmett at gmail.com
Wed Dec 16 14:58:02 EST 2009


On Wed, Dec 16, 2009 at 2:42 PM, Richard Kelsall
<r.kelsall at millstream.com>wrote:

> Brad Larsen wrote:
>
>> I have considered using Data.IntMap to implement a sort of faux hash
>> table, e.g., introduce a Hashable class, and then use an IntMap to
>> lists of Hashable.  The result would be a pure, persistent ``hash
>> table''.  In such a case, however, lookup/insert/delete operations
>> would have average complexity logarithmic in the number of buckets.
>>
>>
> I'll throw in an idea that has been nagging me about hash tables.
> You could have a list of hash tables of increasing size. On an insert
> collision in a smaller table all the entries get put into the next one
> up. For a lookup you would have to check all the tables until you find
> the hash.
>
> e.g.
>
> With three hash table in the list of these example sizes.
> Table size 2^2 = 2 probable entries before collision.
>           2^4 = 4
>           2^6 = 8
>
> h..h
> ...h.....h.h...h
> h......h.......h....h...........h......h.h..........h...........
>
>
> I expect if it's any good it has already been done.
>
>
Your asymptotics will take a hit if the height of the tower is logarithmic
in the array size, and you'll take a constant hit for bounded-height towers.


Consider that if you have density such that the biggest array has an
expected population, then your smaller arrays will be relatively drastically
over populated, so you'll bump up to the next largest array in more cases,
paying extra seeks against disproportionally heavily loaded buckets, whereas
the time could have been more gainfully employed seeking against the more
uniformly distributed larger bucket.

If you have a decent hash function, you won't have lumps in your
distribution, so having lumps in your bucket densities is purely a
pessimization. Especially when you've ensured that those buckets that are
going to be overpopulated are exactly the ones you are checking first.

You might consider looking at Witold Litwin's Sorted Linear Hash Table. I
have a hackish implementation on hackage somewhere, but bear in mind it was
the first piece of Haskell I'd ever written and the STM version I have
bottlenecks on the root of the tree, so would be better served by multiple
root TVars.

I can't find it on Hackage, but here it is:

http://comonad.com/haskell/thash/

-Edward Kmett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20091216/a8377159/attachment.html


More information about the Haskell-Cafe mailing list