Data.HashTable

Jan-Willem Maessen jmaessen at alum.mit.edu
Thu Mar 6 11:12:46 EST 2008


On Mar 6, 2008, at 9:28 AM, Johannes Waldmann wrote:

> Hello.
>
> When I insert 50 values in an empty hashtable,
> I get longest chains of length 10. Like this:
>
> import Data.HashTable ( HashTable )
> import qualified Data.HashTable as H
> import System.Random
>
> main = do
>    t <- H.new ( == ) H.hashInt
>    sequence $ replicate 50 $ do
>        x <- randomRIO ( 0 , 1000 )
>        H.update t x ()
>    lc <- H.longestChain t
>    print lc
>
> output:
> [(831,()),(346,()),(773,()),(475,()),(812,()),(307,()),(113,()),(637, 
> ()),(100,())]

The Data.HashTable code was tuned to favor relatively high bucket  
loads.  Is that a good idea (I ask, having done some of this tuning  
myself)?  It's been a long time since the code was re-tuned, and it's  
likely some of the decisions ought to change in light of changes in  
GHC's compiler, GC, and so forth.  At the moment:
--  
-----------------------------------------------------------------------------
-- Parameters

tABLE_MAX :: Int32
tABLE_MAX  = 32 * 1024 * 1024   -- Maximum size of hash table
tABLE_MIN :: Int32
tABLE_MIN  = 8

hLOAD :: Int32
hLOAD = 7                       -- Maximum average load of a single  
hash bucket

hYSTERESIS :: Int32
hYSTERESIS = 64                 -- entries to ignore in load computation

{- Hysteresis favors long association-list-like behavior for small  
tables. -}


It's likely modern versions of GHC would fare better with smaller  
settings of tABLE_MIN, hLOAD, and hYSTERESIS.  I seem to recall that  
doubling the table used to be rather hard on the GC.

-Jan

>
>
> This is not what I expected (quite the contrary).
> Did I miss something here? The access time is linear in
> the length of the longest chain?
> Perhaps the initial table (produced by "new") is too small,
> so the table is always overloaded?
>
>
> Looking at
> http://java.sun.com/javase/6/docs/api/java/util/HashMap.html
> they start out with table size 16 by default
> *and* (more importantly) they have a constructor that allows
> to set the initial table size and the load factor.
>
>
> Best regards, Johannes Waldmann.
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries



More information about the Libraries mailing list