Data.HashTable
Jan-Willem Maessen
jmaessen at alum.mit.edu
Thu Mar 6 15:07:14 EST 2008
On Mar 6, 2008, at 2:15 PM, Don Stewart wrote:
> jmaessen:
>> On Mar 6, 2008, at 9:28 AM, Johannes Waldmann wrote:
>>
>>> Hello.
>>>
>>> When I insert 50 values into an empty hash table,
>>> I get longest chains of length 10. Like this:
>>>
>>> import Data.HashTable ( HashTable )
>>> import qualified Data.HashTable as H
>>> import System.Random
>>>
>>> main = do
>>>   t <- H.new (==) H.hashInt
>>>   sequence_ $ replicate 50 $ do
>>>     x <- randomRIO (0, 1000)
>>>     H.update t x ()
>>>   lc <- H.longestChain t
>>>   print lc
>>>
>>> output:
>>> [(831,()),(346,()),(773,()),(475,()),(812,()),(307,()),(113,()),(637,()),(100,())]
>>
>> The Data.HashTable code was tuned to favor relatively high bucket
>> loads. Is that a good idea (I ask, having done some of this tuning
>> myself)? It's been a long time since the code was re-tuned, and it's
>> likely some of the decisions ought to change in light of changes in
>> GHC's compiler, GC, and so forth. At the moment:
>
> After modifying it for the shootout, I noticed it's missing a lot of
> inlining (i.e. it tends not to get inlined or specialised).
Agreed. I'd be curious to see which is more important in practice:
picking the right bucket and load parameters, or doing better
specialization. Because the hash function is wrapped by the table
itself (as opposed to, e.g., using a Hashable class; I think HBC did
this), it's a bit harder to get meaningful specialization: the table
itself acts as the dictionary for the hash function, which is exactly
the thing one wants inlined. A bit of hand-rolled worker/wrapper
might do wonders, though.
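To make the contrast concrete, here is a rough sketch; the names
Table, Hashable, and lookupTable are illustrative only, not the real
Data.HashTable API:

import Data.Int (Int32)

-- Dictionary-passing style, as in Data.HashTable: the table value
-- carries the hash function as a field, so every use of it is a call
-- to an unknown function that GHC cannot inline or specialise.
data Table k v = Table
  { tblEq      :: k -> k -> Bool
  , tblHash    :: k -> Int32
  , tblBuckets :: [[(k, v)]]   -- stand-in for the real mutable array
  }

-- Class-based style (roughly the HBC approach): the hash function is
-- attached to the key type, so wherever the key type is statically
-- known, GHC can resolve the dictionary and inline the hash.
class Hashable k where
  hashKey :: k -> Int32

instance Hashable Int where
  hashKey = fromIntegral   -- trivial placeholder hash

-- Hand-rolled worker/wrapper for the dictionary-passing design: a
-- tiny INLINE wrapper unpacks the function-valued fields around a
-- local worker loop.  At a call site where the table's construction
-- is visible, inlining the wrapper turns the hash back into a known,
-- inlinable call.  (Assumes at least one bucket.)
{-# INLINE lookupTable #-}
lookupTable :: Table k v -> k -> Maybe v
lookupTable (Table eq h bs) k = go (bs !! i)
  where
    i = fromIntegral (h k) `mod` length bs
    go ((k', v) : rest)
      | eq k k'   = Just v
      | otherwise = go rest
    go []         = Nothing

With the class-based design one gets the same effect more cheaply: a
SPECIALIZE or INLINE pragma on the table operations specialises them
at every call site where the key type is known.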
-Jan
>
>
> Needs a big overhaul.