Data.HashTable
Jan-Willem Maessen
jmaessen at alum.mit.edu
Thu Mar 6 15:07:14 EST 2008
On Mar 6, 2008, at 2:15 PM, Don Stewart wrote:
> jmaessen:
>> On Mar 6, 2008, at 9:28 AM, Johannes Waldmann wrote:
>>
>>> Hello.
>>>
>>> When I insert 50 values into an empty hash table,
>>> I get longest chains of length 10. Like this:
>>>
>>> import Data.HashTable ( HashTable )
>>> import qualified Data.HashTable as H
>>> import System.Random
>>>
>>> main = do
>>>   t <- H.new (==) H.hashInt
>>>   sequence_ $ replicate 50 $ do
>>>     x <- randomRIO (0, 1000)
>>>     H.update t x ()
>>>   lc <- H.longestChain t
>>>   print lc
>>>
>>> output:
>>> [(831,()),(346,()),(773,()),(475,()),(812,()),(307,()),(113,()),(637,()),(100,())]
>>
>> The Data.HashTable code was tuned to favor relatively high bucket
>> loads. Is that a good idea (I ask, having done some of this tuning
>> myself)? It's been a long time since the code was re-tuned, and it's
>> likely some of the decisions ought to change in light of changes in
>> GHC's compiler, GC, and so forth. At the moment:
>
> After modifying it for the shootout, I noticed it's missing a lot of
> inlining (i.e. it tends not to get inlined or specialised).
Agreed. I'd be curious to see which is more important in practice:
picking the right bucket and load parameters, or doing better
specialization. Because the hash function is wrapped by the table
itself (as opposed to, e.g., using a Hashable class; I think HBC did
this), it's a bit harder to get meaningful specialization: the table
itself acts as the dictionary for the hash function, which is exactly
the thing one wants inlined. A bit of hand-rolled worker/wrapper
might do wonders, though.
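To make the contrast concrete, here is a rough sketch; the names
Table, Hashable, and lookupTable are illustrative only, not the real
Data.HashTable API:

import Data.Int (Int32)

-- Dictionary-passing style, as in Data.HashTable: the table value
-- carries the hash function as a field, so every use of it is a call
-- to an unknown function that GHC cannot inline or specialise.
data Table k v = Table
  { tblEq      :: k -> k -> Bool
  , tblHash    :: k -> Int32
  , tblBuckets :: [[(k, v)]]   -- stand-in for the real mutable array
  }

-- Class-based style (roughly the HBC approach): the hash function is
-- attached to the key type, so wherever the key type is statically
-- known, GHC can resolve the dictionary and inline the hash.
class Hashable k where
  hashKey :: k -> Int32

instance Hashable Int where
  hashKey = fromIntegral   -- trivial placeholder hash

-- Hand-rolled worker/wrapper for the dictionary-passing design: a
-- tiny INLINE wrapper unpacks the function-valued fields around a
-- local worker loop.  At a call site where the table's construction
-- is visible, inlining the wrapper turns the hash back into a known,
-- inlinable call.  (Assumes at least one bucket.)
{-# INLINE lookupTable #-}
lookupTable :: Table k v -> k -> Maybe v
lookupTable (Table eq h bs) k = go (bs !! i)
  where
    i = fromIntegral (h k) `mod` length bs
    go ((k', v) : rest)
      | eq k k'   = Just v
      | otherwise = go rest
    go []         = Nothing

With the class-based design one gets the same effect more cheaply: a
SPECIALIZE or INLINE pragma on the table operations specialises them
at every call site where the key type is known.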
-Jan
>
>
> Needs a big overhaul.