Profiling and Data.HashTable

Mon Oct 17 05:55:22 EDT 2005

On 14 October 2005 20:31, Jan-Willem Maessen wrote:

> That "5K" number made me immediately suspicious, so I took a look at
> the source code to Data.HashTable.  Sure enough, it's allocating a
> number of large IOArrays, which are filled with pointers.  The
> practical upshot is that, for a hash table with (say) 24 entries, the
> GC must scan an additional 1000 pointers and discover that each one
> is [].
> 
> I've seen other implementations of this two-level technique which use
> a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for
> less-than-gigantic hash tables.  This might be worth doing in the
> Data.HashTable implementation.
>
> [Curious: what (if anything) is being used to test Data.HashTable?
> I'd be willing to undertake very small amounts of fiddling if I could
> be sure I wasn't slowing down something which mattered.]

The few benchmarks I've tried seem to indicate that Data.HashTable
performance isn't overwhelming, and can often be bettered by Data.Map.
I've never investigated too deeply.  GC overhead seems a very plausible
cause, though.

Please by all means tweak the implementation and try some benchmarks,
I'll incorporate any performance fixes that arise.

Cheers,
	Simon