Profiling and Data.HashTable
Simon Marlow
simonmar at microsoft.com
Mon Oct 17 05:55:22 EDT 2005
On 14 October 2005 20:31, Jan-Willem Maessen wrote:
> That "5K" number made me immediately suspicious, so I took a look at
> the source code to Data.HashTable. Sure enough, it's allocating a
> number of large IOArrays, which are filled with pointers. The
> practical upshot is that, for a hash table with (say) 24 entries, the
> GC must scan an additional 1000 pointers and discover that each one
> is [].
>
> I've seen other implementations of this two-level technique which use
> a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for
> less-than-gigantic hash tables. This might be worth doing in the
> Data.HashTable implementation.
>
> [Curious: what (if anything) is being used to test Data.HashTable?
> I'd be willing to undertake very small amounts of fiddling if I could
> be sure I wasn't slowing down something which mattered.]
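As a concrete picture of the layout Jan-Willem describes, here is a minimal
sketch of the two-level scheme. sEGMENT_SIZE is the constant from the
Data.HashTable source, but the types and the size below are a simplification
for illustration, not the actual internals:

  import Data.Array.IO (IOArray, newArray)

  -- Sketch: the table owns one or more segments, each a boxed
  -- IOArray of buckets. The size here is illustrative, chosen to
  -- match the "additional 1000 pointers" figure above.
  sEGMENT_SIZE :: Int
  sEGMENT_SIZE = 1024

  type Bucket key val  = [(key, val)]   -- empty slots hold []
  type Segment key val = IOArray Int (Bucket key val)

  -- Even a table holding only ~24 entries owns a whole segment, so
  -- each GC scans all of its slots and finds [] in nearly every one.
  newSegment :: IO (Segment key val)
  newSegment = newArray (0, sEGMENT_SIZE - 1) []

A smaller segment size, as suggested above, directly shrinks the area the GC
has to walk for small tables.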
The few benchmarks I've tried seem to indicate that Data.HashTable's
performance isn't overwhelming, and that it can often be bettered by
Data.Map. I've never investigated very deeply, but GC overhead seems a
very plausible cause.
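For anyone who wants to reproduce this, a rough benchmark along these lines
is the kind of thing I mean (fillHashTable and fillMap are names made up for
illustration; run with +RTS -sstderr to see the GC cost):

  import qualified Data.HashTable as HT
  import qualified Data.Map as Map
  import Data.List (foldl')

  -- Insert n Int keys into each structure; timing these while
  -- watching the GC statistics is the comparison in question.
  fillHashTable :: Int -> IO (HT.HashTable Int Int)
  fillHashTable n = do
    ht <- HT.new (==) HT.hashInt
    mapM_ (\k -> HT.insert ht k k) [1 .. n]
    return ht

  fillMap :: Int -> Map.Map Int Int
  fillMap n = foldl' (\m k -> Map.insert k k m) Map.empty [1 .. n]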
Please, by all means, tweak the implementation and try some benchmarks;
I'll incorporate any performance fixes that arise.
Cheers,
Simon