Profiling and Data.HashTable
Jan-Willem Maessen
jmaessen at alum.mit.edu
Fri Oct 14 15:30:44 EDT 2005
On Oct 14, 2005, at 10:17 AM, Ketil Malde wrote:
> Hi all,
>
> I have a program that uses hash tables to store word counts. It can
> use few, large hash tables, or many small ones. The problem is that
> it uses an inordinate amount of time in the latter case, and
> profiling and the +RTS -sstderr statistics show that GC is the cause
> (accounting for up to 99% of the time(!))
>
> Is there any reason to expect this behavior?
>
> Heap profiling shows that each hash table seems to incur a memory
> overhead of approx 5K, but apart from that, I'm not able to find any
> leaks or unexpected space consumption.
>
>
That "5K" number made me immediately suspicious, so I took a look at
the source code to Data.HashTable. Sure enough, it's allocating a
number of large IOArrays, which are filled with pointers. The
practical upshot is that, for a hash table with (say) 24 entries, the
GC must scan an additional 1000 pointers and discover that each one
is [].
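To make the problem concrete, here's a deliberately simplified sketch
of the kind of structure involved (the real Data.HashTable internals
differ in detail; the names and the 1024 figure below are only
illustrative):

    import Data.Array.IO

    -- A bucket is an association list of colliding entries.
    type Bucket key val = [(key, val)]

    -- Simplified single-level table: every table gets one big bucket array.
    newtype SimpleTable key val = SimpleTable (IOArray Int (Bucket key val))

    tABLE_SIZE :: Int
    tABLE_SIZE = 1024   -- illustrative only; the order of magnitude is what matters

    -- Creating a table allocates tABLE_SIZE pointers, all pointing at [].
    -- Every major GC then scans all of them, even if the table only ever
    -- holds a couple of dozen entries.
    newSimple :: IO (SimpleTable key val)
    newSimple = do
      arr <- newArray (0, tABLE_SIZE - 1) []
      return (SimpleTable arr)

At a word or two per slot, a thousand-odd mostly-empty slots per table
is also roughly where the ~5K per-table overhead comes from.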
I've seen other implementations of this two-level technique which use
a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for
less-than-gigantic hash tables. This might be worth doing in the
Data.HashTable implementation.
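Roughly what I have in mind is something like the following (again
just a sketch with made-up names and sizes, not the actual
Data.HashTable code):

    import Data.Array.IO
    import Data.IORef

    sEGMENT_SIZE :: Int
    sEGMENT_SIZE = 32   -- assumption: small enough that tiny tables stay cheap to scan

    type Segment key val = IOArray Int [(key, val)]

    -- Two levels: a short directory of small, fixed-size segments.
    newtype TwoLevelTable key val = TwoLevelTable (IORef [Segment key val])

    -- A fresh table allocates a single small segment, so the GC has only
    -- sEGMENT_SIZE pointers to scan rather than a thousand or so.
    newTwoLevel :: IO (TwoLevelTable key val)
    newTwoLevel = do
      seg <- newArray (0, sEGMENT_SIZE - 1) []
      dir <- newIORef [seg]
      return (TwoLevelTable dir)

    -- Growth appends another small segment to the directory instead of
    -- copying one big bucket array, which is the point of the scheme.
    grow :: TwoLevelTable key val -> IO ()
    grow (TwoLevelTable dir) = do
      seg <- newArray (0, sEGMENT_SIZE - 1) []
      modifyIORef dir (++ [seg])

That way a table with two dozen entries costs the GC one small
segment's worth of pointer scanning, while large tables still grow
incrementally.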
[Curious: what (if anything) is being used to test Data.HashTable?
I'd be willing to undertake very small amounts of fiddling if I could
be sure I wasn't slowing down something which mattered.]
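If nothing of the sort exists, even a crude stress test along the
lines below would make the GC cost visible under +RTS -sstderr.
(Untested; the constants are guesses matching the numbers above, and
it's written from memory against the Data.HashTable API, so treat it
as a sketch.)

    import qualified Data.HashTable as HT

    -- One table with a couple of dozen entries; the keys are arbitrary.
    mkSmallTable :: Int -> IO (HT.HashTable Int Int)
    mkSmallTable i = do
      t <- HT.new (==) HT.hashInt
      mapM_ (\k -> HT.insert t (i * 100 + k) k) [1 .. 24]
      return t

    -- Build lots of small tables and keep them all live, so every major
    -- GC has to re-scan every table.  Run with:  ./Stress +RTS -sstderr
    main :: IO ()
    main = do
      tables <- mapM mkSmallTable [1 .. 10000]
      -- Touch every table at the end so none can be collected early.
      hits <- mapM (\(i, t) -> HT.lookup t (i * 100 + 1)) (zip [1 ..] tables)
      print (length (filter (/= Nothing) hits))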
-Jan-Willem Maessen
>
> Suggestions?
>
> -k
> --
> If I haven't seen further, it is by standing in the footprints of
> giants
>