[Haskell-cafe] Re: Hashtable woes
Chris Kuklewicz
haskell at list.mightyreason.com
Tue Jan 24 06:44:11 EST 2006
Bulat Ziganshin wrote:
> Hello Chris,
>
> Monday, January 23, 2006, 6:09:15 PM, you wrote:
>
> CK> Using -A400m I get 39s down from 55s. That is the best Data.HashTable time I
> CK> have seen. (Using -A10m and -A100m were a little slower).
>
> 1) "-A400m" is a bit unusual. "-H400m" for 500-meg machine, "-H800m"
> for 1g computer and so on will be fastest. current GHC doc leaks
> explanations in this area, but basically -H just allocates that much
> area and then dynamically changes -A after each GC allocating all
> available space to the generation-0 memory pool
>
> 2) it's better to say that was MUT and GC times in your program, and
> even better just to post its output with "+RTS -sstderr"
>
> please post improved results here. that's really interesting for me,
> and for my programs too ;)
>
Here is all the data you wanted:
Running "./cekp +RTS -sstderr -RTS < kfile"
6,292,511,740 bytes allocated in the heap
1,641,755,092 bytes copied during GC
38,233,484 bytes maximum residency (215 sample(s))
24003 collections in generation 0 ( 27.23s)
215 collections in generation 1 ( 7.80s)
82 Mb total memory in use
INIT time 0.00s ( 0.01s elapsed)
MUT time 71.32s ( 78.99s elapsed)
GC time 35.03s ( 39.19s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 106.35s (118.19s elapsed)
%GC time 32.9% (33.2% elapsed)
Alloc rate 88,229,272 bytes per MUT second
Productivity 67.1% of total user, 60.3% of total elapsed
Is that 6 Billion bytes allocated? Yes it is. So use -H 400m:
" ./cekp +RTS -H400m -sstderr -RTS < kfile "
6,293,156,400 bytes allocated in the heap
99,679,428 bytes copied during GC
8,742,464 bytes maximum residency (2 sample(s))
18 collections in generation 0 ( 2.84s)
2 collections in generation 1 ( 0.38s)
392 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 82.42s (100.62s elapsed)
GC time 3.22s ( 3.93s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 85.65s (104.55s elapsed)
%GC time 3.8% (3.8% elapsed)
Alloc rate 76,345,461 bytes per MUT second
Productivity 96.2% of total user, 78.8% of total elapsed
So this is the small improvement, at the cost of turning off the GC. The
Data.HashTable performance is the bottleneck, but it is not the GC's fault.
Using the way over optimized hashtable in c-code that I adapted:
" ./useit-tree +RTS -s -RTS < ../kfile "
1,096,743,184 bytes allocated in the heap
190,832,852 bytes copied during GC
38,233,484 bytes maximum residency (9 sample(s))
4183 collections in generation 0 ( 1.39s)
9 collections in generation 1 ( 0.88s)
82 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 14.55s ( 16.06s elapsed)
GC time 2.27s ( 2.87s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 16.82s ( 18.93s elapsed)
%GC time 13.5% (15.2% elapsed)
Alloc rate 75,377,538 bytes per MUT second
Productivity 86.5% of total user, 76.9% of total elapsed
Here I see only 1 billion bytes allocated, but I think that hides 100 to 200
million that are being allocated in the c-code.
Note that the total time has dropped by a factor of five over the hashtable w/o
GC, a factor of 6 if I had let GC run. I can reduce the total time from 16.82s
to 16.20s by adding -H here but that is not called for. ( The time without
profiling is 12.5s)
This benchmark stresses hash tables by adding 1,250,000 string keys, with the
values being the count of the times the key was added. This is the common "make
a histogram" usage, and never needs to delete from the hashtable. ( The keys are
all fixed length, and the number is known. )
Looking at it as a black box, Data.HashTable is doing a tremendous amount of MUT
work that dominates the run-time, taking many times longer than the Hashtbl that
OCaml uses. And D's associative array is scary fast.
I have not yet tested the proposed Data.Hash for GHC 6.6. Apparently I need to
get a darcs checkout to avoid a bug in GHC 6.4.1 first.
I don't expect a Haskell data structure to be as fast as the over optimized
c-code. But I am very surprised that there is no mutable data structure that
can compete with OCaml.
--
Chris
More information about the Haskell-Cafe
mailing list