[Haskell-cafe] Re: Hashtable woes

Chris Kuklewicz haskell at list.mightyreason.com
Tue Jan 24 06:44:11 EST 2006


Bulat Ziganshin wrote:
> Hello Chris,
>
> Monday, January 23, 2006, 6:09:15 PM, you wrote:
>
> CK> Using -A400m I get 39s down from 55s.  That is the best Data.HashTable time I
> CK> have seen. (Using -A10m and -A100m were a little slower).
>
> 1) "-A400m" is a bit unusual. "-H400m" for 500-meg machine, "-H800m"
> for 1g computer and so on will be fastest. current GHC doc leaks
> explanations in this area, but basically -H just allocates that much
> area and then dynamically changes -A after each GC allocating all
> available space to the generation-0 memory pool
>
> 2) It would be better to say which of those were the MUT and GC times in
> your program, and better still just to post its output with "+RTS -sstderr".
>
> Please post the improved results here; that's really interesting for me,
> and for my programs too ;)
>

Here is all the data you wanted:

Running "./cekp +RTS -sstderr -RTS < kfile"

6,292,511,740 bytes allocated in the heap
1,641,755,092 bytes copied during GC
 38,233,484 bytes maximum residency (215 sample(s))

      24003 collections in generation 0 ( 27.23s)
        215 collections in generation 1 (  7.80s)

         82 Mb total memory in use

  INIT  time    0.00s  (  0.01s elapsed)
  MUT   time   71.32s  ( 78.99s elapsed)
  GC    time   35.03s  ( 39.19s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time  106.35s  (118.19s elapsed)

  %GC time      32.9%  (33.2% elapsed)

  Alloc rate    88,229,272 bytes per MUT second

  Productivity  67.1% of total user, 60.3% of total elapsed

Is that 6 billion bytes allocated?  Yes, it is.  So use -H400m:
" ./cekp +RTS -H400m -sstderr -RTS < kfile "

6,293,156,400 bytes allocated in the heap
 99,679,428 bytes copied during GC
  8,742,464 bytes maximum residency (2 sample(s))

         18 collections in generation 0 (  2.84s)
          2 collections in generation 1 (  0.38s)

        392 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   82.42s  (100.62s elapsed)
  GC    time    3.22s  (  3.93s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   85.65s  (104.55s elapsed)

  %GC time       3.8%  (3.8% elapsed)

  Alloc rate    76,345,461 bytes per MUT second

  Productivity  96.2% of total user, 78.8% of total elapsed

So this is a small improvement, at the cost of essentially turning off the GC.
The Data.HashTable performance is the bottleneck, but it is not the GC's fault.

Using the heavily over-optimized hashtable in C code that I adapted:
" ./useit-tree +RTS -s -RTS < ../kfile "

1,096,743,184 bytes allocated in the heap
190,832,852 bytes copied during GC
 38,233,484 bytes maximum residency (9 sample(s))

       4183 collections in generation 0 (  1.39s)
          9 collections in generation 1 (  0.88s)

         82 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   14.55s  ( 16.06s elapsed)
  GC    time    2.27s  (  2.87s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   16.82s  ( 18.93s elapsed)

  %GC time      13.5%  (15.2% elapsed)

  Alloc rate    75,377,538 bytes per MUT second

  Productivity  86.5% of total user, 76.9% of total elapsed

Here I see only 1 billion bytes allocated, but I think that hides another 100
to 200 million bytes being allocated in the C code.

Note that the total time has dropped by a factor of five compared to the
Data.HashTable run without GC (85.65s / 16.82s ≈ 5.1), and by a factor of six
compared to the run where the GC ran normally (106.35s / 16.82s ≈ 6.3).  I can
reduce the total time from 16.82s to 16.20s by adding -H here, but that is not
really called for.  (The time without profiling is 12.5s.)

This benchmark stresses hash tables by adding 1,250,000 string keys, with the
values being the count of the times each key was added.  This is the common
"make a histogram" usage, and it never needs to delete from the hashtable.
(The keys are all of fixed length, and their number is known in advance.)
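
For reference, here is roughly the kind of loop I mean, sketched against the
GHC 6.4-era Data.HashTable API (new/lookup/update).  It is only an
illustration of the access pattern, not the actual benchmark code, and the
helper names are my own:

  import qualified Data.HashTable as H

  -- Count how many times each key was added, using the mutable
  -- Data.HashTable.  Only lookups and updates, never a delete.
  histogram :: [String] -> IO (H.HashTable String Int)
  histogram keys = do
      ht <- H.new (==) H.hashString
      mapM_ (bump ht) keys
      return ht
    where
      bump ht k = do
          mv <- H.lookup ht k
          case mv of
              Nothing -> H.update ht k 1       >> return ()
              Just n  -> H.update ht k (n + 1) >> return ()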

Looking at it as a black box, Data.HashTable is doing a tremendous amount of
MUT work that dominates the run time, taking many times longer than OCaml's
Hashtbl.  And D's associative array is scary fast.

I have not yet tested the proposed Data.Hash for GHC 6.6.  Apparently I need to
get a darcs checkout to avoid a bug in GHC 6.4.1 first.

I don't expect a Haskell data structure to be as fast as the over-optimized
C code.  But I am very surprised that there is no mutable data structure that
can compete with OCaml.
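
For comparison, the obvious pure alternative would be to build the same
histogram with Data.Map.  This is only a sketch of that alternative; it is
not one of the runs timed above:

  import qualified Data.Map as M
  import Data.List (foldl')

  -- The same histogram built with the pure Data.Map instead of a
  -- mutable hashtable.
  histogramMap :: [String] -> M.Map String Int
  histogramMap = foldl' bump M.empty
    where
      bump m k = M.insertWith (+) k 1 m

Here insertWith (+) k 1 either inserts a count of 1 or adds 1 to the
existing count for the key.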

-- 
Chris

