Stable pointers and hash table performance
eyal.lotem at gmail.com
Thu Feb 7 03:13:52 CET 2013
I built a hash table in C: https://github.com/Peaker/small_hash
According to a few simple benchmarks (mainly, 10 million insertions), my C
hash table is much faster than any Haskell data structure I tried (e.g: 8
times faster than IntMap).
You can run the C benchmark (in small_hash) via: "make && ./benchmark
The other benchmarks are at: https://github.com/Peaker/hashtable_benchmarks
Benchmarks results on my hardware: http://hpaste.org/81902
I was hoping to get that fast mapping goodness to Haskell, by building an
FFI to it:
In order to do FFI bindings to a C data structure, that can contain
arbitrary Haskell values, I must create a lot of StablePtrs (for each
Haskell value stored in the C hash table).
Disappointingly, the FFI bindings were dozens of times slower(!) than the C
When using +RTS -H200M the speed is about 10 times slower(!) than the C
oprofile shows that the culprit is the Stable Ptr code.
Digging in, I found that stable ptrs and stable names shared a table,
despite having quite different characteristics:
* Stable Names aren't considered roots by GC, ptrs are.
* Making two stable ptrs for the same Haskell value doesn't really require
them to unify -- so there's no need for the hash table or the "refs"
counter for stable ptrs.
I split the Stable Names into their own table (which allowed removing the
"refs" field from them).
Stable Ptrs no longer use a hash table to unify, don't need "refs", or
"sn_obj", or "old".
This made Stable Ptrs cost just 1 C ptr in the table and really cheap to
The commits are here:
They are not polished (I didn't thoroughly review myself yet, didn't run
the test suite, fix out-of-date comments involved) and not ready for
The hash table benchmark via the FFI is now only ~2 times slower than the C
data structure and much faster than any other mapping structure for that
particular benchmark (10 million insertions of identity integers,
sequentially). A lot of the cost here is due to fine-grained allocation, as
the C code does bulk allocation. I shall improve this benchmark later, and
I believe it will tighten the gap some more.
Every minor GC iterates a whole lot of stable ptrs. This means that with
the default heap size, runtime is about 8 times longer (than with a 200M
To avoid this, I'd like to split the stable ptrs table into generations.
To do this, I need to know which of the pointed objects were promoted and
to what generation, so I can promote my stable ptrs accordingly.
It seems like the correct time to do this is in the updateStablePtrTable
(now named "updateStableTables")
This is the part I need help with.
How do I know if an object was promoted to a higher generation when I
traverse the objects?
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the ghc-devs