Mixed boxed/unboxed arrays?
Viktor Dukhovni
ietf-dane at dukhovni.org
Tue Aug 2 15:06:25 UTC 2022
On Tue, Aug 02, 2022 at 03:31:57PM +0200, J. Reinders wrote:
> I’ve been investigating fast hash table implementations. In particular
> hash tables used for counting unique items. For this use case, I
> believe the most performant hash tables are, in C terms, arrays of
> structures with a (boxed) pointer to the key, which is the item that
> we are counting, and an (unboxed) integer which holds the actual
> count.
>
> I already know of the ‘vector-hashtables’ package which uses two
> separate arrays, for example one boxed to hold the keys and one
> unboxed to hold the counts. However, I believe it can be quite
> important to store all the elements in the same array as that can
> reduce the number of cache misses. Because with random access to two
> arrays there is a higher chance that there will be two cache misses
> even if it immediately finds the right key in the hash table.
Could you use `StablePtr` for the keys?
https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html
The corresponding `Ptr` can be stored in an unboxed Storable array along
with the count.
This comes at the cost of later having to explicitly free each StablePtr.
https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html#v:freeStablePtr
How does the cost of computing object hashes and comparing colliding
objects compare with the potential cache miss cost of using boxed
integers or a separate array? Would such an "optimisation" be worth
the effort?
--
Viktor.
More information about the ghc-devs
mailing list