Mixed boxed/unboxed arrays?

Viktor Dukhovni ietf-dane at dukhovni.org
Tue Aug 2 15:06:25 UTC 2022


On Tue, Aug 02, 2022 at 03:31:57PM +0200, J. Reinders wrote:

> I’ve been investigating fast hash table implementations. In particular
> hash tables used for counting unique items. For this use case, I
> believe the most performant hash tables are, in C terms, arrays of
> structures with a (boxed) pointer to the key, which is the item that
> we are counting, and an (unboxed) integer which holds the actual
> count.
> 
> I already know of the ‘vector-hashtables’ package which uses two
> separate arrays, for example one boxed to hold the keys and one
> unboxed to hold the counts. However, I believe it can be quite
> important to store all the elements in the same array as that can
> reduce the number of cache misses. Because with random access to two
> arrays there is a higher chance that there will be two cache misses
> even if it immediately finds the right key in the hash table.

Could you use `StablePtr` for the keys?

    https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html

The corresponding `Ptr` can be stored in an unboxed Storable array along
with the count.

This comes at the cost of later having to explicitly free each StablePtr.

    https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html#v:freeStablePtr

How does the cost of computing object hashes and comparing colliding
objects compare with the potential cache miss cost of using boxed
integers or a separate array?  Would such an "optimisation" be worth
the effort?

-- 
    Viktor.


More information about the ghc-devs mailing list