Mixed boxed/unboxed arrays?

J. Reinders jaro.reinders at gmail.com
Tue Aug 2 13:31:57 UTC 2022


Hi GHC devs,

I’ve been investigating fast hash table implementations. In particular hash tables used for counting unique items. For this use case, I believe the most performant hash tables are, in C terms, arrays of structures with a (boxed) pointer to the key, which is the item that we are counting, and an (unboxed) integer which holds the actual count.

I already know of the ‘vector-hashtables’ package which uses two separate arrays, for example one boxed to hold the keys and one unboxed to hold the counts. However, I believe it can be quite important to store all the elements in the same array as that can reduce the number of cache misses. Because with random access to two arrays there is a higher chance that there will be two cache misses even if it immediately finds the right key in the hash table.

So, I have also been looking at the low level arrays from the ‘primitive’ package and even in GHC.Exts, but I don’t believe it is currently possible to create a single array that contains both boxed and unboxed elements.

Have I overlooked something? Or else, would it be possible to support this use case in a future version of GHC?

Cheers,

Jaro


More information about the ghc-devs mailing list