Mixed boxed/unboxed arrays?

Viktor Dukhovni ietf-dane at dukhovni.org
Wed Aug 3 20:50:31 UTC 2022


On Wed, Aug 03, 2022 at 10:16:50PM +0200, J. Reinders wrote:

> I have an implementation that mostly works here:
> https://github.com/noughtmare/clutter
> in the src/Counter.hs file.
> 
> The only problem is that I get segfaults or internal GHC errors if I
> run it on large files. I’ve added some tracing and it seems to occur
> when I try to coerce back pointers from the hash table array to proper
> Haskell values in the ‘toList’ function.

Yes, this is delicate, requiring detailed knowledge of the internals.

> Currently, I’m using the ‘ptrToAny’ and ‘anyToPtr’ functions to do the
> coercing, because that sounds like the safest option.
> 
> Do you know what’s going wrong or do you have a safer design for coercing the pointers?

The code at:

    https://github.com/noughtmare/clutter/blob/main/src/Counter.hs#L50-L52

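looks wrong.  You're ignoring the return value of `compactAdd`, and
coercing the original (non-compact) key to a pointer, but that key still
lives on the ordinary heap and is liable to be moved by GC.  Only the
copy that `compactAdd` returns is pinned inside the compact region.
Note that `getCompact` is a pure function, so you need something like:

    p <- compactAdd c k >>= anyToPtr . getCompact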
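
To make the distinction between the original key and its compacted copy
concrete, here is a minimal sketch, assuming the `GHC.Compact` module from
the `ghc-compact` package.  The helper name `internKey` and the `String`
key type are hypothetical, chosen only for illustration; the conversion to
a raw pointer (your `anyToPtr`) is deliberately left out.

```haskell
import GHC.Compact (Compact, compact, compactAdd, getCompact, inCompact)

-- | Hypothetical helper: copy a key into an existing compact region and
-- return the copy that lives inside the region.  It is this returned
-- value, not the argument, whose address stays stable under GC.
internKey :: Compact b -> String -> IO String
internKey region k = getCompact <$> compactAdd region k

main :: IO ()
main = do
  region <- compact ()            -- start with an (almost) empty region
  let k = "example key"
  k' <- internKey region k        -- k' is the compacted copy of k
  origIn <- inCompact region k    -- is the original inside the region?
  copyIn <- inCompact region k'   -- is the copy inside the region?
  putStrLn ("original in region: " ++ show origIn)
  putStrLn ("copy in region:     " ++ show copyIn)
```

Only a value for which `inCompact` reports membership in the region is
safe to turn into a raw pointer and store in the hash table.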

    

> I thought it might be because the compact region gets deallocated
> before all the pointers are extracted, but even if I add a ‘touch c’
> (where c contains the compact region) at the end it still gives the
> same errors.

Given the issue above, it is too early to speculate along these lines.

It may also turn out that, once the code works, it is no faster, or
even much slower, than the two-array approach.  Compacting each new key
has a cost, and that cost may well dominate any speedup from combining
the key and value in the same primitive cell.

-- 
    Viktor.

