`getSizeofMutableByteArray#` may return a negative number

Wed Oct 30 14:05:21 UTC 2024

On Wed, Oct 30, 2024 at 06:02:40PM +0530, Adithya Kumar wrote:

> I expected `newByteArray#` to fail if the size given to it is `< 0`
> and I never expected `getSizeofMutableByteArray#` would return a
> negative number.

The allocation size is rounded up to the next multiple of a word size
(typically 8).  So (-7) becomes 0, while (-8) remains (-8), which as
an unsigned quantity is too large to allocate.

The primop definition is:

    primop  NewByteArrayOp_Char "newByteArray#" GenPrimOp
       Int# -> State# s -> (# State# s, MutableByteArray# s #)
       {Create a new mutable byte array of specified size (in bytes), in
        the specified state thread. The size of the memory underlying the
        array will be rounded up to the platform's word size.}
       with out_of_line = True
            has_side_effects = True

The implementation is then:

  NewByteArrayOp_Char -> \case
    [(CmmLit (CmmInt n w))]
      | asUnsigned w n <= max_inl_alloc_size
      -> opIntoRegs  $ \ [res] -> doNewByteArrayOp res (fromInteger n)
    _ -> PrimopCmmEmit_External

Both (-7) and (-8) are too large as unsigned numbers to be handled
inline, taking the second branch, and then the RTS primop is called:

    stg_newByteArrayzh ( W_ n )
    {
        W_ words, payload_words;
        gcptr p;

        MAYBE_GC_N(stg_newByteArrayzh, n);

        payload_words = ROUNDUP_BYTES_TO_WDS(n);
        words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
        ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
        if (p == NULL) {
            jump stg_raisezh(base_GHCziIOziException_heapOverflow_closure);
        }
        TICK_ALLOC_PRIM(SIZEOF_StgArrBytes,WDS(payload_words),0);
        SET_HDR(p, stg_ARR_WORDS_info, CCCS);
        StgArrBytes_bytes(p) = n;
        return (p);
    }

This takes an *unsigned* argument `n`, rounding up (-7) to 0, while
leaving (-8) alone as a very large positive value.  The rest follows.

Perhaps it would be reasonable to check that the `Int#` argument of
`NewByteArrayOp_Char` is non-negative.  The comments for `asUnsigned`
notwithstanding:

    -- | Interpret the argument as an unsigned value, assuming the value
    -- is given in two-complement form in the given width.
    --
    -- Example: @asUnsigned W64 (-1)@ is 18446744073709551615.
    --
    -- This function is used to work around the fact that many array
    -- primops take Int# arguments, but we interpret them as unsigned
    -- quantities in the code gen. This means that we have to be careful
    -- every time we work on e.g. a CmmInt literal that corresponds to the
    -- array size, as it might contain a negative Integer value if the
    -- user passed a value larger than 2^(wORD_SIZE_IN_BITS-1) as the Int#
    -- literal.
    asUnsigned :: Width -> Integer -> Integer
    asUnsigned w n = n .&. (bit (widthInBits w) - 1)

-- 
    Viktor.