Support conversion from (UArray i Word8) to ShortByteString?

Viktor Dukhovni ietf-dane at dukhovni.org
Sat Nov 11 21:20:52 UTC 2023


The 'ShortByteString' type in the "bytestring" package has seen some
singnificant improvement recently, and yet its API is still noticeably
limited in comparison to its pinned, I/O friendly 'ByteString' elder
sibling.

One of the limitations is that that there are fewer ways to construct a
'ShortByteString' object.  One often has to restort to constructing a
pinned ByteString, and then copy.  (There no "ST" Builders that write
to resizable MutableByteArrays instead of raw memory pointers).

Meanwhile, under-the covers, both the "UArray i Word8" type and
'ShortByteString' hold an immutable 'ByteArray', and the STUArray API
provides a flexible "UArray" construction interface.

Would it be reasonable to "bridge" the two APIs:

    Data.Array.Unboxed:  (re-export from Data.Array.Base)
        import Data.Array.Byte

        arrayToByteArray :: UArray i Word8 -> ByteArray
        arrayToByteArray (UArray _ _ _ ba#) = ByteArray ba#
        {-# INLINE arrayToByArrray #-}

    Data.ByteString.Short: (re-export from Data.ByteString.Short.Internal)
        byteArrayToShort :: ByteArray -> ShortByteString
        byteArrayToShort = coerce
        {-# INLINE byteArrayToShort #-}

It would then be possible to write:

    short = byteArrayToShort $ arrayToByteArray $ runSTUArray m
      where
        m = do
            a <- newArray (0, last) 0 -- zero fill
            sequence_ [ writeArray a ix e | (ix, e) <- generator ]

and generate the bytes of a 'ShortByteString' from an arbitrary
computation, possibly merging multiple inputs into some bytes by using
the recently introduced "modifyArray" (or explicit read/modify/write).

Any thoughts about the wisdom or lack thereof of this proposal?

An alternative is to add a tailored version of the UArray and STUArray
APIs to 'MutableByteArray' by extending the rather limited API of
'Data.Array.Byte':

    runMutableByteArray :: (forall s. ST s (MutableByteArray s))
                        -> ByteArray
    runMutableByteArray m = runST $ m >>= freezeMutableByteArray

    freezeMutableByteArray (MutableByteArray mba#) =
        ST $ \s -> case unsafeFreezeByteArray# mba# s of
            (# s', ba# #) -> (# s', ByteArray ba# #)

Since "Data.Array.Byte" is an "array" (rather than string) interface, it
could have a richer set of indexed read/write/modify primitives along
the lines of those found in "Data.Array.STUArray", but specialised to
'Word8' elements and implicit zero-based integer indexing.

The flexible construction I seek would then be via "Data.Array.Bytes",
rather than the somewhat too general index and value types from UArray.

    short = byteArrayToShort $ runMutableByteArray m
      where
        m = do
            a <- newByteArray size 0 -- 0 fill
            sequence_ [ writeByteArray a ix e | (ix, e) <- generator ]

In this scenario, the indexed-mutation of ShortByteStrings under
construction, or indexed-mutation of copies for various transformations,
could live in Data.Array.Byte, with ShortByteString and various
applications leveraging the random-access mutation (and resizing, ...)
to implement higher level operations.

-- 
    Viktor.


More information about the ghc-devs mailing list