Is it safe to index a little bit out of bounds

Thu Mar 8 17:42:16 UTC 2018

Hi,

On 2018-03-08 at 09:19:29 -0500, Andrew Martin wrote:
> Some of the bytes in the word will have garbage in them. However, this
> could always be masked out with a bit mask (you have to know the platform
> endianness for this to work right).
>
> Is this safe? I don't think this could ever cause a segfault, but I
> wanted to check.

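For historical reasons, this is indeed safe: the underlying
`StgArrBytes` structure must be word-aligned in size, otherwise bad
things are likely to happen. Because the payload is rounded up to a whole
number of machine words, reading the final, partially filled word stays
within the heap object.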

I've seen some code in the wild which relies on that, and as a data
point, I myself exploit that property in some operations of
'text-short'[1] (including the masking and endianness-aware handling you
refer to), a library optimised for UTF-8-based strings.
<shameless-plug>Besides being a practically useful library with its own
place in the text/bytearray landscape[2], text-short also serves as an
incubation area for optimisation ideas and code, some of which may end up
in one way or another in the text-utf8 project[3].</shameless-plug>
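For illustration, here is a minimal sketch of the masking trick (it is not the code from text-short). It assumes GHC >= 9.4, which provides `Data.Array.Byte`, and it relies on the word-rounding property described above, which is a GHC RTS detail rather than a language guarantee. The names `lastWordMasked`, `wordBytes` and `example` are hypothetical:

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Array.Byte (ByteArray (..))
import Data.Bits (finiteBitSize, shiftL, shiftR, (.&.))
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)
import GHC.Exts (Int (I#), IsList (fromList), Word (W#), indexWordArray#, sizeofByteArray#)

-- | Bytes per machine word: 8 on 64-bit targets, 4 on 32-bit ones.
wordBytes :: Int
wordBytes = finiteBitSize (0 :: Word) `div` 8

-- | The last machine word of the array, with every byte past the array's
-- logical end cleared to zero; 'Nothing' for an empty array.
--
-- ASSUMPTION: GHC rounds a ByteArray#'s payload up to whole words, so the
-- word-sized read below never leaves the heap object.
lastWordMasked :: ByteArray -> Maybe Word
lastWordMasked (ByteArray ba)
  | n == 0 = Nothing
  | otherwise = Just (raw .&. mask)
  where
    n = I# (sizeofByteArray# ba)      -- logical length in bytes
    wi = (n - 1) `div` wordBytes      -- index of the word holding the last real byte
    raw = case wi of                  -- may contain padding (garbage) bytes
      I# i -> W# (indexWordArray# ba i)
    valid = n - wi * wordBytes        -- real bytes in that word, 1..wordBytes
    junk = 8 * (wordBytes - valid)    -- padding bits to discard, always < word size
    mask = case targetByteOrder of
      LittleEndian -> maxBound `shiftR` junk  -- real bytes are the low-order ones
      BigEndian -> maxBound `shiftL` junk     -- real bytes are the high-order ones

-- | Usage example: a 3-byte array, so most of its only word is padding.
example :: Maybe Word
example = lastWordMasked (fromList [0x01, 0x02, 0x03])
```

On a little-endian target `example` should be `Just 0x030201` whatever the padding bytes contain. Checking `targetByteOrder` once, instead of hard-coding a byte order, keeps the same sketch correct on big-endian targets.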

 [1]: https://hackage.haskell.org/package/text-short

 [2]: https://markkarpov.com/post/short-bs-and-text.html

 [3]: https://hackage.haskell.org/text-utf8

-- hvr