Is it safe to index a little bit out of bounds

Thu Mar 8 18:22:18 UTC 2018

Thanks Herbert! This is exactly the kind of data point I was looking for.
Good to know.

On Thu, Mar 8, 2018 at 12:42 PM, Herbert Valerio Riedel <hvriedel at gmail.com>
wrote:

> Hi,
>
> On 2018-03-08 at 09:19:29 -0500, Andrew Martin wrote:
> > Some of the bytes in the word will have garbage in them. However, this
> > could always be masked out with a bit mask (you have to know the platform
> > endianness for this to work right).
> >
> > Is this safe? I don't think this could ever cause a segfault but I
> > wanted to check.
>
> Due to historical reasons, this is indeed safe. The underlying
> `StgArrBytes` structure must be word-aligned in size; otherwise bad
> things are likely to happen.
>
> I've seen some code in the wild which relies on that, and as a data point,
> I myself exploit that property in some operations (including the masking
> and endianness-aware handling you refer to) of 'text-short'[1] which is
> optimised for UTF8-based strings (<shameless-plug>and which besides
> being a practically useful library having its place in the
> text/bytearray landscape[2], text-short also serves as an incubation
> area for optimisation ideas and code of which some may end up in one way
> or another in the text-utf8 project[3]</shameless-plug>).
>
>
>  [1]: https://hackage.haskell.org/package/text-short
>
>  [2]: https://markkarpov.com/post/short-bs-and-text.html
>
>  [3]: https://hackage.haskell.org/text-utf8
>
>
> -- hvr
>
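
A minimal sketch of the trick discussed in the quoted message, for readers
of the archive. It is not taken from text-short; the names `wordBytes`,
`validMask` and `lastWordMasked` are made up for illustration. It assumes a
GHC-allocated `ByteArray#` (here obtained through `ShortByteString` from the
bytestring package), whose payload is padded to a whole number of words, as
described above. It reads the final machine word, which may extend past the
logical length, and masks out the padding bytes according to the platform's
byte order.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Bits (complement, finiteBitSize, shiftL, (.&.))
import qualified Data.ByteString.Short as SBS
import Data.ByteString.Short.Internal (ShortByteString (SBS))
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)
import GHC.Exts (Int (I#), Word (W#), indexWordArray#)

-- | Bytes per machine word (8 on 64-bit platforms, 4 on 32-bit ones).
wordBytes :: Int
wordBytes = finiteBitSize (0 :: Word) `div` 8

-- | Mask that keeps the first @k@ bytes of a word in memory order.
-- Which bits those are depends on the platform's endianness.
validMask :: Int -> Word
validMask k
  | k >= wordBytes = maxBound
  | otherwise = case targetByteOrder of
      -- First byte in memory is the least significant one.
      LittleEndian -> (1 `shiftL` (8 * k)) - 1
      -- First byte in memory is the most significant one.
      BigEndian -> complement ((1 `shiftL` (8 * (wordBytes - k))) - 1)

-- | Read the last word that overlaps the payload and zero every byte
-- past the logical length. The read may touch up to @wordBytes - 1@
-- bytes beyond the length reported by 'SBS.length'; this relies on the
-- payload being padded to a whole number of words.
lastWordMasked :: ShortByteString -> Word
lastWordMasked sbs@(SBS ba#)
  | n == 0 = 0
  | otherwise = case lastIx of
      I# i# -> W# (indexWordArray# ba# i#) .&. validMask (n - lastIx * wordBytes)
  where
    n = SBS.length sbs
    -- Index (in words, not bytes) of the word holding the final byte.
    lastIx = (n - 1) `div` wordBytes
```

On a 64-bit little-endian machine,
`lastWordMasked (SBS.pack [0x11, 0x22, 0x33, 0x44, 0x55])` evaluates to
0x5544332211 regardless of what happens to be in the three padding bytes.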

-- 
-Andrew Thaddeus Martin