primitive (byte) string literal with length?

Wed Aug 25 17:35:11 UTC 2021

On Wed, Aug 25, 2021 at 07:05:58PM +0300, Oleg Grenrus wrote:

> The newer proposal [1] is tagged as "needs revision". It doesn't
> include (# Int#, Addr# #), but those are easy to get from ByteArray#
> which has negligible overhead.
> [...]
> [1] https://github.com/ghc-proposals/ghc-proposals/pull/292

Yes, ByteArray# literals would work just as well for my needs.

The one thing that's missing from the proposed variants:

    Rather than adding new syntax, this proposal leverages an existing
    GHC extension: QuasiQuotes. Rather than using TemplateHaskell, these
    quasiquoters would be built in to the compiler. Here are some
    examples of ByteArray# literals under this scheme:

    [octets|fe01bce8|] -- ByteArray# (four bytes)
    [utf8|Araña|]      -- ByteArray# (UTF-8)
    [utf16|Araña|]     -- ByteArray# (UTF-16, native endian)
    [utf16le|Araña|]   -- ByteArray# (UTF-16, little endian)
    [utf16be|Araña|]   -- ByteArray# (UTF-16, big endian)

is a syntax for octet-strings that does not force hex encoding of every
byte, thus something along the lines of:

    [octetstr|foo%A0bar|] -- ByteArray# (seven bytes)

The "%hh" hex octet could be "\hh" or "\xhh", ... whatever is deemed
sufficiently natural/readable (perhaps "foo\xA0\&bar" for consistency
with Haskell strings?).  The "\xhh" form would be familiar to Python
users:

    >>> x = b'foo\xA0bar'
    >>> len(x)
    7
    >>> x[3]
    160
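
To make the intended semantics concrete, here is a rough sketch, in
Python to match the example above, of how a "%hh"-escaped octet string
might be decoded.  The name decode_octetstr is hypothetical, and the
rules it applies (plain ASCII passes through, "%" must be followed by
exactly two hex digits, a literal "%" is written "%25", non-ASCII input
is rejected) are my assumptions, not part of the proposal:

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_octetstr(text: str) -> bytes:
    """Decode a hypothetical "%hh"-escaped octet string into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":
            # Assumed rule: "%" is always followed by exactly two hex digits.
            pair = text[i + 1 : i + 3]
            if len(pair) != 2 or not all(c in HEX_DIGITS for c in pair):
                raise ValueError(f"bad escape at offset {i}: {text[i:i + 3]!r}")
            out.append(int(pair, 16))
            i += 3
        elif ord(ch) < 0x80:
            # Plain ASCII characters stand for themselves.
            out.append(ord(ch))
            i += 1
        else:
            # Assumed rule: anything outside ASCII must be escaped.
            raise ValueError(f"non-ASCII character {ch!r} at offset {i}")
    return bytes(out)


x = decode_octetstr("foo%A0bar")  # same seven bytes as b'foo\xA0bar'
```

A fixed-width escape avoids the ambiguity that makes "\&" necessary in
Haskell string literals, where "\xA0b" would otherwise read as a single
escape.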

So, I support the proposal.  Even though quasi-quoters are bulkier
than "somebytes"##, they have the advantage of supporting multiple
variant formats.  I might be tempted to use "octets" for the non-hex
form with "%" or other escapes, and "hexstr" (or similar) for the hex
form.
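
For comparison, a sketch of what the all-hex form amounts to, again in
Python.  The name decode_hexstr is hypothetical; it just wraps the
standard bytes.fromhex, which also tolerates ASCII whitespace between
bytes, something a real quasi-quoter may or may not want:

```python
def decode_hexstr(text: str) -> bytes:
    """Decode the all-hex form: two hex digits per octet."""
    return bytes.fromhex(text)


# The proposal's four-byte example.
four = decode_hexstr("fe01bce8")

# The seven-byte example needs fourteen hex digits in this form,
# and the readable "foo" and "bar" parts are no longer visible.
seven = decode_hexstr("666f6fa0626172")
```

This is the readability cost that a mixed literal-plus-escape form
would avoid for mostly-ASCII octet strings.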

-- 
    Viktor.