primitive (byte) string literal with length?
Viktor Dukhovni
ietf-dane at dukhovni.org
Wed Aug 25 17:35:11 UTC 2021
On Wed, Aug 25, 2021 at 07:05:58PM +0300, Oleg Grenrus wrote:
> The newer proposal [1] is tagged as "needs revision". It doesn't
> include (# Int#, Addr# #), but those are easy to get from ByteArray#,
> which has negligible overhead.
> [...]
> [1] https://github.com/ghc-proposals/ghc-proposals/pull/292
Yes, ByteArray# literals would work just as well for my needs.
The one thing that's missing, from the proposed variants:
    Rather than adding new syntax, this proposal leverages an existing
    GHC extension: QuasiQuotes. Rather than using TemplateHaskell, these
    quasiquoters would be built in to the compiler. Here are some
    examples of ByteArray# literals under this scheme:

        [octets|fe01bce8|] -- ByteArray# (four bytes)
        [utf8|Araña|]      -- ByteArray# (UTF-8)
        [utf16|Araña|]     -- ByteArray# (UTF-16, native endian)
        [utf16le|Araña|]   -- ByteArray# (UTF-16, little endian)
        [utf16be|Araña|]   -- ByteArray# (UTF-16, big endian)

is a syntax for octet-strings that does not force hex encoding of every
byte, thus something along the lines of:
    [octetstr|foo%A0bar|] -- ByteArray# (seven bytes)
The "%hh" hex octet could be "\hh" or "\xhh", ... whatever is deemed
sufficiently natural/readable (perhaps "foo\xA0\&bar" for consistency
with Haskell strings?). The "\xhh" form would be familiar to Python
users:
>>> x = b'foo\xA0bar'
>>> len(x)
7
>>> x[3]
160
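
For illustration, the "%hh" variant could be decoded roughly as in the
sketch below. Everything in it is an assumption rather than part of the
proposal: the name decode_octetstr is hypothetical, unescaped text is
assumed to be taken as UTF-8, a literal "%" is assumed to be written
"%25", and a "%" not followed by two hex digits is assumed to be an
error.

```python
import re

# Hypothetical decoder for the "%hh" escape form; not part of the proposal.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")
_STRAY_PERCENT = re.compile(rb"%(?![0-9A-Fa-f]{2})")


def decode_octetstr(text: str) -> bytes:
    """Return the octets denoted by text, expanding each %hh escape.

    Assumptions: unescaped characters are encoded as UTF-8, and a '%'
    that is not followed by exactly two hex digits is rejected.
    """
    raw = text.encode("utf-8")
    if _STRAY_PERCENT.search(raw):
        raise ValueError("'%' must be followed by two hex digits")
    # Single left-to-right pass, so "%2541" yields b"%41", not b"A".
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
```

With these assumptions, decode_octetstr("foo%A0bar") gives the same
seven bytes as the Python literal b'foo\xA0bar'.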
So I support the proposal. Quasi-quoters are bulkier than
"somebytes"##, but they have the advantage of supporting multiple
variant formats. I might be tempted to use "octets" for the non-hex
form with "%" or other escapes, and "hexstr" (or similar) for the hex
form.
--
Viktor.