WordX/IntX wrap Word#/Int#?

Ben Gamari ben at smart-cactus.org
Sun Jun 11 14:44:47 UTC 2017

On June 11, 2017 8:03:10 AM EDT, Michal Terepeta <michal.terepeta at gmail.com> wrote:
>Hi all,
>I've just noticed that all `WordX` (and `IntX`) data types are
>actually implemented as wrappers around `Word#` (and `Int#`). This
>probably doesn't matter much if it's stored on the heap (due to
>pointer indirection and heap alignment), but it also means that:
>data Foo = Foo {-# UNPACK #-} !Word8 {-# UNPACK #-} !Int8
>will actually take *a lot* of space: on 64 bit we'd need 8 bytes for
>header, 8 bytes for `Word8`, 8 bytes for `Int8`.
>Is there any reason for this? The only thing I can see is that this
>avoids having to add things like `Word8#` primitives into the
>compiler. (also the codegen would need to emit zero-extend moves when
>loading from memory, like `movzb{l,q}`)
This is certainly one consideration. Another is that you would also need to teach the garbage collector to understand closures with sub-word-size fields. Currently we can encode whether each field of a closure is a pointer or not with a simple bitmap. If we naively allowed smaller fields we would need to increase the granularity of this representation to encode bytes.
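
To make the bitmap idea concrete, here is a rough sketch in plain Haskell. The `Layout` type, the `pointerFields` helper, and the "set bit means pointer" convention are all made up for illustration and are not the RTS's actual encoding. The point is only that one bit describes one whole word-sized field, so a field smaller than a word has no bit of its own.

```haskell
import Data.Bits (testBit)
import Data.Word (Word64)

-- Hypothetical layout descriptor: bit i describes field i, where every
-- field is exactly one machine word. Assumed convention (not GHC's):
-- a set bit means "this field holds a pointer".
newtype Layout = Layout Word64

-- Indices of the fields a collector would have to follow, given the
-- number of word-sized fields in the closure.
pointerFields :: Int -> Layout -> [Int]
pointerFields n (Layout bitmap) = [i | i <- [0 .. n - 1], testBit bitmap i]
```

With byte-sized fields the same scheme would need one bit per byte rather than one per word, which is the increase in granularity mentioned above.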

Of course, one way to work around this would be to impose an invariant that guarantees that pointers are always word-aligned. Then we would probably want to shuffle sub-word sized fields, allowing two Word16s to inhabit a single word.
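
As a rough illustration of two Word16s sharing one word, the packing can be modelled in ordinary Haskell. `packPair` and `unpackPair` are hypothetical names, and putting the first field in the low 16 bits is an arbitrary choice; a real implementation would do this in the code generator rather than in library code.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word16, Word64)

-- Hypothetical helper: store two 16-bit fields in a single 64-bit word,
-- with the first field in bits 0-15 and the second in bits 16-31.
packPair :: Word16 -> Word16 -> Word64
packPair lo hi = fromIntegral lo .|. (fromIntegral hi `shiftL` 16)

-- Recover both fields; fromIntegral truncates to the low 16 bits.
unpackPair :: Word64 -> (Word16, Word16)
unpackPair w = (fromIntegral w, fromIntegral (w `shiftR` 16))
```

Because the packed word contains no pointers, the per-word pointer/non-pointer description stays valid as long as pointers keep whole, aligned words to themselves.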

As you mention, this would no doubt require a bit of engineering. In particular, while x86 has robust support for sub-word-size operations, I don't believe all the platforms we support do. In these cases we would need to perform, for instance, aligned word-sized loads and stores and mask as appropriate. I may be wrong, however.
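
For what it's worth, the load-and-mask fallback looks roughly like the following when written out in Haskell. `readByte` and `writeByte` are hypothetical helpers operating on an already-loaded word; byte 0 is taken to be the least significant byte and the index is assumed to be in the range 0 to 7.

```haskell
import Data.Bits (complement, shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8, Word64)

-- Extract byte i from a word that was fetched with an aligned,
-- word-sized load: shift it down, then mask off everything else.
readByte :: Int -> Word64 -> Word8
readByte i w = fromIntegral ((w `shiftR` (8 * i)) .&. 0xFF)

-- Replace byte i: clear the old byte with a mask, then OR in the new
-- one. The result would be written back with a word-sized store.
writeByte :: Int -> Word8 -> Word64 -> Word64
writeByte i b w = (w .&. complement mask) .|. (fromIntegral b `shiftL` (8 * i))
  where
    mask = 0xFF `shiftL` (8 * i)
```

Note that a sub-word store becomes a read-modify-write of the whole word in this scheme.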

Another consideration is that the byte code interpreter would need to learn to understand these closures.

Regardless, Simon Marlow began some work in this direction a few years ago. There is a mostly complete patch in D38. All it needs is rebasing, fixing of the byte code interpreter, and then perhaps introduction of Word8# and friends. I think it would be great if we could make our heap representation a bit more space-conscious. Perhaps you could open a ticket so we collect these tidbits?

Another somewhat related issue that would be good to think about in parallel to this one is the treatment of the word-size dependence of Word. See #11953.


- Ben

