Storage layout of integral types
Stefan Schulze Frielinghaus
stefansf at linux.ibm.com
Mon Feb 1 11:19:02 UTC 2021
On Tue, Jan 19, 2021 at 05:34:04PM +0100, Stefan Schulze Frielinghaus wrote:
> Hi all,
>
> I'm wondering what the supposed storage layout of integral types is. In
> particular for integral types with size less than the size of a word. For
> example, on a 64bit machine is a 32bit integer supposed to be written as a
> whole word and therefore as 64 bits or just as 32bits in the payload of a
> closure?
>
> I'm asking because since commit be5d74ca I see differently aligned integers in
> the payload of a closure on a 64bit big-endian machine. For example, in the
> following code an Int32 object is created which contains the actual integer in
> the high part of the payload (the snippet comes from the add operator
> GHC.Int.$fNumInt32_$c+_entry):
>
> Hp = Hp + 16;
> ...
> I64[Hp - 8] = GHC.Int.I32#_con_info;
> I32[Hp] = _scz7::I32;
>
> whereas e.g. in function rts_getInt32 the opposite is assumed and the actual
> integer is expected in the low part of the payload:
>
> HsInt32
> rts_getInt32 (HaskellObj p)
> {
>     // See comment above:
>     // ASSERT(p->header.info == I32zh_con_info ||
>     //        p->header.info == I32zh_static_info);
>     return (HsInt32)(HsInt)(UNTAG_CLOSURE(p)->payload[0]);
> }
>
> The same seems to be the case for the interpreter and foreign calls (case
> bci_CCALL) where integral arguments are passed in the low part of a whole word.
>
> Currently, my intuition is that the payload of a closure for an integral type
> with size smaller than WordSize is written as a whole word where the subword is
> aligned according to the machine's endianness. Can someone confirm this? If
> that is indeed true, then rts_getInt32 seems to be correct but not the former.
> Otherwise the converse seems to be the case.
Interestingly, it looks as if 32bit floats are also only accessed as 32bit
values, whereas for characters a 32bit subword is supposed to be accessed
as a 64bit word:
section ""data" . stg_CHARLIKE_closure" {
    stg_CHARLIKE_closure:
        const ghczmprim_GHCziTypes_Czh_con_info;
        const 0;
        ...
Thus, in total, what I see is that the payloads of I{8,16,32}# and F#
closures are read/written according to their natural sizes (ignoring the
inconsistency in rts_getInt* for a moment). In contrast, the payload of a
C# closure is read/written as a 64bit value although its natural size is
only 32bit.
Can someone confirm these observations? What is the general direction:
are subwords supposed to be read/written according to their natural size
only?
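
To make the mismatch described above concrete, here is a minimal
standalone C sketch. It is not GHC code: the helper names
store_as_subword, load_as_word and is_little_endian are made up for
illustration, and a plain uint64_t stands in for one payload word. It
only mimics the two access patterns (a 32bit store at the start of the
word versus a whole-word load followed by truncation).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32bit value at the start of a 64bit
 * payload word, similar in spirit to the store I32[Hp] = _scz7::I32. */
static uint64_t store_as_subword(int32_t v)
{
    uint64_t word = 0;
    memcpy(&word, &v, sizeof v);   /* writes bytes 0..3 of the word */
    return word;
}

/* Hypothetical helper: load the whole word and truncate it to 32 bits,
 * similar in spirit to what rts_getInt32 does with payload[0]. */
static int32_t load_as_word(uint64_t word)
{
    return (int32_t)(uint32_t)word;   /* keeps the low 32 bits */
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host load_as_word(store_as_subword(42)) gives 42
again, because bytes 0..3 are the low part of the word. On a big-endian
host the same bytes form the high part, so the truncating load yields 0.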
Cheers,
Stefan