High-level Cmm code and stack allocation
Simon Peyton Jones
simonpj at microsoft.com
Mon Jan 13 15:20:47 UTC 2014
Thanks. Reading what you write below, I can see two possible motivations.
1. Reduce stack sizes.
2. Eliminate memory moves
For (1) do we have any data to show that the non-overlap of areas was giving rise to unacceptably big stacks?
For (2) that is indeed clever, but it's pretty serendipitous: it relies on the overlap being just so, so that coincidentally y gets stored in the same place as it was loaded from. I imagine that you don't plan the stack layout to cause that to happen; it's just a coincidence. Do we have any data to show that the coincidence happens with any frequency?
Also, as you note, we lose the opportunity for certain sorts of code motion, perhaps increasing register pressure a lot. So there is a downside too.
You seldom do things without a very good reason, so I feel I must be missing something.
Simon
| -----Original Message-----
| From: Simon Marlow [mailto:marlowsd at gmail.com]
| Sent: 10 January 2014 17:00
| To: Simon Peyton Jones; Herbert Valerio Riedel
| Cc: ghc-devs at haskell.org
| Subject: Re: High-level Cmm code and stack allocation
|
| So stack areas are still a great abstraction, the only change is that
| they now overlap. It's not just about stack getting too big, I've
| copied the notes I made about it below (which I will paste into the code
| in due course). The nice property that we can generate well-defined Cmm
| without knowing explicit stack offsets is intact.
|
| What is different is that there used to be an intermediate state where
| live variables were saved to abstract stack areas across calls, but Sp
| was still not manifest. This intermediate state doesn't exist any more,
| the stack layout algorithm does it all in one pass. To me this was far
| simpler, and I think it ended up being fewer lines of code than the old
| multi-phase stack layout algorithm (it's also much faster).
|
| Of course you can always change this. My goal was to get code that was
| at least as good as the old code generator and in a reasonable amount of
| time, and this was the shortest path I could find to that goal.
|
| Cheers,
| Simon
|
| e.g. if we had
|
| x = Sp[old + 8]
| y = Sp[old + 16]
|
| Sp[young(L) + 8] = L
| Sp[young(L) + 16] = y
| Sp[young(L) + 24] = x
| call f() returns to L
|
| if areas semantically do not overlap, then we might optimise this to
|
| Sp[young(L) + 8] = L
| Sp[young(L) + 16] = Sp[old + 8]
| Sp[young(L) + 24] = Sp[old + 16]
| call f() returns to L
|
| and now young(L) cannot be allocated at the same place as old, and we
| are doomed to use more stack.
|
| - old+8 conflicts with young(L)+8
| - old+16 conflicts with young(L)+16 and young(L)+8
|
| so young(L)+8 == old+24 and we get
|
| Sp[-8] = L
| Sp[-16] = Sp[8]
| Sp[-24] = Sp[0]
| Sp -= 24
| call f() returns to L
|
| However, if areas are defined to be "possibly overlapping" in the
| semantics, then we cannot commute any loads/stores of old with young(L),
| and we will be able to re-use both old+8 and old+16 for young(L).
|
| x = Sp[8]
| y = Sp[0]
|
| Sp[8] = L
| Sp[0] = y
| Sp[-8] = x
| Sp = Sp - 8
| call f() returns to L
|
| Now, the assignments of y go away,
|
| x = Sp[8]
| Sp[8] = L
| Sp[-8] = x
| Sp = Sp - 8
| call f() returns to L
|
|
| Conclusion:
|
| - T[old+N] aliases with U[young(L)+M] for all T, U, L, N and M
| - T[old+N] aliases with U[old+M] only if the areas actually overlap
|
| this ensures that we will not commute any accesses to old with
| young(L) or young(L) with young(L'), and the stack allocator will get
| the maximum opportunity to overlap these areas, keeping the stack use to
| a minimum and possibly avoiding some assignments.
|
|
|
| On 10/01/2014 16:35, Simon Peyton Jones wrote:
| > Oh, ok. Alas, a good chunk of my model of Cmm has just gone out of
| the window. I thought that areas were such a lovely, well-behaved
| abstraction. I was thrilled when we came up with them, and I'm very
| sorry to see them go.
| >
| > There are no many things that I no longer understand. I now have no
| idea how we save live variables over a call, or how multiple returned
| values from one call (returned on the stack) stay right where they are
| if they are live across the next call.
| >
| > What was the actual problem? That functions used too much stack, so
| the stack was getting too big? But a one slot area corresponds exactly
| to a live variable, so I don't see how the area abstraction could
| possibly increase stack size. And is stack size a crucial issue anyway?
| >
| > Apart from anything else, areas would have given a lovely solution to
| the problem this thread started with!
| >
| > I guess we can talk about this when you next visit? But some
| documentation would be welcome.
| >
| > Simon
| >
| > | -----Original Message-----
| > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | Sent: 10 January 2014 16:24
| > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | Cc: ghc-devs at haskell.org
| > | Subject: Re: High-level Cmm code and stack allocation
| > |
| > | There are no one-slot areas any more, I ditched those when I rewrote
| > | the stack allocator. There is only ever one live area: either the
| > | old area or the young area for a call we are about to make or have
| just made.
| > | (see the data type: I removed the one-slot areas)
| > |
| > | I struggled for a long time with this. The problem is that with the
| > | semantics of non-overlapping areas, code motion optimisations would
| > | tend to increase the stack requirements of the function by
| > | overlapping the live ranges of the areas. I concluded that actually
| > | what we wanted was areas that really do overlap, and optimisations
| > | that respect that, so that we get more efficient stack usage.
| > |
| > | Cheers,
| > | Simon
| > |
| > | On 10/01/2014 15:22, Simon Peyton Jones wrote:
| > | > That documentation would be good, yes! I don't know what it means
| > | > to
| > | say "we don't really have a general concept of areas any more". We
| > | did before, and I didn't know that it had gone away. Urk! We can
| > | have lots of live areas, notably the old area (for the current
| > | call/return parameters, the call area for a call we are preparing,
| > | and the one-slot areas for variables we are saving on the stack.
| > | >
| > | > Here's he current story
| > | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAre
| > | > as
| > | >
| > | > I agree that we have no concrete syntax for talking about areas,
| > | > but
| > | that is something we could fix. But I'm worried that they may not
| > | mean what they used to mean.
| > | >
| > | > Simon
| > | >
| > | > | -----Original Message-----
| > | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | > | Sent: 09 January 2014 08:39
| > | > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | > | Cc: ghc-devs at haskell.org
| > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > |
| > | > | On 08/01/2014 10:07, Simon Peyton Jones wrote:
| > | > | > | > Can't we just allocate a Cmm "area"? The address of an
| > | > | > | > area is a
| > | > | > | perfectly well-defined Cmm value.
| > | > | >
| > | > | > What about this idea?
| > | > |
| > | > | We don't really have a general concept of areas (any more), and
| > | > | areas aren't exposed in the concrete Cmm syntax at all. The
| > | > | current semantics is that areas may overlap with each other, so
| > | > | there should only be one active area at any point. I found that
| > | > | this was important to ensure that we could generate good code
| > | > | from the stack layout algorithm, otherwise it had to make
| > | > | pessimistic assumptions
| > | and use too much stack.
| > | > |
| > | > | You're going to ask me where this is documented, and I think I
| > | > | have to admit to slacking off, sorry :-) We did discuss it at
| > | > | the time, and I made copious notes, but I didn't transfer those
| to the code.
| > | > | I'll add a Note.
| > | > |
| > | > | Cheers,
| > | > | Simon
| > | > |
| > | > |
| > | > | > Simon
| > | > | >
| > | > | > | -----Original Message-----
| > | > | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | > | > | Sent: 08 January 2014 09:26
| > | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | > | > | Cc: ghc-devs at haskell.org
| > | > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > | > |
| > | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote:
| > | > | > | > | Yes, this is technically wrong but luckily works. I'd
| > | > | > | > | very much like to have a better solution, preferably one
| > | > | > | > | that doesn't add any extra overhead.
| > | > | > | >
| > | > | > | > | __decodeFloat_Int is a C function, so it will not touch
| > | > | > | > | the Haskell stack.
| > | > | > | >
| > | > | > | > This all seems terribly fragile to me. At least it ought
| > | > | > | > to be
| > | > | > | surrounded with massive comments pointing out how terribly
| > | > | > | fragile it is, breaking all the rules that we carefully
| > | > | > | document
| > | elsewhere.
| > | > | > | >
| > | > | > | > Can't we just allocate a Cmm "area"? The address of an
| > | > | > | > area is a
| > | > | > | perfectly well-defined Cmm value.
| > | > | > |
| > | > | > | It is fragile, yes. We can't use static memory because it
| > | > | > | needs to be thread-local. This particular hack has gone
| > | > | > | through several iterations over the years: first we had
| > | > | > | static memory, which broke when we did the parallel runtime,
| > | > | > | then we had special storage in the Capability, which we gave
| > | > | > | up when GMP was split out into a separate library, because
| > | > | > | it didn't seem right to have magic fields in the Capability
| for one library.
| > | > | > |
| > | > | > | I'm looking into whether we can do temporary allocation on
| > | > | > | the heap for this instead.
| > | > | > |
| > | > | > | Cheers,
| > | > | > | Simon
| > | > | > |
| > | > | > |
| > | > | > | > Simon
| > | > | > | >
| > | > | > | > | -----Original Message-----
| > | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On
| > | > | > | > | Behalf Of Simon Marlow
| > | > | > | > | Sent: 07 January 2014 16:05
| > | > | > | > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
| > | > | > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > | > | > |
| > | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
| > | > | > | > | > Hello,
| > | > | > | > | >
| > | > | > | > | > According to Note [Syntax of .cmm files],
| > | > | > | > | >
| > | > | > | > | > | There are two ways to write .cmm code:
| > | > | > | > | > |
| > | > | > | > | > | (1) High-level Cmm code delegates the stack
| > | > | > | > | > | handling to GHC,
| > | > | > | and
| > | > | > | > | > | never explicitly mentions Sp or registers.
| > | > | > | > | > |
| > | > | > | > | > | (2) Low-level Cmm manages the stack itself, and
| > | > | > | > | > | must know
| > | > | about
| > | > | > | > | > | calling conventions.
| > | > | > | > | > |
| > | > | > | > | > | Whether you want high-level or low-level Cmm is
| > | > | > | > | > | indicated by the presence of an argument list on a
| > | procedure.
| > | > | > | > | >
| > | > | > | > | > However, while working on integer-gmp I've been
| > | > | > | > | > noticing in integer-gmp/cbits/gmp-wrappers.cmm that
| > | > | > | > | > even though all Cmm
| > | > | > | > | procedures
| > | > | > | > | > have been converted to high-level Cmm, they still
| > | > | > | > | > reference the
| > | > | > | 'Sp'
| > | > | > | > | > register, e.g.
| > | > | > | > | >
| > | > | > | > | >
| > | > | > | > | > #define GMP_TAKE1_RET1(name,mp_fun) \
| > | > | > | > | > name (W_ ws1, P_ d1) \
| > | > | > | > | > { \
| > | > | > | > | > W_ mp_tmp1; \
| > | > | > | > | > W_ mp_result1; \
| > | > | > | > | > \
| > | > | > | > | > again: \
| > | > | > | > | > STK_CHK_GEN_N (2 * SIZEOF_MP_INT); \
| > | > | > | > | > MAYBE_GC(again); \
| > | > | > | > | > \
| > | > | > | > | > mp_tmp1 = Sp - 1 * SIZEOF_MP_INT; \
| > | > | > | > | > mp_result1 = Sp - 2 * SIZEOF_MP_INT; \
| > | > | > | > | > ... \
| > | > | > | > | >
| > | > | > | > | >
| > | > | > | > | > So is this valid high-level Cmm code? What's the
| > | > | > | > | > proper way to
| > | > | > | > | allocate
| > | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code?
| > | > | > | > |
| > | > | > | > | Yes, this is technically wrong but luckily works. I'd
| > | > | > | > | very much like to have a better solution, preferably one
| > | > | > | > | that doesn't add any extra overhead.
| > | > | > | > |
| > | > | > | > | The problem here is that we need to allocate a couple of
| > | > | > | > | temporary words and take their address; that's an
| > | > | > | > | unusual thing to do in Cmm, so it only occurs in a few
| > | > | > | > | places (mainly
| > | > | interacting with gmp).
| > | > | > | > | Usually if you want some temporary storage you can use
| > | > | > | > | local variables or some heap-allocated memory.
| > | > | > | > |
| > | > | > | > | Cheers,
| > | > | > | > | Simon
| > | > | > | > | _______________________________________________
| > | > | > | > | ghc-devs mailing list
| > | > | > | > | ghc-devs at haskell.org
| > | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs
| > | > | > | >
| > | > | >
| > | >
| >
More information about the ghc-devs
mailing list