[commit: ghc] master: Documentation on the stack layout algorithm (78a506a)
git at git.haskell.org
git at git.haskell.org
Thu Jan 16 11:11:46 UTC 2014
Repository : ssh://git@git.haskell.org/ghc
On branch : master
Link : http://ghc.haskell.org/trac/ghc/changeset/78a506a914f18399c8d5032fe72547b6c2abfc7a/ghc
>---------------------------------------------------------------
commit 78a506a914f18399c8d5032fe72547b6c2abfc7a
Author: Simon Marlow <marlowsd at gmail.com>
Date: Mon Jan 13 20:36:48 2014 +0000
Documentation on the stack layout algorithm
>---------------------------------------------------------------
78a506a914f18399c8d5032fe72547b6c2abfc7a
compiler/cmm/CmmLayoutStack.hs | 99 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 94 insertions(+), 5 deletions(-)
diff --git a/compiler/cmm/CmmLayoutStack.hs b/compiler/cmm/CmmLayoutStack.hs
index 5b881d8..0c49033 100644
--- a/compiler/cmm/CmmLayoutStack.hs
+++ b/compiler/cmm/CmmLayoutStack.hs
@@ -35,13 +35,95 @@ import Control.Monad (liftM)
#include "HsVersions.h"
+{- Note [Stack Layout]
-data StackSlot = Occupied | Empty
- -- Occupied: a return address or part of an update frame
+The job of this pass is to
+
+ - replace references to abstract stack Areas with fixed offsets from Sp.
+
+ - replace the CmmHighStackMark constant used in the stack check with
+ the maximum stack usage of the proc.
+
+ - save any variables that are live across a calll, and reload them as
+ necessary.
+
+Before stack allocation, local variables remain live across native
+calls (CmmCall{ cmm_cont = Just _ }), and after stack allocation local
+variables are clobbered by native calls.
+
+We want to do stack allocation so that as far as possible
+ - stack use is minimized, and
+ - unnecessary stack saves and loads are avoided.
+
+The algorithm we use is a variant of linear-scan register allocation,
+where the stack is our register file.
+
+ - First, we do a liveness analysis, which annotates every block with
+ the variables live on entry to the block.
+
+ - We traverse blocks in reverse postorder DFS; that is, we visit at
+ least one predecessor of a block before the block itself. The
+ stack layout flowing from the predecessor of the block will
+ determine the stack layout on entry to the block.
+
+ - We maintain a data structure
+
+ Map Label StackMap
+
+ which describes the contents of the stack and the stack pointer on
+ entry to each block that is a successor of a block that we have
+ visited.
+
+ - For each block we visit:
+
+ - Look up the StackMap for this block.
+
+ - If this block is a proc point (or a call continuation, if we
+ aren't splitting proc points), emit instructions to reload all
+ the live variables from the stack, according to the StackMap.
+
+ - Walk forwards through the instructions:
+ - At an assignment x = Sp[loc]
+ - Record the fact that Sp[loc] contains x, so that we won't
+ need to save x if it ever needs to be spilled.
+ - At an assignment x = E
+ - If x was previously on the stack, it isn't any more
+ - At the last node, if it is a call or a jump to a proc point
+ - Lay out the stack frame for the call (see setupStackFrame)
+ - emit instructions to save all the live variables
+ - Remember the StackMaps for all the successors
+ - emit an instruction to adjust Sp
+ - If the last node is a branch, then the current StackMap is the
+ StackMap for the successors.
+
+ - Manifest Sp: replace references to stack areas in this block
+ with real Sp offsets. We cannot do this until we have laid out
+ the stack area for the successors above.
+
+ In this phase we also eliminate redundant stores to the stack;
+ see elimStackStores.
+
+ - There is one important gotcha: sometimes we'll encounter a control
+ transfer to a block that we've already processed (a join point),
+ and in that case we might need to rearrange the stack to match
+ what the block is expecting. (exactly the same as in linear-scan
+ register allocation, except here we have the luxury of an infinite
+ supply of temporary variables).
+
+ - Finally, we update the magic CmmHighStackMark constant with the
+ stack usage of the function, and eliminate the whole stack check
+ if there was no stack use. (in fact this is done as part of the
+ main traversal, by feeding the high-water-mark output back in as
+ an input. I hate cyclic programming, but it's just too convenient
+ sometimes.)
+
+There are plenty of tricky details: update frames, proc points, return
+addresses, foreign calls, and some ad-hoc optimisations that are
+convenient to do here and effective in common cases. Comments in the
+code below explain these.
+
+-}
-instance Outputable StackSlot where
- ppr Occupied = ptext (sLit "XXX")
- ppr Empty = ptext (sLit "---")
-- All stack locations are expressed as positive byte offsets from the
-- "base", which is defined to be the address above the return address
@@ -996,6 +1078,13 @@ callResumeThread new_base id =
plusW :: DynFlags -> ByteOff -> WordOff -> ByteOff
plusW dflags b w = b + w * wORD_SIZE dflags
+data StackSlot = Occupied | Empty
+ -- Occupied: a return address or part of an update frame
+
+instance Outputable StackSlot where
+ ppr Occupied = ptext (sLit "XXX")
+ ppr Empty = ptext (sLit "---")
+
dropEmpty :: WordOff -> [StackSlot] -> Maybe [StackSlot]
dropEmpty 0 ss = Just ss
dropEmpty n (Empty : ss) = dropEmpty (n-1) ss
More information about the ghc-commits
mailing list