//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file contains the AArch64 implementation of TargetFrameLowering class.
//
// On AArch64, stack frames are structured as follows:
//
// The stack grows downward.
//
// All of the individual frame areas on the frame below are optional, i.e. it's
// possible to create a function so that the particular area isn't present
// in the frame.
//
// At function entry, the "frame" looks as follows:
//
// |                                   | Higher address
// |-----------------------------------|
// |                                   |
// |   arguments passed on the stack   |
// |                                   |
// |-----------------------------------| <- sp
// |                                   | Lower address
//
//
// After the prologue has run, the frame has the following general structure.
// Note that this doesn't depict the case where a red-zone is used. Also,
// technically the last frame area (VLAs) doesn't get created until in the
// main function body, after the prologue is run. However, it's depicted here
// for completeness.
//
// |                                   | Higher address
// |-----------------------------------|
// |                                   |
// |   arguments passed on the stack   |
// |                                   |
// |-----------------------------------|
// |                                   |
// |   (Win64 only) varargs from reg   |
// |                                   |
// |-----------------------------------|
// |                                   |
// |    callee-saved gpr registers     | <--.
// |                                   |    | On Darwin platforms these
// |- - - - - - - - - - - - - - - - - -|    | callee saves are swapped,
// |             prev_lr               |    | (frame record first)
// |             prev_fp               | <--'
// |      async context if needed      |
// |      (a.k.a. "frame record")      |
// |-----------------------------------| <- fp(=x29)
// |                                   |
// |   callee-saved fp/simd/SVE regs   |
// |                                   |
// |-----------------------------------|
// |                                   |
// |         SVE stack objects         |
// |                                   |
// |-----------------------------------|
// |.empty.space.to.make.part.below....|
// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
// |.the.standard.16-byte.alignment....|  compile time; if present)
// |-----------------------------------|
// |                                   |
// |   local variables of fixed size   |
// |   including spill slots           |
// |-----------------------------------| <- bp(not defined by ABI,
// |.variable-sized.local.variables....|       LLVM chooses X19)
// |.(VLAs)............................| (size of this area is unknown at
// |...................................|  compile time)
// |-----------------------------------| <- sp
// |                                   | Lower address
//
//
// To access data in a frame, a constant offset to it must be computable at
// compile time from one of the pointers (fp, bp, sp). The sizes of the areas
// with a dotted background are not known at compile time when those areas are
// present, so all three of fp, bp and sp must be set up for every part of the
// frame to be reachable, assuming all of the frame areas are non-empty.
//
// For most functions, some of the frame areas are empty. For those functions,
// it may not be necessary to set up fp or bp (an example follows below):
// * A base pointer is definitely needed when there are both VLAs and local
//   variables with more-than-default alignment requirements.
// * A frame pointer is definitely needed when there are local variables with
//   more-than-default alignment requirements.
//
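// As an illustration (a hypothetical function, not from this file), the
// following needs all three of fp, bp and sp: the over-aligned `buf` forces
// stack realignment, so the realigned locals sit at an unknown offset from
// fp; the VLA then makes sp variable as well, leaving bp (x19) as the anchor
// for the fixed-size locals:
//
//   void demo(int n) {
//     _Alignas(64) char buf[64]; // more-than-default alignment
//     int vla[n];                // variable-sized local (VLA)
//     buf[0] = (char)vla[n - 1]; // both must remain addressable
//   }
//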
// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
// callee-saved area, since the unwind encoding does not allow for encoding
// this dynamically and existing tools depend on this layout. For other
// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
// area to allow SVE stack objects (allocated directly below the callee-saves,
// if available) to be accessed directly from the frame pointer.
// The SVE spill/fill instructions have VL-scaled addressing modes such
// as:
//    ldr z8, [fp, #-7 mul vl]
// For SVE the size of the vector length (VL) is not known at compile-time, so
// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
// layout, we don't need to add an unscaled offset to the frame pointer before
// accessing the SVE object in the frame.
//
// In some cases when a base pointer is not strictly needed, it is generated
// anyway when offsets from the frame pointer to access local variables become
// so large that the offset can't be encoded in the immediate fields of loads
// or stores.
//
// Outgoing function arguments must be at the bottom of the stack frame when
// calling another function. If we do not have variable-sized stack objects, we
// can allocate a "reserved call frame" area at the bottom of the local
// variable area, large enough for all outgoing calls. If we do have VLAs, then
// the stack pointer must be decremented and incremented around each call to
// make space for the arguments below the VLAs.
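//
// For example (illustrative only), with a VLA present, a call that passes a
// 9th integer argument on the stack is bracketed by explicit sp adjustments:
//
//   sub  sp, sp, #16          // create the outgoing argument area
//   str  x9, [sp]             // store the stack-passed argument
//   bl   callee
//   add  sp, sp, #16          // free the argument area again
//
// With a reserved call frame, the sub/add pair is instead folded into the
// prologue/epilogue allocation.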
//
// FIXME: also explain the redzone concept.
//
// An example of the prologue:
//
//     .globl __foo
//     .align 2
//  __foo:
//  Ltmp0:
//     .cfi_startproc
//     .cfi_personality 155, ___gxx_personality_v0
//  Leh_func_begin:
//     .cfi_lsda 16, Lexception33
//
//     stp  xa,bx, [sp, -#offset]!
//     ...
//     stp  x28, x27, [sp, #offset-32]
//     stp  fp, lr, [sp, #offset-16]
//     add  fp, sp, #offset - 16
//     sub  sp, sp, #1360
//
// The Stack:
//       +-------------------------------------------+
// 10000 | ........ | ........ | ........ | ........ |
// 10004 | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
// 10008 | ........ | ........ | ........ | ........ |
// 1000c | ........ | ........ | ........ | ........ |
//       +===========================================+
// 10010 |                X28 Register               |
// 10014 |                X28 Register               |
//       +-------------------------------------------+
// 10018 |                X27 Register               |
// 1001c |                X27 Register               |
//       +===========================================+
// 10020 |               Frame Pointer               |
// 10024 |               Frame Pointer               |
//       +-------------------------------------------+
// 10028 |               Link Register               |
// 1002c |               Link Register               |
//       +===========================================+
// 10030 | ........ | ........ | ........ | ........ |
// 10034 | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
// 10038 | ........ | ........ | ........ | ........ |
// 1003c | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
//
//     [sp] = 10030        ::    >>initial value<<
//     sp = 10020          ::  stp  fp, lr, [sp, #-16]!
//     fp = sp == 10020    ::  mov fp, sp
//     [sp] == 10020       ::  stp  x28, x27, [sp, #-16]!
//     sp == 10010         ::  >>final value<<
//
// The frame pointer (w29) points to address 10020. If we use an offset of
// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
// for w27, and -32 for w28:
//
//  Ltmp1:
//     .cfi_def_cfa w29, 16
//  Ltmp2:
//     .cfi_offset w30, -8
//  Ltmp3:
//     .cfi_offset w29, -16
//  Ltmp4:
//     .cfi_offset w27, -24
//  Ltmp5:
//     .cfi_offset w28, -32
//
//===----------------------------------------------------------------------===//

#include "AArch64FrameLowering.h"
#include "AArch64InstrInfo.h"
#include "AArch64MachineFunctionInfo.h"
#include "AArch64RegisterInfo.h"
#include "AArch64Subtarget.h"
#include "AArch64TargetMachine.h"
#include "MCTargetDesc/AArch64AddressingModes.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/LivePhysRegs.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Function.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCDwarf.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

using namespace llvm;

#define DEBUG_TYPE "frame-info"

static cl::opt<bool> EnableRedZone("aarch64-redzone",
                                   cl::desc("enable use of redzone on AArch64"),
                                   cl::init(false), cl::Hidden);

static cl::opt<bool>
    ReverseCSRRestoreSeq("reverse-csr-restore-seq",
                         cl::desc("reverse the CSR restore sequence"),
                         cl::init(false), cl::Hidden);

static cl::opt<bool> StackTaggingMergeSetTag(
    "stack-tagging-merge-settag",
    cl::desc("merge settag instruction in function epilog"), cl::init(true),
    cl::Hidden);

static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
                                       cl::desc("sort stack allocations"),
                                       cl::init(true), cl::Hidden);

static cl::opt<bool> EnableHomogeneousPrologEpilog(
    "homogeneous-prolog-epilog", cl::Hidden,
    cl::desc("Emit homogeneous prologue and epilogue for the size "
             "optimization (default = off)"));

STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");

/// Returns how much of the incoming argument stack area (in bytes) we should
/// clean up in an epilogue. For the C calling convention this will be 0, for
/// guaranteed tail call conventions it can be positive (a normal return or a
/// tail call to a function that uses less stack space for arguments) or
/// negative (for a tail call to a function that needs more stack space than us
/// for arguments).
static int64_t getArgumentStackToRestore(MachineFunction &MF,
                                         MachineBasicBlock &MBB) {
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
  bool IsTailCallReturn = false;
  if (MBB.end() != MBBI) {
    unsigned RetOpcode = MBBI->getOpcode();
    IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi ||
                       RetOpcode == AArch64::TCRETURNri ||
                       RetOpcode == AArch64::TCRETURNriBTI;
  }

  int64_t ArgumentPopSize = 0;
  if (IsTailCallReturn) {
    MachineOperand &StackAdjust = MBBI->getOperand(1);

    // For a tail-call in a callee-pops-arguments environment, some or all of
    // the stack may actually be in use for the call's arguments, this is
    // calculated during LowerCall and consumed here...
    ArgumentPopSize = StackAdjust.getImm();
  } else {
    // ... otherwise the amount to pop is *all* of the argument space,
    // conveniently stored in the MachineFunctionInfo by
    // LowerFormalArguments. This will, of course, be zero for the C calling
    // convention.
    ArgumentPopSize = AFI->getArgumentStackToRestore();
  }

  return ArgumentPopSize;
}
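
// For instance (illustrative numbers): under a guaranteed tail-call
// convention, a function entered with 32 bytes of stack arguments that
// returns normally pops all 32 bytes here. If it instead tail-calls a
// function needing 48 bytes of stack arguments, LowerCall records a net
// adjustment of 32 - 48 = -16 bytes.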

static bool produceCompactUnwindFrame(MachineFunction &MF);
static bool needsWinCFI(const MachineFunction &MF);
static StackOffset getSVEStackSize(const MachineFunction &MF);
static bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF);

/// Returns true if a homogeneous prolog or epilog code can be emitted
/// for the size optimization. If possible, a frame helper call is injected.
/// When Exit block is given, this check is for epilog.
bool AArch64FrameLowering::homogeneousPrologEpilog(
    MachineFunction &MF, MachineBasicBlock *Exit) const {
  if (!MF.getFunction().hasMinSize())
    return false;
  if (!EnableHomogeneousPrologEpilog)
    return false;
  if (ReverseCSRRestoreSeq)
    return false;
  if (EnableRedZone)
    return false;

  // TODO: Windows is not supported yet.
  if (needsWinCFI(MF))
    return false;
  // TODO: SVE is not supported yet.
  if (getSVEStackSize(MF))
    return false;

  // Bail on stack adjustment needed on return for simplicity.
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
  if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
    return false;
  if (Exit && getArgumentStackToRestore(MF, *Exit))
    return false;

  return true;
}

/// Returns true if CSRs should be paired.
bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
  return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
}

/// This is the biggest offset to the stack pointer we can encode in aarch64
/// instructions (without using a separate calculation and a temp register).
/// Note that the exceptions here are vector stores/loads, which cannot encode
/// any displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
static const unsigned DefaultSafeSPDisplacement = 255;
347
348/// Look at each instruction that references stack frames and return the stack
349/// size limit beyond which some of these instructions will require a scratch
350/// register during their expansion later.
352 // FIXME: For now, just conservatively guestimate based on unscaled indexing
353 // range. We'll end up allocating an unnecessary spill slot a lot, but
354 // realistically that's not a big deal at this stage of the game.
355 for (MachineBasicBlock &MBB : MF) {
356 for (MachineInstr &MI : MBB) {
357 if (MI.isDebugInstr() || MI.isPseudo() ||
358 MI.getOpcode() == AArch64::ADDXri ||
359 MI.getOpcode() == AArch64::ADDSXri)
360 continue;
361
362 for (const MachineOperand &MO : MI.operands()) {
363 if (!MO.isFI())
364 continue;
365
367 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
369 return 0;
370 }
371 }
372 }
374}

TargetStackID::Value
AArch64FrameLowering::getStackIDForScalableVectors() const {
  return TargetStackID::ScalableVector;
}

/// Returns the size of the fixed object area (allocated next to sp on entry)
/// On Win64 this may include a var args area and an UnwindHelp object for EH.
static unsigned getFixedObjectSize(const MachineFunction &MF,
                                   const AArch64FunctionInfo *AFI, bool IsWin64,
                                   bool IsFunclet) {
  if (!IsWin64 || IsFunclet) {
    return AFI->getTailCallReservedStack();
  } else {
    if (AFI->getTailCallReservedStack() != 0)
      report_fatal_error("cannot generate ABI-changing tail call for Win64");
    // Var args are stored here in the primary function.
    const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
    // To support EH funclets we allocate an UnwindHelp object
    const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
    return alignTo(VarArgsArea + UnwindHelpObject, 16);
  }
}
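
// Worked example (illustrative): a Win64 vararg function that spills all
// eight GPR varargs (8 * 8 = 64 bytes) and uses EH funclets allocates
// alignTo(64 + 8, 16) = 80 bytes of fixed objects next to sp on entry.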

/// Returns the size of the entire SVE stackframe (calleesaves + spills).
static StackOffset getSVEStackSize(const MachineFunction &MF) {
  const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
}

bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
  if (!EnableRedZone)
    return false;

  // Don't use the red zone if the function explicitly asks us not to.
  // This is typically used for kernel code.
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const unsigned RedZoneSize =
      Subtarget.getTargetLowering()->getRedZoneSize(MF.getFunction());
  if (!RedZoneSize)
    return false;

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  uint64_t NumBytes = AFI->getLocalStackSize();

  return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
           getSVEStackSize(MF));
}
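
// With the red zone available, a small leaf function can address locals below
// sp without adjusting it at all (illustrative sketch):
//
//   foo:                      // leaf, no calls, locals <= red zone size
//     stur x0, [sp, #-8]      // spill into the red zone; sp is unchanged
//     ldur x0, [sp, #-8]
//     ret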

/// hasFP - Return true if the specified function should have a dedicated frame
/// pointer register.
bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
  // Win64 EH requires a frame pointer if funclets are present, as the locals
  // are accessed off the frame pointer in both the parent function and the
  // funclets.
  if (MF.hasEHFunclets())
    return true;
  // Retain behavior of always omitting the FP for leaf functions when possible.
  if (MF.getTarget().Options.DisableFramePointerElim(MF))
    return true;
  if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
      MFI.hasStackMap() || MFI.hasPatchPoint() ||
      RegInfo->hasStackRealignment(MF))
    return true;
  // With large callframes around we may need to use FP to access the scavenging
  // emergency spillslot.
  //
  // Unfortunately some calls to hasFP() like machine verifier ->
  // getReservedReg() -> hasFP in the middle of global isel are too early
  // to know the max call frame size. Hopefully conservatively returning "true"
  // in those cases is fine.
  // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
  if (!MFI.isMaxCallFrameSizeComputed() ||
      MFI.getMaxCallFrameSize() > DefaultSafeSPDisplacement)
    return true;

  return false;
}

/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
/// not required, we reserve argument space for call sites in the function
/// immediately on entry to the current function. This eliminates the need for
/// add/sub sp brackets around call sites. Returns true if the call frame is
/// included as part of the stack frame.
bool
AArch64FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
  return !MF.getFrameInfo().hasVarSizedObjects();
}

MachineBasicBlock::iterator AArch64FrameLowering::eliminateCallFramePseudoInstr(
    MachineFunction &MF, MachineBasicBlock &MBB,
    MachineBasicBlock::iterator I) const {
  const AArch64InstrInfo *TII =
      static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
  DebugLoc DL = I->getDebugLoc();
  unsigned Opc = I->getOpcode();
  bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
  uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;

  if (!hasReservedCallFrame(MF)) {
    int64_t Amount = I->getOperand(0).getImm();
    Amount = alignTo(Amount, getStackAlign());
    if (!IsDestroy)
      Amount = -Amount;

    // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
    // doesn't have to pop anything), then the first operand will be zero too so
    // this adjustment is a no-op.
    if (CalleePopAmount == 0) {
      // FIXME: in-function stack adjustment for calls is limited to 24-bits
      // because there's no guaranteed temporary register available.
      //
      // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
      // 1) For offset <= 12-bit, we use LSL #0
      // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
      //    LSL #0, and the other uses LSL #12.
      //
      // Most call frames will be allocated at the start of a function so
      // this is OK, but it is a limitation that needs dealing with.
      assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
      emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
                      StackOffset::getFixed(Amount), TII);
    }
  } else if (CalleePopAmount != 0) {
    // If the calling convention demands that the callee pops arguments from the
    // stack, we want to add it back if we have a reserved call frame.
    assert(CalleePopAmount < 0xffffff && "call frame too large");
    emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
                    StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
  }
  return MBB.erase(I);
}
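
// For example, without a reserved call frame, the pseudos bracketing a call
// that passes 32 bytes of arguments on the stack lower to plain sp arithmetic
// (illustrative):
//
//   ADJCALLSTACKDOWN 32   ->   sub sp, sp, #32
//   BL callee                  bl  callee
//   ADJCALLSTACKUP 32     ->   add sp, sp, #32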

void AArch64FrameLowering::emitCalleeSavedGPRLocations(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();

  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  for (const auto &Info : CSI) {
    if (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector)
      continue;

    assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
    unsigned DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);

    int64_t Offset =
        MFI.getObjectOffset(Info.getFrameIdx()) - getOffsetOfLocalArea();
    unsigned CFIIndex = MF.addFrameInst(
        MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameSetup);
  }
}

void AArch64FrameLowering::emitCalleeSavedSVELocations(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();

  // Add callee saved registers to move list.
  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  for (const auto &Info : CSI) {
    if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
      continue;

    // Not all unwinders may know about SVE registers, so assume the lowest
    // common denominator.
    assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
    unsigned Reg = Info.getReg();
    if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
      continue;

    StackOffset Offset =
        StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
        StackOffset::getFixed(AFI.getCalleeSavedStackSize(MFI));

    unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameSetup);
  }
}

static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF,
                               MachineBasicBlock &MBB,
                               MachineBasicBlock::iterator InsertPt,
                               unsigned DwarfReg) {
  unsigned CFIIndex =
      MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
  BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
}

void AArch64FrameLowering::resetCFIToInitialState(
    MachineBasicBlock &MBB) const {

  MachineFunction &MF = *MBB.getParent();
  const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
  const auto &TRI =
      static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
  const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();

  const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
  DebugLoc DL;

  // Reset the CFA to `SP + 0`.
  MachineBasicBlock::iterator InsertPt = MBB.begin();
  unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
      nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
  BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);

  // Flip the RA sign state.
  if (MFI.shouldSignReturnAddress(MF)) {
    CFIIndex = MF.addFrameInst(MCCFIInstruction::createNegateRAState(nullptr));
    BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
  }

  // Shadow call stack uses X18, reset it.
  if (needsShadowCallStackPrologueEpilogue(MF))
    insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
                       TRI.getDwarfRegNum(AArch64::X18, true));

  // Emit .cfi_same_value for callee-saved registers.
  const std::vector<CalleeSavedInfo> &CSI =
      MF.getFrameInfo().getCalleeSavedInfo();
  for (const auto &Info : CSI) {
    unsigned Reg = Info.getReg();
    if (!TRI.regNeedsCFI(Reg, Reg))
      continue;
    insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
                       TRI.getDwarfRegNum(Reg, true));
  }
}

static void emitCalleeSavedRestores(MachineBasicBlock &MBB,
                                    MachineBasicBlock::iterator MBBI,
                                    bool SVE) {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();

  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  for (const auto &Info : CSI) {
    if (SVE !=
        (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
      continue;

    unsigned Reg = Info.getReg();
    if (SVE &&
        !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
      continue;

    unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
        nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameDestroy);
  }
}

void AArch64FrameLowering::emitCalleeSavedGPRRestores(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  emitCalleeSavedRestores(MBB, MBBI, false);
}

void AArch64FrameLowering::emitCalleeSavedSVERestores(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  emitCalleeSavedRestores(MBB, MBBI, true);
}

static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
  switch (Reg.id()) {
  default:
    // The called routine is expected to preserve r19-r28
    // r29 and r30 are used as frame pointer and link register resp.
    return 0;

    // GPRs
#define CASE(n)                                                                \
  case AArch64::W##n:                                                          \
  case AArch64::X##n:                                                          \
    return AArch64::X##n
  CASE(0);
  CASE(1);
  CASE(2);
  CASE(3);
  CASE(4);
  CASE(5);
  CASE(6);
  CASE(7);
  CASE(8);
  CASE(9);
  CASE(10);
  CASE(11);
  CASE(12);
  CASE(13);
  CASE(14);
  CASE(15);
  CASE(16);
  CASE(17);
  CASE(18);
#undef CASE

    // FPRs
#define CASE(n)                                                                \
  case AArch64::B##n:                                                          \
  case AArch64::H##n:                                                          \
  case AArch64::S##n:                                                          \
  case AArch64::D##n:                                                          \
  case AArch64::Q##n:                                                          \
    return HasSVE ? AArch64::Z##n : AArch64::Q##n
  CASE(0);
  CASE(1);
  CASE(2);
  CASE(3);
  CASE(4);
  CASE(5);
  CASE(6);
  CASE(7);
  CASE(8);
  CASE(9);
  CASE(10);
  CASE(11);
  CASE(12);
  CASE(13);
  CASE(14);
  CASE(15);
  CASE(16);
  CASE(17);
  CASE(18);
  CASE(19);
  CASE(20);
  CASE(21);
  CASE(22);
  CASE(23);
  CASE(24);
  CASE(25);
  CASE(26);
  CASE(27);
  CASE(28);
  CASE(29);
  CASE(30);
  CASE(31);
#undef CASE
  }
}

void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
                                                MachineBasicBlock &MBB) const {
  // Insertion point.
  MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();

  // Fake a debug loc.
  DebugLoc DL;
  if (MBBI != MBB.end())
    DL = MBBI->getDebugLoc();

  const MachineFunction &MF = *MBB.getParent();
  const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();

  BitVector GPRsToZero(TRI.getNumRegs());
  BitVector FPRsToZero(TRI.getNumRegs());
  bool HasSVE = STI.hasSVE();
  for (MCRegister Reg : RegsToZero.set_bits()) {
    if (TRI.isGeneralPurposeRegister(MF, Reg)) {
      // For GPRs, we only care to clear out the 64-bit register.
      if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
        GPRsToZero.set(XReg);
    } else if (AArch64::FPR128RegClass.contains(Reg) ||
               AArch64::FPR64RegClass.contains(Reg) ||
               AArch64::FPR32RegClass.contains(Reg) ||
               AArch64::FPR16RegClass.contains(Reg) ||
               AArch64::FPR8RegClass.contains(Reg)) {
      // For FPRs, zero the widest register that covers them.
      if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
        FPRsToZero.set(XReg);
    }
  }

  const AArch64InstrInfo &TII = *STI.getInstrInfo();

  // Zero out GPRs.
  for (MCRegister Reg : GPRsToZero.set_bits())
    BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), Reg).addImm(0);

  // Zero out FP/vector registers.
  for (MCRegister Reg : FPRsToZero.set_bits())
    if (HasSVE)
      BuildMI(MBB, MBBI, DL, TII.get(AArch64::DUP_ZI_D), Reg)
          .addImm(0)
          .addImm(0);
    else
      BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVIv2d_ns), Reg).addImm(0);

  if (HasSVE) {
    for (MCRegister PReg :
         {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
          AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
          AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
          AArch64::P15}) {
      if (RegsToZero[PReg])
        BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
    }
  }
}

// Find a scratch register that we can use at the start of the prologue to
// re-align the stack pointer. We avoid using callee-save registers since they
// may appear to be free when this is called from canUseAsPrologue (during
// shrink wrapping), but then no longer be free when this is called from
// emitPrologue.
//
// FIXME: This is a bit conservative, since in the above case we could use one
// of the callee-save registers as a scratch temp to re-align the stack pointer,
// but we would then have to make sure that we were in fact saving at least one
// callee-save register in the prologue, which is additional complexity that
// doesn't seem worth the benefit.
static unsigned findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB) {
  MachineFunction *MF = MBB->getParent();

  // If MBB is an entry block, use X9 as the scratch register
  if (&MF->front() == MBB)
    return AArch64::X9;

  const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
  LivePhysRegs LiveRegs(TRI);
  LiveRegs.addLiveIns(*MBB);

  // Mark callee saved registers as used so we will not choose them.
  const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
  for (unsigned i = 0; CSRegs[i]; ++i)
    LiveRegs.addReg(CSRegs[i]);

  // Prefer X9 since it was historically used for the prologue scratch reg.
  const MachineRegisterInfo &MRI = MF->getRegInfo();
  if (LiveRegs.available(MRI, AArch64::X9))
    return AArch64::X9;

  for (unsigned Reg : AArch64::GPR64RegClass) {
    if (LiveRegs.available(MRI, Reg))
      return Reg;
  }
  return AArch64::NoRegister;
}

bool AArch64FrameLowering::canUseAsPrologue(
    const MachineBasicBlock &MBB) const {
  const MachineFunction *MF = MBB.getParent();
  MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
  const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();

  // Don't need a scratch register if we're not going to re-align the stack.
  if (!RegInfo->hasStackRealignment(*MF))
    return true;
  // Otherwise, we can use any block as long as it has a scratch register
  // available.
  return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
}

static bool windowsRequiresStackProbe(MachineFunction &MF,
                                      uint64_t StackSizeInBytes) {
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  if (!Subtarget.isTargetWindows())
    return false;
  const Function &F = MF.getFunction();
  // TODO: When implementing stack protectors, take that into account
  // for the probe threshold.
  unsigned StackProbeSize =
      F.getFnAttributeAsParsedInteger("stack-probe-size", 4096);
  return (StackSizeInBytes >= StackProbeSize) &&
         !F.hasFnAttribute("no-stack-arg-probe");
}
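
// For example (illustrative), a Windows function with 16 KiB of locals is
// over the default 4096-byte threshold, so the prologue moves the allocation
// size in 16-byte units into x15, probes each page via the __chkstk-style
// helper, and only then drops sp:
//
//   mov  x15, #1024           // 16384 / 16
//   bl   __chkstk
//   sub  sp, sp, x15, lsl #4  // 1024 * 16 = 16384 bytes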

static bool needsWinCFI(const MachineFunction &MF) {
  const Function &F = MF.getFunction();
  return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
         F.needsUnwindTableEntry();
}

bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
    MachineFunction &MF, uint64_t StackBumpBytes) const {
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
  if (homogeneousPrologEpilog(MF))
    return false;

  if (AFI->getLocalStackSize() == 0)
    return false;

  // For WinCFI, if optimizing for size, prefer to not combine the stack bump
  // (to force a stp with predecrement) to match the packed unwind format,
  // provided that there actually are any callee saved registers to merge the
  // decrement with.
  // This is potentially marginally slower, but allows using the packed
  // unwind format for functions that both have a local area and callee saved
  // registers. Using the packed unwind format notably reduces the size of
  // the unwind info.
  if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
      MF.getFunction().hasOptSize())
    return false;

  // 512 is the maximum immediate for stp/ldp that will be used for
  // callee-save save/restores
  if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
    return false;

  if (MFI.hasVarSizedObjects())
    return false;

  if (RegInfo->hasStackRealignment(MF))
    return false;

  // This isn't strictly necessary, but it simplifies things a bit since the
  // current RedZone handling code assumes the SP is adjusted by the
  // callee-save save/restore code.
  if (canUseRedZone(MF))
    return false;

  // When there is an SVE area on the stack, always allocate the
  // callee-saves and spills/locals separately.
  if (getSVEStackSize(MF))
    return false;

  return true;
}
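
// When the bump is combined, a single pre-decrement store allocates the
// callee-save area and the locals together (illustrative, 16 bytes of
// callee-saves plus 16 bytes of locals):
//
//   stp x29, x30, [sp, #-32]!   // combined: callee-saves + local area
//
// versus the separate form:
//
//   stp x29, x30, [sp, #-16]!   // callee-save area only
//   sub sp, sp, #16             // then the local area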

bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
    MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
  if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
    return false;

  if (MBB.empty())
    return true;

  // Disable combined SP bump if the last instruction is an MTE tag store. It
  // is almost always better to merge SP adjustment into those instructions.
  MachineBasicBlock::iterator LastI = MBB.getFirstTerminator();
  MachineBasicBlock::iterator Begin = MBB.begin();
  while (LastI != Begin) {
    --LastI;
    if (LastI->isTransient())
      continue;
    if (!LastI->getFlag(MachineInstr::FrameDestroy))
      break;
  }
  switch (LastI->getOpcode()) {
  case AArch64::STGloop:
  case AArch64::STZGloop:
  case AArch64::STGi:
  case AArch64::STZGi:
  case AArch64::ST2Gi:
  case AArch64::STZ2Gi:
    return false;
  default:
    return true;
  }
  llvm_unreachable("unreachable");
}

// Given a load or a store instruction, generate an appropriate unwinding SEH
// code on Windows.
static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
                                             const TargetInstrInfo &TII,
                                             MachineInstr::MIFlag Flag) {
  unsigned Opc = MBBI->getOpcode();
  MachineBasicBlock *MBB = MBBI->getParent();
  MachineFunction &MF = *MBB->getParent();
  DebugLoc DL = MBBI->getDebugLoc();
  unsigned ImmIdx = MBBI->getNumOperands() - 1;
  int Imm = MBBI->getOperand(ImmIdx).getImm();
  MachineInstrBuilder MIB;
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();

  switch (Opc) {
  default:
    llvm_unreachable("No SEH Opcode for this instruction");
  case AArch64::LDPDpost:
    Imm = -Imm;
    [[fallthrough]];
  case AArch64::STPDpre: {
    unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
    unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
              .addImm(Reg0)
              .addImm(Reg1)
              .addImm(Imm * 8)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::LDPXpost:
    Imm = -Imm;
    [[fallthrough]];
  case AArch64::STPXpre: {
    Register Reg0 = MBBI->getOperand(1).getReg();
    Register Reg1 = MBBI->getOperand(2).getReg();
    if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
      MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
                .addImm(Imm * 8)
                .setMIFlag(Flag);
    else
      MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
                .addImm(RegInfo->getSEHRegNum(Reg0))
                .addImm(RegInfo->getSEHRegNum(Reg1))
                .addImm(Imm * 8)
                .setMIFlag(Flag);
    break;
  }
  case AArch64::LDRDpost:
    Imm = -Imm;
    [[fallthrough]];
  case AArch64::STRDpre: {
    unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
              .addImm(Reg)
              .addImm(Imm)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::LDRXpost:
    Imm = -Imm;
    [[fallthrough]];
  case AArch64::STRXpre: {
    unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
              .addImm(Reg)
              .addImm(Imm)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::STPDi:
  case AArch64::LDPDi: {
    unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
    unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
              .addImm(Reg0)
              .addImm(Reg1)
              .addImm(Imm * 8)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::STPXi:
  case AArch64::LDPXi: {
    Register Reg0 = MBBI->getOperand(0).getReg();
    Register Reg1 = MBBI->getOperand(1).getReg();
    if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
      MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
                .addImm(Imm * 8)
                .setMIFlag(Flag);
    else
      MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
                .addImm(RegInfo->getSEHRegNum(Reg0))
                .addImm(RegInfo->getSEHRegNum(Reg1))
                .addImm(Imm * 8)
                .setMIFlag(Flag);
    break;
  }
  case AArch64::STRXui:
  case AArch64::LDRXui: {
    int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
              .addImm(Reg)
              .addImm(Imm * 8)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::STRDui:
  case AArch64::LDRDui: {
    unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
              .addImm(Reg)
              .addImm(Imm * 8)
              .setMIFlag(Flag);
    break;
  }
  }
  auto I = MBB->insertAfter(MBBI, MIB);
  return I;
}

// Fix up the SEH opcode associated with the save/restore instruction.
static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI,
                           unsigned LocalStackSize) {
  MachineOperand *ImmOpnd = nullptr;
  unsigned ImmIdx = MBBI->getNumOperands() - 1;
  switch (MBBI->getOpcode()) {
  default:
    llvm_unreachable("Fix the offset in the SEH instruction");
  case AArch64::SEH_SaveFPLR:
  case AArch64::SEH_SaveRegP:
  case AArch64::SEH_SaveReg:
  case AArch64::SEH_SaveFRegP:
  case AArch64::SEH_SaveFReg:
    ImmOpnd = &MBBI->getOperand(ImmIdx);
    break;
  }
  if (ImmOpnd)
    ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
}
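
// For instance (illustrative), if a callee-save store is re-addressed from
// [sp, #16] to [sp, #80] to fold in a 64-byte local area, the matching
// unwind code must move with it: .seh_save_regp x19, 16 becomes
// .seh_save_regp x19, 80.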

// Convert callee-save register save/restore instruction to do stack pointer
// decrement/increment to allocate/deallocate the callee-save stack area by
// converting store/load to use pre/post increment version.
static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
    const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
    bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
    MachineInstr::MIFlag FrameFlag = MachineInstr::FrameSetup,
    int CFAOffset = 0) {
  unsigned NewOpc;
  switch (MBBI->getOpcode()) {
  default:
    llvm_unreachable("Unexpected callee-save save/restore opcode!");
  case AArch64::STPXi:
    NewOpc = AArch64::STPXpre;
    break;
  case AArch64::STPDi:
    NewOpc = AArch64::STPDpre;
    break;
  case AArch64::STPQi:
    NewOpc = AArch64::STPQpre;
    break;
  case AArch64::STRXui:
    NewOpc = AArch64::STRXpre;
    break;
  case AArch64::STRDui:
    NewOpc = AArch64::STRDpre;
    break;
  case AArch64::STRQui:
    NewOpc = AArch64::STRQpre;
    break;
  case AArch64::LDPXi:
    NewOpc = AArch64::LDPXpost;
    break;
  case AArch64::LDPDi:
    NewOpc = AArch64::LDPDpost;
    break;
  case AArch64::LDPQi:
    NewOpc = AArch64::LDPQpost;
    break;
  case AArch64::LDRXui:
    NewOpc = AArch64::LDRXpost;
    break;
  case AArch64::LDRDui:
    NewOpc = AArch64::LDRDpost;
    break;
  case AArch64::LDRQui:
    NewOpc = AArch64::LDRQpost;
    break;
  }
  // Get rid of the SEH code associated with the old instruction.
  if (NeedsWinCFI) {
    auto SEH = std::next(MBBI);
    if (AArch64InstrInfo::isSEHInstruction(*SEH))
      SEH->eraseFromParent();
  }

  TypeSize Scale = TypeSize::Fixed(1);
  unsigned Width;
  int64_t MinOffset, MaxOffset;
  bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
      NewOpc, Scale, Width, MinOffset, MaxOffset);
  (void)Success;
  assert(Success && "unknown load/store opcode");

  // If the first store isn't right where we want SP then we can't fold the
  // update in so create a normal arithmetic instruction instead.
  MachineFunction &MF = *MBB.getParent();
  if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
      CSStackSizeInc < MinOffset || CSStackSizeInc > MaxOffset) {
    emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                    StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
                    false, false, nullptr, EmitCFI,
                    StackOffset::getFixed(CFAOffset));

    return std::prev(MBBI);
  }

  MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
  MIB.addReg(AArch64::SP, RegState::Define);

  // Copy all operands other than the immediate offset.
  unsigned OpndIdx = 0;
  for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
       ++OpndIdx)
    MIB.add(MBBI->getOperand(OpndIdx));

  assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
         "Unexpected immediate offset in first/last callee-save save/restore "
         "instruction!");
  assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
         "Unexpected base register in callee-save save/restore instruction!");
  assert(CSStackSizeInc % Scale == 0);
  MIB.addImm(CSStackSizeInc / (int)Scale);

  MIB.setMIFlags(MBBI->getFlags());
  MIB.setMemRefs(MBBI->memoperands());

  // Generate a new SEH code that corresponds to the new instruction.
  if (NeedsWinCFI) {
    *HasWinCFI = true;
    InsertSEH(*MIB, *TII, FrameFlag);
  }

  if (EmitCFI) {
    unsigned CFIIndex = MF.addFrameInst(
        MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
    BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(FrameFlag);
  }

  return std::prev(MBB.erase(MBBI));
}
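
// For example, with a 32-byte callee-save area the first save is rewritten
// so that the allocation rides on the store itself (illustrative):
//
//   stp x29, x30, [sp, #0]   ->   stp x29, x30, [sp, #-32]!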

// Fixup callee-save register save/restore instructions to take into account
// combined SP bump by adding the local stack size to the stack offsets.
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
                                              uint64_t LocalStackSize,
                                              bool NeedsWinCFI,
                                              bool *HasWinCFI) {
  if (AArch64InstrInfo::isSEHInstruction(MI))
    return;

  unsigned Opc = MI.getOpcode();
  unsigned Scale;
  switch (Opc) {
  case AArch64::STPXi:
  case AArch64::STRXui:
  case AArch64::STPDi:
  case AArch64::STRDui:
  case AArch64::LDPXi:
  case AArch64::LDRXui:
  case AArch64::LDPDi:
  case AArch64::LDRDui:
    Scale = 8;
    break;
  case AArch64::STPQi:
  case AArch64::STRQui:
  case AArch64::LDPQi:
  case AArch64::LDRQui:
    Scale = 16;
    break;
  default:
    llvm_unreachable("Unexpected callee-save save/restore opcode!");
  }

  unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
  assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
         "Unexpected base register in callee-save save/restore instruction!");
  // Last operand is immediate offset that needs fixing.
  MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
  // All generated opcodes have scaled offsets.
  assert(LocalStackSize % Scale == 0);
  OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);

  if (NeedsWinCFI) {
    *HasWinCFI = true;
    auto MBBI = std::next(MachineBasicBlock::iterator(MI));
    assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
    assert(AArch64InstrInfo::isSEHInstruction(*MBBI) &&
           "Expecting a SEH instruction");
    fixupSEHOpcode(MBBI, LocalStackSize);
  }
}
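
// Continuing the combined-bump example (illustrative): with LocalStackSize
// of 32, a later save `stp x19, x20, [sp, #16]` (scaled immediate 2) becomes
// `stp x19, x20, [sp, #48]` (immediate 2 + 32/8 = 6).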

static bool isTargetWindows(const MachineFunction &MF) {
  return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
}

// Convenience function to determine whether I is an SVE callee save.
static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
  switch (I->getOpcode()) {
  default:
    return false;
  case AArch64::STR_ZXI:
  case AArch64::STR_PXI:
  case AArch64::LDR_ZXI:
  case AArch64::LDR_PXI:
    return I->getFlag(MachineInstr::FrameSetup) ||
           I->getFlag(MachineInstr::FrameDestroy);
  }
}

static bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF) {
  if (!(llvm::any_of(
            MF.getFrameInfo().getCalleeSavedInfo(),
            [](const auto &Info) { return Info.getReg() == AArch64::LR; }) &&
        MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack)))
    return false;

  if (!MF.getSubtarget<AArch64Subtarget>().isXRegisterReserved(18))
    report_fatal_error("Must reserve x18 to use shadow call stack");

  return true;
}

static void emitShadowCallStackPrologue(const TargetInstrInfo &TII,
                                        MachineFunction &MF,
                                        MachineBasicBlock &MBB,
                                        MachineBasicBlock::iterator MBBI,
                                        const DebugLoc &DL, bool NeedsWinCFI,
                                        bool NeedsUnwindInfo) {
  // Shadow call stack prolog: str x30, [x18], #8
  BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
      .addReg(AArch64::X18, RegState::Define)
      .addReg(AArch64::LR)
      .addReg(AArch64::X18)
      .addImm(8)
      .setMIFlag(MachineInstr::FrameSetup);

  // This instruction also makes x18 live-in to the entry block.
  MBB.addLiveIn(AArch64::X18);

  if (NeedsWinCFI)
    BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
        .setMIFlag(MachineInstr::FrameSetup);

  if (NeedsUnwindInfo) {
    // Emit a CFI instruction that causes 8 to be subtracted from the value of
    // x18 when unwinding past this frame.
    static const char CFIInst[] = {
        dwarf::DW_CFA_val_expression,
        18, // register
        2,  // length
        static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
        static_cast<char>(-8) & 0x7f, // addend (sleb128)
    };
    unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
        nullptr, StringRef(CFIInst, sizeof(CFIInst))));
    BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlag(MachineInstr::FrameSetup);
  }
}

static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII,
                                        MachineFunction &MF,
                                        MachineBasicBlock &MBB,
                                        MachineBasicBlock::iterator MBBI,
                                        const DebugLoc &DL) {
  // Shadow call stack epilog: ldr x30, [x18, #-8]!
  BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
      .addReg(AArch64::X18, RegState::Define)
      .addReg(AArch64::LR, RegState::Define)
      .addReg(AArch64::X18)
      .addImm(-8)
      .setMIFlag(MachineInstr::FrameDestroy);

  if (MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF)) {
    unsigned CFIIndex =
        MF.addFrameInst(MCCFIInstruction::createRestore(nullptr, 18));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameDestroy);
  }
}
1366
1368 MachineBasicBlock &MBB) const {
1370 const MachineFrameInfo &MFI = MF.getFrameInfo();
1371 const Function &F = MF.getFunction();
1372 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1373 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1374 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1375 MachineModuleInfo &MMI = MF.getMMI();
1377 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1378 bool HasFP = hasFP(MF);
1379 bool NeedsWinCFI = needsWinCFI(MF);
1380 bool HasWinCFI = false;
1381 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1382
1383 bool IsFunclet = MBB.isEHFuncletEntry();
1384
1385 // At this point, we're going to decide whether or not the function uses a
1386 // redzone. In most cases, the function doesn't have a redzone so let's
1387 // assume that's false and set it to true in the case that there's a redzone.
1388 AFI->setHasRedZone(false);
1389
1390 // Debug location must be unknown since the first debug location is used
1391 // to determine the end of the prologue.
1392 DebugLoc DL;
1393
1394 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1396 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1397 MFnI.needsDwarfUnwindInfo(MF));
1398
1399 if (MFnI.shouldSignReturnAddress(MF)) {
1400 if (MFnI.shouldSignWithBKey()) {
1401 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITBKEY))
1403 }
1404
1405 // No SEH opcode for this one; it doesn't materialize into an
1406 // instruction on Windows.
1407 BuildMI(MBB, MBBI, DL,
1408 TII->get(MFnI.shouldSignWithBKey() ? AArch64::PACIBSP
1409 : AArch64::PACIASP))
1411
1412 if (EmitCFI) {
1413 unsigned CFIIndex =
1415 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1416 .addCFIIndex(CFIIndex)
1418 } else if (NeedsWinCFI) {
1419 HasWinCFI = true;
1420 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PACSignLR))
1422 }
1423 }
1424 if (EmitCFI && MFnI.isMTETagged()) {
1425 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1427 }
1428
1429 // We signal the presence of a Swift extended frame to external tools by
1430 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1431 // ORR is sufficient, it is assumed a Swift kernel would initialize the TBI
1432 // bits so that is still true.
1433 if (HasFP && AFI->hasSwiftAsyncContext()) {
1436 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1437 // The special symbol below is absolute and has a *value* that can be
1438 // combined with the frame pointer to signal an extended frame.
1439 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1440 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1442 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1443 .addUse(AArch64::FP)
1444 .addUse(AArch64::X16)
1445 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1446 break;
1447 }
1448 [[fallthrough]];
1449
1451 // ORR x29, x29, #0x1000_0000_0000_0000
1452 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1453 .addUse(AArch64::FP)
1454 .addImm(0x1100)
1456 break;
1457
1459 break;
1460 }
1461 }
1462
1463 // All calls are tail calls in GHC calling conv, and functions have no
1464 // prologue/epilogue.
1466 return;
1467
1468 // Set tagged base pointer to the requested stack slot.
1469 // Ideally it should match SP value after prologue.
1470 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1471 if (TBPI)
1473 else
1475
1476 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1477
1478 // getStackSize() includes all the locals in its size calculation. We don't
1479 // include these locals when computing the stack size of a funclet, as they
1480 // are allocated in the parent's stack frame and accessed via the frame
1481 // pointer from the funclet. We only save the callee saved registers in the
1482 // funclet, which are really the callee saved registers of the parent
1483 // function, including the funclet.
1484 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1485 : MFI.getStackSize();
1486 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1487 assert(!HasFP && "unexpected function without stack frame but with FP");
1488 assert(!SVEStackSize &&
1489 "unexpected function without stack frame but with SVE objects");
1490 // All of the stack allocation is for locals.
1491 AFI->setLocalStackSize(NumBytes);
1492 if (!NumBytes)
1493 return;
1494 // REDZONE: If the stack size is less than 128 bytes, we don't need
1495 // to actually allocate.
1496 if (canUseRedZone(MF)) {
1497 AFI->setHasRedZone(true);
1498 ++NumRedZoneFunctions;
1499 } else {
1500 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1501 StackOffset::getFixed(-NumBytes), TII,
1502 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1503 if (EmitCFI) {
1504 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1505 MCSymbol *FrameLabel = MMI.getContext().createTempSymbol();
1506 // Encode the stack size of the leaf function.
1507 unsigned CFIIndex = MF.addFrameInst(
1508 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1509 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1510 .addCFIIndex(CFIIndex)
1512 }
1513 }
1514
1515 if (NeedsWinCFI) {
1516 HasWinCFI = true;
1517 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1519 }
1520
1521 return;
1522 }
1523
1524 bool IsWin64 =
1526 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1527
1528 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1529 // All of the remaining stack allocations are for locals.
1530 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1531 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1532 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1533 if (CombineSPBump) {
1534 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1535 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1536 StackOffset::getFixed(-NumBytes), TII,
1537 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1538 EmitCFI);
1539 NumBytes = 0;
1540 } else if (HomPrologEpilog) {
1541 // Stack has been already adjusted.
1542 NumBytes -= PrologueSaveSize;
1543 } else if (PrologueSaveSize != 0) {
1545 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1546 EmitCFI);
1547 NumBytes -= PrologueSaveSize;
1548 }
1549 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1550
1551 // Move past the saves of the callee-saved registers, fixing up the offsets
1552 // and pre-inc if we decided to combine the callee-save and local stack
1553 // pointer bump above.
1555 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1557 if (CombineSPBump)
1559 NeedsWinCFI, &HasWinCFI);
1560 ++MBBI;
1561 }
1562
1563 // For funclets the FP belongs to the containing function.
1564 if (!IsFunclet && HasFP) {
1565 // Only set up FP if we actually need to.
1566 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1567
1568 if (CombineSPBump)
1569 FPOffset += AFI->getLocalStackSize();
1570
1571 if (AFI->hasSwiftAsyncContext()) {
1572 // Before we update the live FP we have to ensure there's a valid (or
1573 // null) asynchronous context in its slot just before FP in the frame
1574 // record, so store it now.
1575 const auto &Attrs = MF.getFunction().getAttributes();
1576 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1577 if (HaveInitialContext)
1578 MBB.addLiveIn(AArch64::X22);
1579 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1580 .addUse(HaveInitialContext ? AArch64::X22 : AArch64::XZR)
1581 .addUse(AArch64::SP)
1582 .addImm(FPOffset - 8)
1584 }
1585
1586 if (HomPrologEpilog) {
1587 auto Prolog = MBBI;
1588 --Prolog;
1589 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1590 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1591 } else {
1592 // Issue sub fp, sp, FPOffset or
1593 // mov fp,sp when FPOffset is zero.
1594 // Note: All stores of callee-saved registers are marked as "FrameSetup".
1595 // This code marks the instruction(s) that set the FP also.
1596 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
1597 StackOffset::getFixed(FPOffset), TII,
1598 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1599 if (NeedsWinCFI && HasWinCFI) {
1600 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1602 // After setting up the FP, the rest of the prolog doesn't need to be
1603 // included in the SEH unwind info.
1604 NeedsWinCFI = false;
1605 }
1606 }
1607 if (EmitCFI) {
1608 // Define the current CFA rule to use the provided FP.
1609 const int OffsetToFirstCalleeSaveFromFP =
1612 Register FramePtr = RegInfo->getFrameRegister(MF);
1613 unsigned Reg = RegInfo->getDwarfRegNum(FramePtr, true);
1614 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1615 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1616 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1617 .addCFIIndex(CFIIndex)
1619 }
1620 }
1621
1622 // Now emit the moves for whatever callee saved regs we have (including FP,
1623 // LR if those are saved). Frame instructions for SVE register are emitted
1624 // later, after the instruction which actually save SVE regs.
1625 if (EmitCFI)
1626 emitCalleeSavedGPRLocations(MBB, MBBI);
1627
1628 // Alignment is required for the parent frame, not the funclet
1629 const bool NeedsRealignment =
1630 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
1631 int64_t RealignmentPadding =
1632 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
1633 ? MFI.getMaxAlign().value() - 16
1634 : 0;
1635
1636 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
1637 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
1638 if (NeedsWinCFI) {
1639 HasWinCFI = true;
1640 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
1641 // exceed this amount. We need to move at most 2^24 - 1 into x15.
1642 // This is at most two instructions, MOVZ follwed by MOVK.
1643 // TODO: Fix to use multiple stack alloc unwind codes for stacks
1644 // exceeding 256MB in size.
1645 if (NumBytes >= (1 << 28))
1646 report_fatal_error("Stack size cannot exceed 256MB for stack "
1647 "unwinding purposes");
1648
1649 uint32_t LowNumWords = NumWords & 0xFFFF;
1650 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
1651 .addImm(LowNumWords)
1654 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1656 if ((NumWords & 0xFFFF0000) != 0) {
1657 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
1658 .addReg(AArch64::X15)
1659 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
1662 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1664 }
1665 } else {
1666 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
1667 .addImm(NumWords)
1669 }
1670
1671 const char* ChkStk = Subtarget.getChkStkName();
1672 switch (MF.getTarget().getCodeModel()) {
1673 case CodeModel::Tiny:
1674 case CodeModel::Small:
1675 case CodeModel::Medium:
1676 case CodeModel::Kernel:
1677 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
1678 .addExternalSymbol(ChkStk)
1679 .addReg(AArch64::X15, RegState::Implicit)
1684 if (NeedsWinCFI) {
1685 HasWinCFI = true;
1686 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1688 }
1689 break;
1690 case CodeModel::Large:
1691 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
1692 .addReg(AArch64::X16, RegState::Define)
1693 .addExternalSymbol(ChkStk)
1694 .addExternalSymbol(ChkStk)
1696 if (NeedsWinCFI) {
1697 HasWinCFI = true;
1698 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1700 }
1701
1702 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
1703 .addReg(AArch64::X16, RegState::Kill)
1709 if (NeedsWinCFI) {
1710 HasWinCFI = true;
1711 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1713 }
1714 break;
1715 }
1716
1717 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
1718 .addReg(AArch64::SP, RegState::Kill)
1719 .addReg(AArch64::X15, RegState::Kill)
1722 if (NeedsWinCFI) {
1723 HasWinCFI = true;
1724 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
1725 .addImm(NumBytes)
1727 }
1728 NumBytes = 0;
1729
1730 if (RealignmentPadding > 0) {
1731 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
1732 .addReg(AArch64::SP)
1733 .addImm(RealignmentPadding)
1734 .addImm(0);
1735
1736 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
1737 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
1738 .addReg(AArch64::X15, RegState::Kill)
1740 AFI->setStackRealigned(true);
1741
1742 // No need for SEH instructions here; if we're realigning the stack,
1743 // we've set a frame pointer and already finished the SEH prologue.
1744 assert(!NeedsWinCFI);
1745 }
1746 }
1747
1748 StackOffset AllocateBefore = SVEStackSize, AllocateAfter = {};
1749 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
1750
1751 // Process the SVE callee-saves to determine what space needs to be
1752 // allocated.
1753 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
1754 // Find callee save instructions in frame.
1755 CalleeSavesBegin = MBBI;
1756 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
1757 while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
1758 ++MBBI;
1759 CalleeSavesEnd = MBBI;
1760
1761 AllocateBefore = StackOffset::getScalable(CalleeSavedSize);
1762 AllocateAfter = SVEStackSize - AllocateBefore;
1763 }
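// For example (illustrative): with a total SVE area of 48 scalable bytes,
// 32 of which are callee-save slots, AllocateBefore is 32 scalable bytes
// (covering the saves) and AllocateAfter is the remaining 16 for locals.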
1764
1765 // Allocate space for the callee saves (if any).
1766 emitFrameOffset(
1767 MBB, CalleeSavesBegin, DL, AArch64::SP, AArch64::SP, -AllocateBefore, TII,
1768 MachineInstr::FrameSetup, false, false, nullptr,
1769 EmitCFI && !HasFP && AllocateBefore,
1770 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes));
1771
1772 if (EmitCFI)
1773 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
1774
1775 // Finally allocate remaining SVE stack space.
1776 emitFrameOffset(MBB, CalleeSavesEnd, DL, AArch64::SP, AArch64::SP,
1777 -AllocateAfter, TII, MachineInstr::FrameSetup, false, false,
1778 nullptr, EmitCFI && !HasFP && AllocateAfter,
1779 AllocateBefore + StackOffset::getFixed(
1780 (int64_t)MFI.getStackSize() - NumBytes));
1781
1782 // Allocate space for the rest of the frame.
1783 if (NumBytes) {
1784 unsigned scratchSPReg = AArch64::SP;
1785
1786 if (NeedsRealignment) {
1787 scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
1788 assert(scratchSPReg != AArch64::NoRegister);
1789 }
1790
1791 // If we're a leaf function, try using the red zone.
1792 if (!canUseRedZone(MF)) {
1793 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
1794 // the correct value here, as NumBytes also includes padding bytes,
1795 // which shouldn't be counted here.
1796 emitFrameOffset(
1797 MBB, MBBI, DL, scratchSPReg, AArch64::SP,
1798 StackOffset::getFixed(-NumBytes), TII, MachineInstr::FrameSetup,
1799 false, NeedsWinCFI, &HasWinCFI, EmitCFI && !HasFP,
1800 SVEStackSize +
1801 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes));
1802 }
1803 if (NeedsRealignment) {
1804 assert(MFI.getMaxAlign() > Align(1));
1805 assert(scratchSPReg != AArch64::SP);
1806
1807 // SUB X9, SP, NumBytes
1808 // -- X9 is temporary register, so shouldn't contain any live data here,
1809 // -- free to use. This is already produced by emitFrameOffset above.
1810 // AND SP, X9, 0b11111...0000
1811 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
1812
1813 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
1814 .addReg(scratchSPReg, RegState::Kill)
1815 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
1816 AFI->setStackRealigned(true);
1817
1818 // No need for SEH instructions here; if we're realigning the stack,
1819 // we've set a frame pointer and already finished the SEH prologue.
1820 assert(!NeedsWinCFI);
1821 }
1822 }
1823
1824 // If we need a base pointer, set it up here. It's whatever the value of the
1825 // stack pointer is at this point. Any variable size objects will be allocated
1826 // after this, so we can still use the base pointer to reference locals.
1827 //
1828 // FIXME: Clarify FrameSetup flags here.
1829 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
1830 // needed.
1831 // For funclets the BP belongs to the containing function.
1832 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
1833 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
1834 false);
1835 if (NeedsWinCFI) {
1836 HasWinCFI = true;
1837 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1838 .setMIFlag(MachineInstr::FrameSetup);
1839 }
1840 }
1841
1842 // The very last FrameSetup instruction indicates the end of prologue. Emit a
1843 // SEH opcode indicating the prologue end.
1844 if (NeedsWinCFI && HasWinCFI) {
1845 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1846 .setMIFlag(MachineInstr::FrameSetup);
1847 }
1848
1849 // SEH funclets are passed the frame pointer in X1. If the parent
1850 // function uses the base register, then the base register is used
1851 // directly, and is not retrieved from X1.
1852 if (IsFunclet && F.hasPersonalityFn()) {
1853 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
1854 if (isAsynchronousEHPersonality(Per)) {
1855 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
1856 .addReg(AArch64::X1)
1857 .setMIFlag(MachineInstr::FrameSetup);
1858 MBB.addLiveIn(AArch64::X1);
1859 }
1860 }
1861}
1862
1863static void InsertReturnAddressAuth(MachineFunction &MF, MachineBasicBlock &MBB,
1864 bool NeedsWinCFI, bool *HasWinCFI) {
1865 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
1866 if (!MFI.shouldSignReturnAddress(MF))
1867 return;
1868 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1869 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1870
1871 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1872 DebugLoc DL;
1873 if (MBBI != MBB.end())
1874 DL = MBBI->getDebugLoc();
1875
1876 // The AUTIASP instruction assembles to a hint instruction before v8.3a, so
1877 // it can safely be used for any v8-A architecture.
1878 // From v8.3a onwards there are optimised authenticate LR and return
1879 // instructions, namely RETA{A,B}, that can be used instead. In this case the
1880 // DW_CFA_AARCH64_negate_ra_state can't be emitted.
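// Illustratively, the two forms are:
//   autiasp ; ret   // HINT-space authenticate, works on any v8-A core
//   retaa           // combined authenticate-and-return, v8.3a and later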
1881 if (Subtarget.hasPAuth() &&
1882 !MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack) &&
1883 MBBI != MBB.end() && MBBI->getOpcode() == AArch64::RET_ReallyLR &&
1884 !NeedsWinCFI) {
1885 BuildMI(MBB, MBBI, DL,
1886 TII->get(MFI.shouldSignWithBKey() ? AArch64::RETAB : AArch64::RETAA))
1887 .copyImplicitOps(*MBBI);
1888 MBB.erase(MBBI);
1889 } else {
1890 BuildMI(
1891 MBB, MBBI, DL,
1892 TII->get(MFI.shouldSignWithBKey() ? AArch64::AUTIBSP : AArch64::AUTIASP))
1893 .setMIFlag(MachineInstr::FrameDestroy);
1894
1895 unsigned CFIIndex =
1896 MF.addFrameInst(MCCFIInstruction::createNegateRAState(nullptr));
1897 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1898 .addCFIIndex(CFIIndex)
1899 .setMIFlags(MachineInstr::FrameDestroy);
1900 if (NeedsWinCFI) {
1901 *HasWinCFI = true;
1902 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PACSignLR))
1903 .setMIFlag(MachineInstr::FrameDestroy);
1904 }
1905 }
1906}
1907
1908static bool isFuncletReturnInstr(const MachineInstr &MI) {
1909 switch (MI.getOpcode()) {
1910 default:
1911 return false;
1912 case AArch64::CATCHRET:
1913 case AArch64::CLEANUPRET:
1914 return true;
1915 }
1916}
1917
1918void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
1919 MachineBasicBlock &MBB) const {
1920 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
1921 MachineFrameInfo &MFI = MF.getFrameInfo();
1922 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1923 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1924 DebugLoc DL;
1925 bool NeedsWinCFI = needsWinCFI(MF);
1926 bool EmitCFI =
1927 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1928 bool HasWinCFI = false;
1929 bool IsFunclet = false;
1930 auto WinCFI = make_scope_exit([&]() { assert(HasWinCFI == MF.hasWinCFI()); });
1931
1932 if (MBB.end() != MBBI) {
1933 DL = MBBI->getDebugLoc();
1934 IsFunclet = isFuncletReturnInstr(*MBBI);
1935 }
1936
1937 auto FinishingTouches = make_scope_exit([&]() {
1938 InsertReturnAddressAuth(MF, MBB, NeedsWinCFI, &HasWinCFI);
1939 if (needsShadowCallStackPrologueEpilogue(MF))
1940 emitShadowCallStackEpilogue(MF, MBB, MBB.getFirstTerminator(), DL);
1941 if (EmitCFI)
1942 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
1943 if (HasWinCFI)
1944 BuildMI(MBB, MBB.getFirstTerminator(), DebugLoc(),
1945 TII->get(AArch64::SEH_EpilogEnd))
1946 .setMIFlag(MachineInstr::FrameDestroy);
1947 });
1948
1949 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1950 : MFI.getStackSize();
1951 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1952
1953 // All calls are tail calls in GHC calling conv, and functions have no
1954 // prologue/epilogue.
1955 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1956 return;
1957
1958 // How much of the stack used by incoming arguments this function is expected
1959 // to restore in this particular epilogue.
1960 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
1961 bool IsWin64 =
1962 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
1963 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1964
1965 int64_t AfterCSRPopSize = ArgumentStackToRestore;
1966 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1967 // We cannot rely on the local stack size set in emitPrologue if the function
1968 // has funclets, as funclets have different local stack size requirements, and
1969 // the current value set in emitPrologue may be that of the containing
1970 // function.
1971 if (MF.hasEHFunclets())
1972 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1973 if (homogeneousPrologEpilog(MF, &MBB)) {
1974 assert(!NeedsWinCFI);
1975 auto LastPopI = MBB.getFirstTerminator();
1976 if (LastPopI != MBB.begin()) {
1977 auto HomogeneousEpilog = std::prev(LastPopI);
1978 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
1979 LastPopI = HomogeneousEpilog;
1980 }
1981
1982 // Adjust local stack
1983 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
1984 StackOffset::getFixed(AFI->getLocalStackSize()), TII,
1985 MachineInstr::FrameDestroy, false, NeedsWinCFI);
1986
1987 // SP has already been adjusted while restoring callee save regs.
1988 // We've already bailed out of the case that adjusts SP for arguments.
1989 assert(AfterCSRPopSize == 0);
1990 return;
1991 }
1992 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
1993 // Assume we can't combine the last pop with the sp restore.
1994
1995 bool CombineAfterCSRBump = false;
1996 if (!CombineSPBump && PrologueSaveSize != 0) {
1997 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
1998 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
1999 AArch64InstrInfo::isSEHInstruction(*Pop))
2000 Pop = std::prev(Pop);
2001 // Converting the last ldp to a post-index ldp is valid only if the last
2002 // ldp's offset is 0.
2003 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2004 // If the offset is 0 and the AfterCSR pop is not actually trying to
2005 // allocate more stack for arguments (in space that an untimely interrupt
2006 // may clobber), convert it to a post-index ldp.
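// Illustrative: a final "ldp x29, x30, [sp]" (offset 0) becomes
// "ldp x29, x30, [sp], #PrologueSaveSize", folding the SP restore into
// the last CSR reload.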
2007 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2008 convertCalleeSaveRestoreToSPPrePostIncDec(
2009 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2010 MachineInstr::FrameDestroy, PrologueSaveSize);
2011 } else {
2012 // If not, make sure to emit an add after the last ldp.
2013 // We're doing this by transferring the size to be restored from the
2014 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2015 // pops.
2016 AfterCSRPopSize += PrologueSaveSize;
2017 CombineAfterCSRBump = true;
2018 }
2019 }
2020
2021 // Move past the restores of the callee-saved registers.
2022 // If we plan on combining the sp bump of the local stack size and the callee
2023 // save stack size, we might need to adjust the CSR save and restore offsets.
2024 MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2025 MachineBasicBlock::iterator Begin = MBB.begin();
2026 while (LastPopI != Begin) {
2027 --LastPopI;
2028 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2029 IsSVECalleeSave(LastPopI)) {
2030 ++LastPopI;
2031 break;
2032 } else if (CombineSPBump)
2033 fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2034 NeedsWinCFI, &HasWinCFI);
2035 }
2036
2037 if (MF.hasWinCFI()) {
2038 // If the prologue didn't contain any SEH opcodes and didn't set the
2039 // MF.hasWinCFI() flag, assume the epilogue won't either, and skip the
2040 // EpilogStart - to avoid generating CFI for functions that don't need it.
2041 // (And as we didn't generate any prologue at all, it would be asymmetrical
2042 // to the epilogue.) By the end of the function, we assert that
2043 // HasWinCFI is equal to MF.hasWinCFI(), to verify this assumption.
2044 HasWinCFI = true;
2045 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2046 .setMIFlag(MachineInstr::FrameDestroy);
2047 }
2048
2049 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2050 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2051 case SwiftAsyncFramePointerMode::DeploymentBased:
2052 // Avoid the reload as it is GOT relative, and instead fall back to the
2053 // hardcoded value below. This allows a mismatch between the OS and
2054 // application without immediately terminating on the difference.
2055 [[fallthrough]];
2056 case SwiftAsyncFramePointerMode::Always:
2057 // We need to reset FP to its untagged state on return. Bit 60 is
2058 // currently used to show the presence of an extended frame.
2059
2060 // BIC x29, x29, #0x1000_0000_0000_0000
2061 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2062 AArch64::FP)
2063 .addUse(AArch64::FP)
2064 .addImm(0x10fe)
2065 .setMIFlag(MachineInstr::FrameDestroy);
2066 break;
2067
2068 case SwiftAsyncFramePointerMode::Never:
2069 break;
2070 }
2071 }
2072
2073 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2074
2075 // If there is a single SP update, insert it before the ret and we're done.
2076 if (CombineSPBump) {
2077 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2078
2079 // When we are about to restore the CSRs, the CFA register is SP again.
2080 if (EmitCFI && hasFP(MF)) {
2081 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2082 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2083 unsigned CFIIndex =
2084 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2085 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2086 .addCFIIndex(CFIIndex)
2087 .setMIFlags(MachineInstr::FrameDestroy);
2088 }
2089
2090 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2091 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2092 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2093 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2094 return;
2095 }
2096
2097 NumBytes -= PrologueSaveSize;
2098 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2099
2100 // Process the SVE callee-saves to determine what space needs to be
2101 // deallocated.
2102 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2103 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2104 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2105 RestoreBegin = std::prev(RestoreEnd);
2106 while (RestoreBegin != MBB.begin() &&
2107 IsSVECalleeSave(std::prev(RestoreBegin)))
2108 --RestoreBegin;
2109
2110 assert(IsSVECalleeSave(RestoreBegin) &&
2111 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2112
2113 StackOffset CalleeSavedSizeAsOffset =
2114 StackOffset::getScalable(CalleeSavedSize);
2115 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2116 DeallocateAfter = CalleeSavedSizeAsOffset;
2117 }
2118
2119 // Deallocate the SVE area.
2120 if (SVEStackSize) {
2121 // If we have stack realignment or variable sized objects on the stack,
2122 // restore the stack pointer from the frame pointer prior to SVE CSR
2123 // restoration.
2124 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2125 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2126 // Set SP to start of SVE callee-save area from which they can
2127 // be reloaded. The code below will deallocate the stack
2128 // space by moving FP -> SP.
2129 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2130 StackOffset::getScalable(-CalleeSavedSize), TII,
2131 MachineInstr::FrameDestroy);
2132 }
2133 } else {
2134 if (AFI->getSVECalleeSavedStackSize()) {
2135 // Deallocate the non-SVE locals first before we can deallocate (and
2136 // restore callee saves) from the SVE area.
2137 emitFrameOffset(
2138 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2139 StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
2140 false, false, nullptr, EmitCFI && !hasFP(MF),
2141 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2142 NumBytes = 0;
2143 }
2144
2145 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2146 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2147 false, nullptr, EmitCFI && !hasFP(MF),
2148 SVEStackSize +
2149 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2150
2151 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2152 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2153 false, nullptr, EmitCFI && !hasFP(MF),
2154 DeallocateAfter +
2155 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2156 }
2157 if (EmitCFI)
2158 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2159 }
2160
2161 if (!hasFP(MF)) {
2162 bool RedZone = canUseRedZone(MF);
2163 // If this was a redzone leaf function, we don't need to restore the
2164 // stack pointer (but we may need to pop stack args for fastcc).
2165 if (RedZone && AfterCSRPopSize == 0)
2166 return;
2167
2168 // Pop the local variables off the stack. If there are no callee-saved
2169 // registers, it means we are actually positioned at the terminator and can
2170 // combine stack increment for the locals and the stack increment for
2171 // callee-popped arguments into (possibly) a single instruction and be done.
2172 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2173 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2174 if (NoCalleeSaveRestore)
2175 StackRestoreBytes += AfterCSRPopSize;
2176
2177 emitFrameOffset(
2178 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2179 StackOffset::getFixed(StackRestoreBytes), TII,
2180 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2181 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2182
2183 // If we were able to combine the local stack pop with the argument pop,
2184 // then we're done.
2185 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2186 return;
2187 }
2188
2189 NumBytes = 0;
2190 }
2191
2192 // Restore the original stack pointer.
2193 // FIXME: Rather than doing the math here, we should instead just use
2194 // non-post-indexed loads for the restores if we aren't actually going to
2195 // be able to save any instructions.
2196 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2197 emitFrameOffset(
2198 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2199 StackOffset::getFixed(-AFI->getCalleeSaveBaseToFrameRecordOffset()),
2200 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI);
2201 } else if (NumBytes)
2202 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2203 StackOffset::getFixed(NumBytes), TII,
2204 MachineInstr::FrameDestroy, false, NeedsWinCFI);
2205
2206 // When we are about to restore the CSRs, the CFA register is SP again.
2207 if (EmitCFI && hasFP(MF)) {
2208 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2209 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2210 unsigned CFIIndex = MF.addFrameInst(
2211 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2212 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2213 .addCFIIndex(CFIIndex)
2214 .setMIFlags(MachineInstr::FrameDestroy);
2215 }
2216
2217 // This must be placed after the callee-save restore code because that code
2218 // assumes the SP is at the same location as it was after the callee-save save
2219 // code in the prologue.
2220 if (AfterCSRPopSize) {
2221 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2222 "interrupt may have clobbered");
2223
2224 emitFrameOffset(
2225 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2226 StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
2227 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2228 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2229 }
2230}
2231
2232/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2233/// debug info. It's the same as what we use for resolving the code-gen
2234/// references for now. FIXME: This can go wrong when references are
2235/// SP-relative and simple call frames aren't used.
2236StackOffset
2237AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
2238 Register &FrameReg) const {
2239 return resolveFrameIndexReference(
2240 MF, FI, FrameReg,
2241 /*PreferFP=*/
2242 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress),
2243 /*ForSimm=*/false);
2244}
2245
2246StackOffset
2247AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
2248 int FI) const {
2249 return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
2250}
2251
2253 int64_t ObjectOffset) {
2254 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2255 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2256 bool IsWin64 =
2257 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2258 unsigned FixedObject =
2259 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2260 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2261 int64_t FPAdjust =
2262 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2263 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2264}
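// Worked example for getFPOffset (illustrative): with a 64-byte callee-save
// area, the frame record at its base (CalleeSaveBaseToFrameRecordOffset == 0)
// and no Win64 varargs area (FixedObject == 0), a fixed object at
// ObjectOffset -8 resolves to FP+56.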
2265
2266static StackOffset getStackOffset(const MachineFunction &MF,
2267 int64_t ObjectOffset) {
2268 const auto &MFI = MF.getFrameInfo();
2269 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2270}
2271
2272 // TODO: This function currently does not work for scalable vectors.
2273int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
2274 int FI) const {
2275 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2276 MF.getSubtarget().getRegisterInfo());
2277 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2278 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2279 ? getFPOffset(MF, ObjectOffset).getFixed()
2280 : getStackOffset(MF, ObjectOffset).getFixed();
2281}
2282
2283StackOffset AArch64FrameLowering::resolveFrameIndexReference(
2284 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2285 bool ForSimm) const {
2286 const auto &MFI = MF.getFrameInfo();
2287 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2288 bool isFixed = MFI.isFixedObjectIndex(FI);
2289 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2290 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2291 PreferFP, ForSimm);
2292}
2293
2294StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
2295 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2296 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2297 const auto &MFI = MF.getFrameInfo();
2298 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2299 MF.getSubtarget().getRegisterInfo());
2300 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2301 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2302
2303 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2304 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2305 bool isCSR =
2306 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2307
2308 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2309
2310 // Use frame pointer to reference fixed objects. Use it for locals if
2311 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2312 // reliable as a base). Make sure useFPForScavengingIndex() does the
2313 // right thing for the emergency spill slot.
2314 bool UseFP = false;
2315 if (AFI->hasStackFrame() && !isSVE) {
2316 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2317 // there are scalable (SVE) objects in between the FP and the fixed-sized
2318 // objects.
2319 PreferFP &= !SVEStackSize;
2320
2321 // Note: Keeping the following as multiple 'if' statements rather than
2322 // merging to a single expression for readability.
2323 //
2324 // Argument access should always use the FP.
2325 if (isFixed) {
2326 UseFP = hasFP(MF);
2327 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2328 // References to the CSR area must use FP if we're re-aligning the stack
2329 // since the dynamically-sized alignment padding is between the SP/BP and
2330 // the CSR area.
2331 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2332 UseFP = true;
2333 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2334 // If the FPOffset is negative and we're producing a signed immediate, we
2335 // have to keep in mind that the available offset range for negative
2336 // offsets is smaller than for positive ones. If an offset is available
2337 // via the FP and the SP, use whichever is closest.
2338 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2339 PreferFP |= Offset > -FPOffset && !SVEStackSize;
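// Illustrative: with ForSimm, an FPOffset of -280 is outside the signed
// 9-bit unscaled range (-256..255), so the SP-relative form is used even
// though an FP-relative form exists.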
2340
2341 if (MFI.hasVarSizedObjects()) {
2342 // If we have variable sized objects, we can use either FP or BP, as the
2343 // SP offset is unknown. We can use the base pointer if we have one and
2344 // FP is not preferred. If not, we're stuck with using FP.
2345 bool CanUseBP = RegInfo->hasBasePointer(MF);
2346 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2347 UseFP = PreferFP;
2348 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2349 UseFP = true;
2350 // else we can use BP and FP, but the offset from FP won't fit.
2351 // That will make us scavenge registers which we can probably avoid by
2352 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2353 } else if (FPOffset >= 0) {
2354 // Use SP or FP, whichever gives us the best chance of the offset
2355 // being in range for direct access. If the FPOffset is positive,
2356 // that'll always be best, as the SP will be even further away.
2357 UseFP = true;
2358 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2359 // Funclets access the locals contained in the parent's stack frame
2360 // via the frame pointer, so we have to use the FP in the parent
2361 // function.
2362 (void) Subtarget;
2363 assert(
2364 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv()) &&
2365 "Funclets should only be present on Win64");
2366 UseFP = true;
2367 } else {
2368 // We have the choice between FP and (SP or BP).
2369 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2370 UseFP = true;
2371 }
2372 }
2373 }
2374
2375 assert(
2376 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2377 "In the presence of dynamic stack pointer realignment, "
2378 "non-argument/CSR objects cannot be accessed through the frame pointer");
2379
2380 if (isSVE) {
2381 StackOffset FPOffset =
2382 StackOffset::get(-AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
2383 StackOffset SPOffset =
2384 SVEStackSize +
2385 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2386 ObjectOffset);
2387 // Always use the FP for SVE spills if available and beneficial.
2388 if (hasFP(MF) && (SPOffset.getFixed() ||
2389 FPOffset.getScalable() < SPOffset.getScalable() ||
2390 RegInfo->hasStackRealignment(MF))) {
2391 FrameReg = RegInfo->getFrameRegister(MF);
2392 return FPOffset;
2393 }
2394
2395 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2396 : (unsigned)AArch64::SP;
2397 return SPOffset;
2398 }
2399
2400 StackOffset ScalableOffset = {};
2401 if (UseFP && !(isFixed || isCSR))
2402 ScalableOffset = -SVEStackSize;
2403 if (!UseFP && (isFixed || isCSR))
2404 ScalableOffset = SVEStackSize;
2405
2406 if (UseFP) {
2407 FrameReg = RegInfo->getFrameRegister(MF);
2408 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2409 }
2410
2411 // Use the base pointer if we have one.
2412 if (RegInfo->hasBasePointer(MF))
2413 FrameReg = RegInfo->getBaseRegister();
2414 else {
2415 assert(!MFI.hasVarSizedObjects() &&
2416 "Can't use SP when we have var sized objects.");
2417 FrameReg = AArch64::SP;
2418 // If we're using the red zone for this function, the SP won't actually
2419 // be adjusted, so the offsets will be negative. They're also all
2420 // within range of the signed 9-bit immediate instructions.
2421 if (canUseRedZone(MF))
2422 Offset -= AFI->getLocalStackSize();
2423 }
2424
2425 return StackOffset::getFixed(Offset) + ScalableOffset;
2426}
2427
2428static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2429 // Do not set a kill flag on values that are also marked as live-in. This
2430 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2431 // callee saved registers.
2432 // Omitting the kill flags is conservatively correct even if the live-in
2433 // is not used after all.
2434 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2435 return getKillRegState(!IsLiveIn);
2436}
2437
2438static bool produceCompactUnwindFrame(MachineFunction &MF) {
2439 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2440 AttributeList Attrs = MF.getFunction().getAttributes();
2441 return Subtarget.isTargetMachO() &&
2442 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2443 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2444 MF.getFunction().getCallingConv() != CallingConv::SwiftTail;
2445}
2446
2447static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2448 bool NeedsWinCFI, bool IsFirst,
2449 const TargetRegisterInfo *TRI) {
2450 // If we are generating register pairs for a Windows function that requires
2451 // EH support, then pair consecutive registers only. There are no unwind
2452 // opcodes for saves/restores of non-consecutive register pairs.
2453 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2454 // save_lrpair.
2455 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2456
2457 if (Reg2 == AArch64::FP)
2458 return true;
2459 if (!NeedsWinCFI)
2460 return false;
2461 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2462 return false;
2463 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2464 // opcode. If this is the first register pair, it would end up with a
2465 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2466 // if LR is paired with something other than the first register.
2467 // The save_lrpair opcode requires the first register to be an odd one.
2468 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2469 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2470 return false;
2471 return true;
2472}
2473
2474/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2475/// WindowsCFI requires that only consecutive registers can be paired.
2476/// LR and FP need to be allocated together when the frame needs to save
2477/// the frame-record. This means any other register pairing with LR is invalid.
2478static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2479 bool UsesWinAAPCS, bool NeedsWinCFI,
2480 bool NeedsFrameRecord, bool IsFirst,
2481 const TargetRegisterInfo *TRI) {
2482 if (UsesWinAAPCS)
2483 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2484 TRI);
2485
2486 // If we need to store the frame record, don't pair any register
2487 // with LR other than FP.
2488 if (NeedsFrameRecord)
2489 return Reg2 == AArch64::LR;
2490
2491 return false;
2492}
2493
2494namespace {
2495
2496struct RegPairInfo {
2497 unsigned Reg1 = AArch64::NoRegister;
2498 unsigned Reg2 = AArch64::NoRegister;
2499 int FrameIdx;
2500 int Offset;
2501 enum RegType { GPR, FPR64, FPR128, PPR, ZPR } Type;
2502
2503 RegPairInfo() = default;
2504
2505 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2506
2507 unsigned getScale() const {
2508 switch (Type) {
2509 case PPR:
2510 return 2;
2511 case GPR:
2512 case FPR64:
2513 return 8;
2514 case ZPR:
2515 case FPR128:
2516 return 16;
2517 }
2518 llvm_unreachable("Unsupported type");
2519 }
2520
2521 bool isScalable() const { return Type == PPR || Type == ZPR; }
2522};
2523
2524} // end anonymous namespace
2525
2526static void computeCalleeSaveRegisterPairs(
2527 MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
2528 const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
2529 bool NeedsFrameRecord) {
2530
2531 if (CSI.empty())
2532 return;
2533
2534 bool IsWindows = isTargetWindows(MF);
2535 bool NeedsWinCFI = needsWinCFI(MF);
2536 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2537 MachineFrameInfo &MFI = MF.getFrameInfo();
2538 CallingConv::ID CC = MF.getFunction().getCallingConv();
2539 unsigned Count = CSI.size();
2540 (void)CC;
2541 // MachO's compact unwind format relies on all registers being stored in
2542 // pairs.
2543 assert((!produceCompactUnwindFrame(MF) ||
2544 CC == CallingConv::PreserveMost || CC == CallingConv::CXX_FAST_TLS ||
2545 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2546 "Odd number of callee-saved regs to spill!");
2547 int ByteOffset = AFI->getCalleeSavedStackSize();
2548 int StackFillDir = -1;
2549 int RegInc = 1;
2550 unsigned FirstReg = 0;
2551 if (NeedsWinCFI) {
2552 // For WinCFI, fill the stack from the bottom up.
2553 ByteOffset = 0;
2554 StackFillDir = 1;
2555 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2556 // backwards, to pair up registers starting from lower numbered registers.
2557 RegInc = -1;
2558 FirstReg = Count - 1;
2559 }
2560 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2561 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2562
2563 // When iterating backwards, the loop condition relies on unsigned wraparound.
2564 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2565 RegPairInfo RPI;
2566 RPI.Reg1 = CSI[i].getReg();
2567
2568 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
2569 RPI.Type = RegPairInfo::GPR;
2570 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
2571 RPI.Type = RegPairInfo::FPR64;
2572 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
2573 RPI.Type = RegPairInfo::FPR128;
2574 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
2575 RPI.Type = RegPairInfo::ZPR;
2576 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
2577 RPI.Type = RegPairInfo::PPR;
2578 else
2579 llvm_unreachable("Unsupported register class.");
2580
2581 // Add the next reg to the pair if it is in the same register class.
2582 if (unsigned(i + RegInc) < Count) {
2583 Register NextReg = CSI[i + RegInc].getReg();
2584 bool IsFirst = i == FirstReg;
2585 switch (RPI.Type) {
2586 case RegPairInfo::GPR:
2587 if (AArch64::GPR64RegClass.contains(NextReg) &&
2588 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2589 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2590 TRI))
2591 RPI.Reg2 = NextReg;
2592 break;
2593 case RegPairInfo::FPR64:
2594 if (AArch64::FPR64RegClass.contains(NextReg) &&
2595 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2596 IsFirst, TRI))
2597 RPI.Reg2 = NextReg;
2598 break;
2599 case RegPairInfo::FPR128:
2600 if (AArch64::FPR128RegClass.contains(NextReg))
2601 RPI.Reg2 = NextReg;
2602 break;
2603 case RegPairInfo::PPR:
2604 case RegPairInfo::ZPR:
2605 break;
2606 }
2607 }
2608
2609 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2610 // list to come in sorted by frame index so that we can issue the store
2611 // pair instructions directly. Assert if we see anything otherwise.
2612 //
2613 // The order of the registers in the list is controlled by
2614 // getCalleeSavedRegs(), so they will always be in-order, as well.
2615 assert((!RPI.isPaired() ||
2616 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
2617 "Out of order callee saved regs!");
2618
2619 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
2620 RPI.Reg1 == AArch64::LR) &&
2621 "FrameRecord must be allocated together with LR");
2622
2623 // Windows AAPCS has FP and LR reversed.
2624 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
2625 RPI.Reg2 == AArch64::LR) &&
2626 "FrameRecord must be allocated together with LR");
2627
2628 // MachO's compact unwind format relies on all registers being stored in
2629 // adjacent register pairs.
2630 assert((!produceCompactUnwindFrame(MF) ||
2631 CC == CallingConv::PreserveMost || CC == CallingConv::CXX_FAST_TLS ||
2632 CC == CallingConv::Win64 ||
2633 (RPI.isPaired() &&
2634 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
2635 RPI.Reg1 + 1 == RPI.Reg2))) &&
2636 "Callee-save registers not saved as adjacent register pair!");
2637
2638 RPI.FrameIdx = CSI[i].getFrameIdx();
2639 if (NeedsWinCFI &&
2640 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
2641 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
2642
2643 int Scale = RPI.getScale();
2644
2645 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2646 assert(OffsetPre % Scale == 0);
2647
2648 if (RPI.isScalable())
2649 ScalableByteOffset += StackFillDir * Scale;
2650 else
2651 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2652
2653 // Swift's async context is directly before FP, so allocate an extra
2654 // 8 bytes for it.
2655 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2656 RPI.Reg2 == AArch64::FP)
2657 ByteOffset += StackFillDir * 8;
2658
2659 assert(!(RPI.isScalable() && RPI.isPaired()) &&
2660 "Paired spill/fill instructions don't exist for SVE vectors");
2661
2662 // Round up size of non-pair to pair size if we need to pad the
2663 // callee-save area to ensure 16-byte alignment.
2664 if (NeedGapToAlignStack && !NeedsWinCFI &&
2665 !RPI.isScalable() && RPI.Type != RegPairInfo::FPR128 &&
2666 !RPI.isPaired() && ByteOffset % 16 != 0) {
2667 ByteOffset += 8 * StackFillDir;
2668 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
2669 // A stack frame with a gap looks like this, bottom up:
2670 // d9, d8. x21, gap, x20, x19.
2671 // Set extra alignment on the x21 object to create the gap above it.
2672 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
2673 NeedGapToAlignStack = false;
2674 }
2675
2676 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2677 assert(OffsetPost % Scale == 0);
2678 // If filling top down (default), we want the offset after incrementing it.
2679 // If filling bottom up (WinCFI) we need the original offset.
2680 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
2681
2682 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
2683 // Swift context can directly precede FP.
2684 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2685 RPI.Reg2 == AArch64::FP)
2686 Offset += 8;
2687 RPI.Offset = Offset / Scale;
2688
2689 assert(((!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
2690 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
2691 "Offset out of bounds for LDP/STP immediate");
2692
2693 // Save the offset to frame record so that the FP register can point to the
2694 // innermost frame record (spilled FP and LR registers).
2695 if (NeedsFrameRecord && ((!IsWindows && RPI.Reg1 == AArch64::LR &&
2696 RPI.Reg2 == AArch64::FP) ||
2697 (IsWindows && RPI.Reg1 == AArch64::FP &&
2698 RPI.Reg2 == AArch64::LR)))
2699 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
2700
2701 RegPairs.push_back(RPI);
2702 if (RPI.isPaired())
2703 i += RegInc;
2704 }
2705 if (NeedsWinCFI) {
2706 // If we need an alignment gap in the stack, align the topmost stack
2707 // object. A stack frame with a gap looks like this, bottom up:
2708 // x19, d8. d9, gap.
2709 // Set extra alignment on the topmost stack object (the first element in
2710 // CSI, which goes top down), to create the gap above it.
2711 if (AFI->hasCalleeSaveStackFreeSpace())
2712 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
2713 // We iterated bottom up over the registers; flip RegPairs back to top
2714 // down order.
2715 std::reverse(RegPairs.begin(), RegPairs.end());
2716 }
2717}
2718
2718
2719bool AArch64FrameLowering::spillCalleeSavedRegisters(
2720 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
2721 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2722 MachineFunction &MF = *MBB.getParent();
2723 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2724 bool NeedsWinCFI = needsWinCFI(MF);
2725 DebugLoc DL;
2726 SmallVector<RegPairInfo, 8> RegPairs;
2727
2728 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
2729
2730 const MachineRegisterInfo &MRI = MF.getRegInfo();
2731 if (homogeneousPrologEpilog(MF)) {
2732 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
2733 .setMIFlag(MachineInstr::FrameSetup);
2734
2735 for (auto &RPI : RegPairs) {
2736 MIB.addReg(RPI.Reg1);
2737 MIB.addReg(RPI.Reg2);
2738
2739 // Update register live in.
2740 if (!MRI.isReserved(RPI.Reg1))
2741 MBB.addLiveIn(RPI.Reg1);
2742 if (!MRI.isReserved(RPI.Reg2))
2743 MBB.addLiveIn(RPI.Reg2);
2744 }
2745 return true;
2746 }
2747 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
2748 unsigned Reg1 = RPI.Reg1;
2749 unsigned Reg2 = RPI.Reg2;
2750 unsigned StrOpc;
2751
2752 // Issue sequence of spills for cs regs. The first spill may be converted
2753 // to a pre-decrement store later by emitPrologue if the callee-save stack
2754 // area allocation can't be combined with the local stack area allocation.
2755 // For example:
2756 // stp x22, x21, [sp, #0] // addImm(+0)
2757 // stp x20, x19, [sp, #16] // addImm(+2)
2758 // stp fp, lr, [sp, #32] // addImm(+4)
2759 // Rationale: This sequence saves uop updates compared to a sequence of
2760 // pre-increment spills like stp xi,xj,[sp,#-16]!
2761 // Note: Similar rationale and sequence for restores in epilog.
2762 unsigned Size;
2763 Align Alignment;
2764 switch (RPI.Type) {
2765 case RegPairInfo::GPR:
2766 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2767 Size = 8;
2768 Alignment = Align(8);
2769 break;
2770 case RegPairInfo::FPR64:
2771 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2772 Size = 8;
2773 Alignment = Align(8);
2774 break;
2775 case RegPairInfo::FPR128:
2776 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2777 Size = 16;
2778 Alignment = Align(16);
2779 break;
2780 case RegPairInfo::ZPR:
2781 StrOpc = AArch64::STR_ZXI;
2782 Size = 16;
2783 Alignment = Align(16);
2784 break;
2785 case RegPairInfo::PPR:
2786 StrOpc = AArch64::STR_PXI;
2787 Size = 2;
2788 Alignment = Align(2);
2789 break;
2790 }
2791 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2792 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2793 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2794 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2795 dbgs() << ")\n");
2796
2797 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2798 "Windows unwdinding requires a consecutive (FP,LR) pair");
2799 // Windows unwind codes require consecutive registers if registers are
2800 // paired. Make the switch here, so that the code below will save (x,x+1)
2801 // and not (x+1,x).
2802 unsigned FrameIdxReg1 = RPI.FrameIdx;
2803 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2804 if (NeedsWinCFI && RPI.isPaired()) {
2805 std::swap(Reg1, Reg2);
2806 std::swap(FrameIdxReg1, FrameIdxReg2);
2807 }
2808 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2809 if (!MRI.isReserved(Reg1))
2810 MBB.addLiveIn(Reg1);
2811 if (RPI.isPaired()) {
2812 if (!MRI.isReserved(Reg2))
2813 MBB.addLiveIn(Reg2);
2814 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2815 MIB.addMemOperand(MF.getMachineMemOperand(
2816 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2817 MachineMemOperand::MOStore, Size, Alignment));
2818 }
2819 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2820 .addReg(AArch64::SP)
2821 .addImm(RPI.Offset) // [sp, #offset*scale],
2822 // where factor*scale is implicit
2823 .setMIFlag(MachineInstr::FrameSetup);
2824 MIB.addMemOperand(MF.getMachineMemOperand(
2825 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2826 MachineMemOperand::MOStore, Size, Alignment));
2827 if (NeedsWinCFI)
2828 InsertSEH(MIB, TII, MachineInstr::FrameSetup);
2829
2830 // Update the StackIDs of the SVE stack slots.
2831 MachineFrameInfo &MFI = MF.getFrameInfo();
2832 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR)
2833 MFI.setStackID(RPI.FrameIdx, TargetStackID::ScalableVector);
2834
2835 }
2836 return true;
2837}
2838
2839bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2840 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2841 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2842 MachineFunction &MF = *MBB.getParent();
2843 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2844 DebugLoc DL;
2845 SmallVector<RegPairInfo, 8> RegPairs;
2846 bool NeedsWinCFI = needsWinCFI(MF);
2847
2848 if (MBBI != MBB.end())
2849 DL = MBBI->getDebugLoc();
2850
2851 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
2852
2853 auto EmitMI = [&](const RegPairInfo &RPI) -> MachineBasicBlock::iterator {
2854 unsigned Reg1 = RPI.Reg1;
2855 unsigned Reg2 = RPI.Reg2;
2856
2857 // Issue sequence of restores for cs regs. The last restore may be converted
2858 // to a post-increment load later by emitEpilogue if the callee-save stack
2859 // area allocation can't be combined with the local stack area allocation.
2860 // For example:
2861 // ldp fp, lr, [sp, #32] // addImm(+4)
2862 // ldp x20, x19, [sp, #16] // addImm(+2)
2863 // ldp x22, x21, [sp, #0] // addImm(+0)
2864 // Note: see comment in spillCalleeSavedRegisters()
2865 unsigned LdrOpc;
2866 unsigned Size;
2867 Align Alignment;
2868 switch (RPI.Type) {
2869 case RegPairInfo::GPR:
2870 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2871 Size = 8;
2872 Alignment = Align(8);
2873 break;
2874 case RegPairInfo::FPR64:
2875 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2876 Size = 8;
2877 Alignment = Align(8);
2878 break;
2879 case RegPairInfo::FPR128:
2880 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2881 Size = 16;
2882 Alignment = Align(16);
2883 break;
2884 case RegPairInfo::ZPR:
2885 LdrOpc = AArch64::LDR_ZXI;
2886 Size = 16;
2887 Alignment = Align(16);
2888 break;
2889 case RegPairInfo::PPR:
2890 LdrOpc = AArch64::LDR_PXI;
2891 Size = 2;
2892 Alignment = Align(2);
2893 break;
2894 }
2895 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2896 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2897 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2898 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2899 dbgs() << ")\n");
2900
2901 // Windows unwind codes require consecutive registers if registers are
2902 // paired. Make the switch here, so that the code below will save (x,x+1)
2903 // and not (x+1,x).
2904 unsigned FrameIdxReg1 = RPI.FrameIdx;
2905 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2906 if (NeedsWinCFI && RPI.isPaired()) {
2907 std::swap(Reg1, Reg2);
2908 std::swap(FrameIdxReg1, FrameIdxReg2);
2909 }
2910 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2911 if (RPI.isPaired()) {
2912 MIB.addReg(Reg2, getDefRegState(true));
2913 MIB.addMemOperand(MF.getMachineMemOperand(
2914 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2915 MachineMemOperand::MOLoad, Size, Alignment));
2916 }
2917 MIB.addReg(Reg1, getDefRegState(true))
2918 .addReg(AArch64::SP)
2919 .addImm(RPI.Offset) // [sp, #offset*scale]
2920 // where factor*scale is implicit
2921 .setMIFlag(MachineInstr::FrameDestroy);
2922 MIB.addMemOperand(MF.getMachineMemOperand(
2923 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2924 MachineMemOperand::MOLoad, Size, Alignment));
2925 if (NeedsWinCFI)
2926 InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
2927
2928 return MIB->getIterator();
2929 };
2930
2931 // SVE objects are always restored in reverse order.
2932 for (const RegPairInfo &RPI : reverse(RegPairs))
2933 if (RPI.isScalable())
2934 EmitMI(RPI);
2935
2936 if (homogeneousPrologEpilog(MF, &MBB)) {
2937 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2938 .setMIFlag(MachineInstr::FrameDestroy);
2939 for (auto &RPI : RegPairs) {
2940 MIB.addReg(RPI.Reg1, RegState::Define);
2941 MIB.addReg(RPI.Reg2, RegState::Define);
2942 }
2943 return true;
2944 }
2945
2946 if (ReverseCSRRestoreSeq) {
2947 MachineBasicBlock::iterator First = MBB.end();
2948 for (const RegPairInfo &RPI : reverse(RegPairs)) {
2949 if (RPI.isScalable())
2950 continue;
2951 MachineBasicBlock::iterator It = EmitMI(RPI);
2952 if (First == MBB.end())
2953 First = It;
2954 }
2955 if (First != MBB.end())
2956 MBB.splice(MBBI, &MBB, First);
2957 } else {
2958 for (const RegPairInfo &RPI : RegPairs) {
2959 if (RPI.isScalable())
2960 continue;
2961 (void)EmitMI(RPI);
2962 }
2963 }
2964
2965 return true;
2966}
2967
2968void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2969 BitVector &SavedRegs,
2970 RegScavenger *RS) const {
2971 // All calls are tail calls in GHC calling conv, and functions have no
2972 // prologue/epilogue.
2973 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2974 return;
2975
2976 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
2977 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
2978 MF.getSubtarget().getRegisterInfo());
2979 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2980 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2981 unsigned UnspilledCSGPR = AArch64::NoRegister;
2982 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2983
2984 MachineFrameInfo &MFI = MF.getFrameInfo();
2985 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2986
2987 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
2988 ? RegInfo->getBaseRegister()
2989 : (unsigned)AArch64::NoRegister;
2990
2991 unsigned ExtraCSSpill = 0;
2992 // Figure out which callee-saved registers to save/restore.
2993 for (unsigned i = 0; CSRegs[i]; ++i) {
2994 const unsigned Reg = CSRegs[i];
2995
2996 // Add the base pointer register to SavedRegs if it is callee-save.
2997 if (Reg == BasePointerReg)
2998 SavedRegs.set(Reg);
2999
3000 bool RegUsed = SavedRegs.test(Reg);
3001 unsigned PairedReg = AArch64::NoRegister;
3002 if (AArch64::GPR64RegClass.contains(Reg) ||
3003 AArch64::FPR64RegClass.contains(Reg) ||
3004 AArch64::FPR128RegClass.contains(Reg))
3005 PairedReg = CSRegs[i ^ 1];
3006
3007 if (!RegUsed) {
3008 if (AArch64::GPR64RegClass.contains(Reg) &&
3009 !RegInfo->isReservedReg(MF, Reg)) {
3010 UnspilledCSGPR = Reg;
3011 UnspilledCSGPRPaired = PairedReg;
3012 }
3013 continue;
3014 }
3015
3016 // MachO's compact unwind format relies on all registers being stored in
3017 // pairs.
3018 // FIXME: the usual format is actually better if unwinding isn't needed.
3019 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3020 !SavedRegs.test(PairedReg)) {
3021 SavedRegs.set(PairedReg);
3022 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3023 !RegInfo->isReservedReg(MF, PairedReg))
3024 ExtraCSSpill = PairedReg;
3025 }
3026 }
3027
3028 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3029 !Subtarget.isTargetWindows()) {
3030 // For Windows calling convention on a non-windows OS, where X18 is treated
3031 // as reserved, back up X18 when entering non-windows code (marked with the
3032 // Windows calling convention) and restore when returning regardless of
3033 // whether the individual function uses it - it might call other functions
3034 // that clobber it.
3035 SavedRegs.set(AArch64::X18);
3036 }
3037
3038 // Calculates the callee saved stack size.
3039 unsigned CSStackSize = 0;
3040 unsigned SVECSStackSize = 0;
3041 const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
3042 const MachineRegisterInfo &MRI = MF.getRegInfo();
3043 for (unsigned Reg : SavedRegs.set_bits()) {
3044 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3045 if (AArch64::PPRRegClass.contains(Reg) ||
3046 AArch64::ZPRRegClass.contains(Reg))
3047 SVECSStackSize += RegSize;
3048 else
3049 CSStackSize += RegSize;
3050 }
3051
3052 // Save number of saved regs, so we can easily update CSStackSize later.
3053 unsigned NumSavedRegs = SavedRegs.count();
3054
3055 // The frame record needs to be created by saving the appropriate registers
3056 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3057 if (hasFP(MF) ||
3058 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3059 SavedRegs.set(AArch64::FP);
3060 SavedRegs.set(AArch64::LR);
3061 }
3062
3063 LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3064 for (unsigned Reg
3065 : SavedRegs.set_bits()) dbgs()
3066 << ' ' << printReg(Reg, RegInfo);
3067 dbgs() << "\n";);
3068
3069 // If any callee-saved registers are used, the frame cannot be eliminated.
3070 int64_t SVEStackSize =
3071 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3072 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3073
3074 // The CSR spill slots have not been allocated yet, so estimateStackSize
3075 // won't include them.
3076 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3077
3078 // We may address some of the stack above the canonical frame address, either
3079 // for our own arguments or during a call. Include that in calculating whether
3080 // we have complicated addressing concerns.
3081 int64_t CalleeStackUsed = 0;
3082 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3083 int64_t FixedOff = MFI.getObjectOffset(I);
3084 if (FixedOff > CalleeStackUsed) CalleeStackUsed = FixedOff;
3085 }
3086
3087 // Conservatively always assume BigStack when there are SVE spills.
3088 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3089 CalleeStackUsed) > EstimatedStackSizeLimit;
3090 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3091 AFI->setHasStackFrame(true);
3092
3093 // Estimate if we might need to scavenge a register at some point in order
3094 // to materialize a stack offset. If so, either spill one additional
3095 // callee-saved register or reserve a special spill slot to facilitate
3096 // register scavenging. If we already spilled an extra callee-saved register
3097 // above to keep the number of spills even, we don't need to do anything else
3098 // here.
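// Illustrative: an 8-byte access at SP+40000 exceeds the scaled 12-bit
// immediate range of LDR/STR (at most 32760 for 8-byte accesses), so
// without a free register the offset must be materialized first, roughly:
//   mov x16, #40000
//   ldr x0, [sp, x16]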
3099 if (BigStack) {
3100 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3101 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3102 << " to get a scratch register.\n");
3103 SavedRegs.set(UnspilledCSGPR);
3104 // MachO's compact unwind format relies on all registers being stored in
3105 // pairs, so if we need to spill one extra for BigStack, then we need to
3106 // store the pair.
3107 if (producePairRegisters(MF))
3108 SavedRegs.set(UnspilledCSGPRPaired);
3109 ExtraCSSpill = UnspilledCSGPR;
3110 }
3111
3112 // If we didn't find an extra callee-saved register to spill, create
3113 // an emergency spill slot.
3114 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3115 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3116 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3117 unsigned Size = TRI->getSpillSize(RC);
3118 Align Alignment = TRI->getSpillAlign(RC);
3119 int FI = MFI.CreateStackObject(Size, Alignment, false);
3120 RS->addScavengingFrameIndex(FI);
3121 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3122 << " as the emergency spill slot.\n");
3123 }
3124 }
3125
3126 // Adding the size of additional 64bit GPR saves.
3127 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3128
3129 // A Swift asynchronous context extends the frame record with a pointer
3130 // directly before FP.
3131 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3132 CSStackSize += 8;
3133
3134 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3135 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3136 << EstimatedStackSize + AlignedCSStackSize
3137 << " bytes.\n");
3138
3139 assert((!MFI.isCalleeSavedInfoValid() ||
3140 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3141 "Should not invalidate callee saved info");
3142
3143 // Round up to register pair alignment to avoid additional SP adjustment
3144 // instructions.
3145 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3146 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3147 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3148}
3149
3150bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3151 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3152 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3153 unsigned &MaxCSFrameIndex) const {
3154 bool NeedsWinCFI = needsWinCFI(MF);
3155 // To match the canonical windows frame layout, reverse the list of
3156 // callee saved registers to get them laid out by PrologEpilogInserter
3157 // in the right order. (PrologEpilogInserter allocates stack objects top
3158 // down. Windows canonical prologs store higher numbered registers at
3159 // the top, thus have the CSI array start from the highest registers.)
3160 if (NeedsWinCFI)
3161 std::reverse(CSI.begin(), CSI.end());
3162
3163 if (CSI.empty())
3164 return true; // Early exit if no callee saved registers are modified!
3165
3166 // Now that we know which registers need to be saved and restored, allocate
3167 // stack slots for them.
3168 MachineFrameInfo &MFI = MF.getFrameInfo();
3169 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3170
3171 bool UsesWinAAPCS = isTargetWindows(MF);
3172 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3173 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3174 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3175 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3176 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3177 }
3178
3179 for (auto &CS : CSI) {
3180 Register Reg = CS.getReg();
3181 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3182
3183 unsigned Size = RegInfo->getSpillSize(*RC);
3184 Align Alignment(RegInfo->getSpillAlign(*RC));
3185 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3186 CS.setFrameIdx(FrameIdx);
3187
3188 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3189 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3190
3191 // Grab 8 bytes below FP for the extended asynchronous frame info.
3192 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3193 Reg == AArch64::FP) {
3194 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3195 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3196 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3197 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3198 }
3199 }
3200 return true;
3201}
3202
3203bool AArch64FrameLowering::enableStackSlotScavenging(
3204 const MachineFunction &MF) const {
3205 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3206 return AFI->hasCalleeSaveStackFreeSpace();
3207}
3208
3209/// returns true if there are any SVE callee saves.
3210static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3211 int &Min, int &Max) {
3212 Min = std::numeric_limits<int>::max();
3213 Max = std::numeric_limits<int>::min();
3214
3215 if (!MFI.isCalleeSavedInfoValid())
3216 return false;
3217
3218 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3219 for (auto &CS : CSI) {
3220 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3221 AArch64::PPRRegClass.contains(CS.getReg())) {
3222 assert((Max == std::numeric_limits<int>::min() ||
3223 Max + 1 == CS.getFrameIdx()) &&
3224 "SVE CalleeSaves are not consecutive");
3225
3226 Min = std::min(Min, CS.getFrameIdx());
3227 Max = std::max(Max, CS.getFrameIdx());
3228 }
3229 }
3230 return Min != std::numeric_limits<int>::max();
3231}
3232
3233// Process all the SVE stack objects and determine offsets for each
3234// object. If AssignOffsets is true, the offsets get assigned.
3235// Fills in the first and last callee-saved frame indices into
3236// Min/MaxCSFrameIndex, respectively.
3237// Returns the size of the stack.
3238static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3239 int &MinCSFrameIndex,
3240 int &MaxCSFrameIndex,
3241 bool AssignOffsets) {
3242#ifndef NDEBUG
3243 // First process all fixed stack objects.
3244 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3246 "SVE vectors should never be passed on the stack by value, only by "
3247 "reference.");
3248#endif
3249
3250 auto Assign = [&MFI](int FI, int64_t Offset) {
3251 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
3252 MFI.setObjectOffset(FI, Offset);
3253 };
3254
3255 int64_t Offset = 0;
3256
3257 // Then process all callee saved slots.
3258 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
3259 // Assign offsets to the callee save slots.
3260 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
3261 Offset += MFI.getObjectSize(I);
3262 Offset = alignTo(Offset, MFI.getObjectAlign(I));
3263 if (AssignOffsets)
3264 Assign(I, -Offset);
3265 }
3266 }
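// Illustrative: two Z-register callee-save slots of 16 scalable bytes each
// end up at offsets -16 and -32 from the top of the SVE area.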
3267
3268 // Ensure that the callee-save area is aligned to 16 bytes.
3269 Offset = alignTo(Offset, Align(16U));
3270
3271 // Create a buffer of SVE objects to allocate and sort it.
3272 SmallVector<int, 8> ObjectsToAllocate;
3273 // If we have a stack protector, and we've previously decided that we have SVE
3274 // objects on the stack and thus need it to go in the SVE stack area, then it
3275 // needs to go first.
3276 int StackProtectorFI = -1;
3277 if (MFI.hasStackProtectorIndex()) {
3278 StackProtectorFI = MFI.getStackProtectorIndex();
3279 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
3280 ObjectsToAllocate.push_back(StackProtectorFI);
3281 }
3282 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
3283 unsigned StackID = MFI.getStackID(I);
3284 if (StackID != TargetStackID::ScalableVector)
3285 continue;
3286 if (I == StackProtectorFI)
3287 continue;
3288 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
3289 continue;
3290 if (MFI.isDeadObjectIndex(I))
3291 continue;
3292
3293 ObjectsToAllocate.push_back(I);
3294 }
3295
3296 // Allocate all SVE locals and spills
3297 for (unsigned FI : ObjectsToAllocate) {
3298 Align Alignment = MFI.getObjectAlign(FI);
3299 // FIXME: Given that the length of SVE vectors is not necessarily a power of
3300 // two, we'd need to align every object dynamically at runtime if the
3301 // alignment is larger than 16. This is not yet supported.
3302 if (Alignment > Align(16))
3304 "Alignment of scalable vectors > 16 bytes is not yet supported");
3305
3306 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
3307 if (AssignOffsets)
3308 Assign(FI, -Offset);
3309 }
3310
3311 return Offset;
3312}
3313
3314int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
3315 MachineFrameInfo &MFI) const {
3316 int MinCSFrameIndex, MaxCSFrameIndex;
3317 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
3318}
3319
3320int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
3321 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
3322 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
3323 true);
3324}
3325
3326void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3327 MachineFunction &MF, RegScavenger *RS) const {
3328 MachineFrameInfo &MFI = MF.getFrameInfo();
3329
3331 "Upwards growing stack unsupported");
3332
3333 int MinCSFrameIndex, MaxCSFrameIndex;
3334 int64_t SVEStackSize =
3335 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3336
3337 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3338 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3339 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3340
3341 // If this function isn't doing Win64-style C++ EH, we don't need to do
3342 // anything.
3343 if (!MF.hasEHFunclets())
3344 return;
3345 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3346 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3347
3348 MachineBasicBlock &MBB = MF.front();
3349 auto MBBI = MBB.begin();
3350 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3351 ++MBBI;
3352
3353 // Create an UnwindHelp object.
3354 // The UnwindHelp object is allocated at the start of the fixed object area
3355 int64_t FixedObject =
3356 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
3357 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
3358 /*SPOffset*/ -FixedObject,
3359 /*IsImmutable=*/false);
3360 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3361
3362 // We need to store -2 into the UnwindHelp object at the start of the
3363 // function.
3364 DebugLoc DL;
3365 RS->enterBasicBlockEnd(MBB);
3366 RS->backward(std::prev(MBBI));
3367 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3368 assert(DstReg && "There must be a free register after frame setup");
3369 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3370 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3371 .addReg(DstReg, getKillRegState(true))
3372 .addFrameIndex(UnwindHelpFI)
3373 .addImm(0);
3374}
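// The sequence built above amounts to (illustrative only; the scratch
// register is whatever the scavenger finds free after frame setup):
//   mov  x8, #-2
//   stur x8, [<UnwindHelp slot>]
// so UnwindHelp holds -2, the initial value the Win64 C++ EH runtime
// expects, before any exception can be thrown.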
3375
3376namespace {
3377struct TagStoreInstr {
3378 MachineInstr *MI;
3379 int64_t Offset, Size;
3380 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3381 : MI(MI), Offset(Offset), Size(Size) {}
3382};
3383
3384class TagStoreEdit {
3385 MachineFunction *MF;
3386 MachineBasicBlock *MBB;
3387 MachineRegisterInfo *MRI;
3388 // Tag store instructions that are being replaced.
3389 SmallVector<TagStoreInstr, 8> TagStores;
3390 // Combined memref arguments of the above instructions.
3391 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3392
3393 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3394 // FrameRegOffset + Size) with the address tag of SP.
3395 Register FrameReg;
3396 StackOffset FrameRegOffset;
3397 int64_t Size;
3398 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3399 // end.
3400 std::optional<int64_t> FrameRegUpdate;
3401 // MIFlags for any FrameReg updating instructions.
3402 unsigned FrameRegUpdateFlags;
3403
3404 // Use zeroing instruction variants.
3405 bool ZeroData;
3406 DebugLoc DL;
3407
3408 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3409 void emitLoop(MachineBasicBlock::iterator InsertI);
3410
3411public:
3412 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3413 : MBB(MBB), ZeroData(ZeroData) {
3414 MF = MBB->getParent();
3415 MRI = &MF->getRegInfo();
3416 }
3417 // Add an instruction to be replaced. Instructions must be added in
3418 // ascending order of Offset and must be adjacent.
3419 void addInstruction(TagStoreInstr I) {
3420 assert((TagStores.empty() ||
3421 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3422 "Non-adjacent tag store instructions.");
3423 TagStores.push_back(I);
3424 }
3425 void clear() { TagStores.clear(); }
3426 // Emit equivalent code at the given location, and erase the current set of
3427 // instructions. May skip if the replacement is not profitable. May invalidate
3428 // the input iterator and replace it with a valid one.
3429 void emitCode(MachineBasicBlock::iterator &InsertI,
3430 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3431};
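// A minimal usage sketch of the class above (MI0, MI1, InsertI and TFI are
// hypothetical placeholders; the real driver is tryMergeAdjacentSTG below):
// two adjacent 16-byte tag stores are queued, then replaced in one step.
//
//   TagStoreEdit TSE(MBB, /*ZeroData=*/false);
//   TSE.addInstruction(TagStoreInstr(MI0, /*Offset=*/0, /*Size=*/16));
//   TSE.addInstruction(TagStoreInstr(MI1, /*Offset=*/16, /*Size=*/16));
//   TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate=*/false);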
3432
3433void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3434 const AArch64InstrInfo *TII =
3435 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3436
3437 const int64_t kMinOffset = -256 * 16;
3438 const int64_t kMaxOffset = 255 * 16;
3439
3440 Register BaseReg = FrameReg;
3441 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3442 if (BaseRegOffsetBytes < kMinOffset ||
3443 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3444 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3445 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3446 // is required for the offset of ST2G.
3447 BaseRegOffsetBytes % 16 != 0) {
3448 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3449 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3450 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3451 BaseReg = ScratchReg;
3452 BaseRegOffsetBytes = 0;
3453 }
3454
3455 MachineInstr *LastI = nullptr;
3456 while (Size) {
3457 int64_t InstrSize = (Size > 16) ? 32 : 16;
3458 unsigned Opcode =
3459 InstrSize == 16
3460 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3461 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3462 assert(BaseRegOffsetBytes % 16 == 0);
3463 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3464 .addReg(AArch64::SP)
3465 .addReg(BaseReg)
3466 .addImm(BaseRegOffsetBytes / 16)
3467 .setMemRefs(CombinedMemRefs);
3468 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3469 // final SP adjustment in the epilogue.
3470 if (BaseRegOffsetBytes == 0)
3471 LastI = I;
3472 BaseRegOffsetBytes += InstrSize;
3473 Size -= InstrSize;
3474 }
3475
3476 if (LastI)
3477 MBB->splice(InsertI, MBB, LastI);
3478}
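// For example (a sketch; the zeroing variants use STZG/STZ2G instead):
// tagging 48 bytes at [BaseReg, #0] emits ST2G for [#0, #32) and STG for
// [#32, #48). Because the store at offset #0 is spliced to the end, the
// final order is roughly
//   stg  sp, [BaseReg, #32]
//   st2g sp, [BaseReg, #0]
// leaving the #0 store adjacent to any following SP adjustment, giving the
// epilogue a chance to fold the two.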
3479
3480void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3481 const AArch64InstrInfo *TII =
3482 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3483
3484 Register BaseReg = FrameRegUpdate
3485 ? FrameReg
3486 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3487 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3488
3489 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3490
3491 int64_t LoopSize = Size;
3492 // If the loop size is not a multiple of 32, split off one 16-byte store at
3493 // the end to fold BaseReg update into.
3494 if (FrameRegUpdate && *FrameRegUpdate)
3495 LoopSize -= LoopSize % 32;
3496 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3497 TII->get(ZeroData ? AArch64::STZGloop_wback
3498 : AArch64::STGloop_wback))
3499 .addDef(SizeReg)
3500 .addDef(BaseReg)
3501 .addImm(LoopSize)
3502 .addReg(BaseReg)
3503 .setMemRefs(CombinedMemRefs);
3504 if (FrameRegUpdate)
3505 LoopI->setFlags(FrameRegUpdateFlags);
3506
3507 int64_t ExtraBaseRegUpdate =
3508 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3509 if (LoopSize < Size) {
3510 assert(FrameRegUpdate);
3511 assert(Size - LoopSize == 16);
3512 // Tag 16 more bytes at BaseReg and update BaseReg.
3513 BuildMI(*MBB, InsertI, DL,
3514 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3515 .addDef(BaseReg)
3516 .addReg(BaseReg)
3517 .addReg(BaseReg)
3518 .addImm(1 + ExtraBaseRegUpdate / 16)
3519 .setMemRefs(CombinedMemRefs)
3520 .setMIFlags(FrameRegUpdateFlags);
3521 } else if (ExtraBaseRegUpdate) {
3522 // Update BaseReg.
3523 BuildMI(
3524 *MBB, InsertI, DL,
3525 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3526 .addDef(BaseReg)
3527 .addReg(BaseReg)
3528 .addImm(std::abs(ExtraBaseRegUpdate))
3529 .addImm(0)
3530 .setMIFlags(FrameRegUpdateFlags);
3531 }
3532}
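// Worked example of the update arithmetic above (a sketch, assuming
// FrameRegOffset = 0): for Size = 176 with a merged pointer update of +192,
// LoopSize becomes 176 - 176 % 32 = 160, so the loop tags [0, 160) and
// advances BaseReg by 160. ExtraBaseRegUpdate = 192 - 0 - 176 = 16, so the
// trailing STGPostIndex tags the remaining 16 bytes with a post-increment of
// (1 + 16/16) * 16 = 32 bytes, leaving BaseReg exactly 192 bytes past its
// starting value.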
3533
3534// Check if *II is a register update that can be merged into STGloop that ends
3535// at (Reg + Size). RemainingOffset is the required adjustment to Reg after the
3536// end of the loop.
3537bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3538 int64_t Size, int64_t *TotalOffset) {
3539 MachineInstr &MI = *II;
3540 if ((MI.getOpcode() == AArch64::ADDXri ||
3541 MI.getOpcode() == AArch64::SUBXri) &&
3542 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3543 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3544 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3545 if (MI.getOpcode() == AArch64::SUBXri)
3546 Offset = -Offset;
3547 int64_t AbsPostOffset = std::abs(Offset - Size);
3548 const int64_t kMaxOffset =
3549 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
3550 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
3551 *TotalOffset = Offset;
3552 return true;
3553 }
3554 }
3555 return false;
3556}
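// For example: an STGloop that tags 256 bytes ending at Reg + 256, followed
// by "add Reg, Reg, #272", gives Offset = 272 and AbsPostOffset =
// |272 - 256| = 16, which is both <= 0xFFF and 16-byte aligned, so the ADD
// can be folded away and *TotalOffset is set to 272.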
3557
3558void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3559 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3560 MemRefs.clear();
3561 for (auto &TS : TSE) {
3562 MachineInstr *MI = TS.MI;
3563 // An instruction without memory operands may access anything. Be
3564 // conservative and return an empty list.
3565 if (MI->memoperands_empty()) {
3566 MemRefs.clear();
3567 return;
3568 }
3569 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3570 }
3571}
3572
3573void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3574 const AArch64FrameLowering *TFI,
3575 bool TryMergeSPUpdate) {
3576 if (TagStores.empty())
3577 return;
3578 TagStoreInstr &FirstTagStore = TagStores[0];
3579 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3580 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3581 DL = TagStores[0].MI->getDebugLoc();
3582
3583 Register Reg;
3584 FrameRegOffset = TFI->resolveFrameOffsetReference(
3585 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
3586 /*PreferFP=*/false, /*ForSimm=*/true);
3587 FrameReg = Reg;
3588 FrameRegUpdate = std::nullopt;
3589
3590 mergeMemRefs(TagStores, CombinedMemRefs);
3591
3592 LLVM_DEBUG(dbgs() << "Replacing adjacent STG instructions:\n";
3593 for (const auto &Instr
3594 : TagStores) { dbgs() << " " << *Instr.MI; });
3595
3596 // Size threshold where a loop becomes shorter than a linear sequence of
3597 // tagging instructions.
3598 const int kSetTagLoopThreshold = 176;
3599 if (Size < kSetTagLoopThreshold) {
3600 if (TagStores.size() < 2)
3601 return;
3602 emitUnrolled(InsertI);
3603 } else {
3604 MachineInstr *UpdateInstr = nullptr;
3605 int64_t TotalOffset = 0;
3606 if (TryMergeSPUpdate) {
3607 // See if we can merge base register update into the STGloop.
3608 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3609 // but STGloop is far too unusual for that pass, and in practice it only
3610 // appears in the function epilogue. STGloop is also expanded before that
3611 // pass runs.
3612 if (InsertI != MBB->end() &&
3613 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3614 &TotalOffset)) {
3615 UpdateInstr = &*InsertI++;
3616 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3617 << *UpdateInstr);
3618 }
3619 }
3620
3621 if (!UpdateInstr && TagStores.size() < 2)
3622 return;
3623
3624 if (UpdateInstr) {
3625 FrameRegUpdate = TotalOffset;
3626 FrameRegUpdateFlags = UpdateInstr->getFlags();
3627 }
3628 emitLoop(InsertI);
3629 if (UpdateInstr)
3630 UpdateInstr->eraseFromParent();
3631 }
3632
3633 for (auto &TS : TagStores)
3634 TS.MI->eraseFromParent();
3635}
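// To make the size heuristic above concrete (a sketch): a 160-byte region
// stays below kSetTagLoopThreshold (176) and takes the unrolled form (five
// ST2G stores), while a 256-byte region takes the loop path, optionally
// swallowing a following SP update as computed by canMergeRegUpdate.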
3636
3637bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3638 int64_t &Size, bool &ZeroData) {
3639 MachineFunction &MF = *MI.getParent()->getParent();
3640 const MachineFrameInfo &MFI = MF.getFrameInfo();
3641
3642 unsigned Opcode = MI.getOpcode();
3643 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3644 Opcode == AArch64::STZ2Gi);
3645
3646 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3647 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3648 return false;
3649 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3650 return false;
3651 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3652 Size = MI.getOperand(2).getImm();
3653 return true;
3654 }
3655
3656 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3657 Size = 16;
3658 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3659 Size = 32;
3660 else
3661 return false;
3662
3663 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3664 return false;
3665
3666 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3667 16 * MI.getOperand(2).getImm();
3668 return true;
3669}
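// For example: an STGi of the form "STGi $sp, %stack.N, 1", where frame
// object N sits at offset -32, yields Size = 16 and
// Offset = -32 + 16 * 1 = -16. For an STGloop, Offset is simply the frame
// object's offset and Size the immediate byte count.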
3670
3671// Detect a run of memory tagging instructions for adjacent stack frame slots,
3672// and replace them with a shorter instruction sequence:
3673// * replace STG + STG with ST2G
3674// * replace STGloop + STGloop with STGloop
3675// This code needs to run when stack slot offsets are already known, but before
3676// FrameIndex operands in STG instructions are eliminated.
3677MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3678 const AArch64FrameLowering *TFI,
3679 RegScavenger *RS) {
3680 bool FirstZeroData;
3681 int64_t Size, Offset;
3682 MachineInstr &MI = *II;
3683 MachineBasicBlock *MBB = MI.getParent();
3684 MachineBasicBlock::iterator NextI = ++II;
3685 if (&MI == &MBB->instr_back())
3686 return II;
3687 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3688 return II;
3689
3690 SmallVector<TagStoreInstr, 8> Instrs;
3691 Instrs.emplace_back(&MI, Offset, Size);
3692
3693 constexpr int kScanLimit = 10;
3694 int Count = 0;
3695 for (MachineBasicBlock::iterator E = MBB->end();
3696 NextI != E && Count < kScanLimit; ++NextI) {
3697 MachineInstr &MI = *NextI;
3698 bool ZeroData;
3699 int64_t Size, Offset;
3700 // Collect instructions that update memory tags with a FrameIndex operand
3701 // and (when applicable) constant size, and whose output registers are dead
3702 // (the latter is almost always the case in practice). Since these
3703 // instructions effectively have no inputs or outputs, we are free to skip
3704 // any non-aliasing instructions in between without tracking used registers.
3705 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3706 if (ZeroData != FirstZeroData)
3707 break;
3708 Instrs.emplace_back(&MI, Offset, Size);
3709 continue;
3710 }
3711
3712 // Only count non-transient, non-tagging instructions toward the scan
3713 // limit.
3714 if (!MI.isTransient())
3715 ++Count;
3716
3717 // Just in case, stop before the epilogue code starts.
3718 if (MI.getFlag(MachineInstr::FrameSetup) ||
3719 MI.getFlag(MachineInstr::FrameDestroy))
3720 break;
3721
3722 // Reject anything that may alias the collected instructions.
3723 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
3724 break;
3725 }
3726
3727 // New code will be inserted after the last tagging instruction we've found.
3728 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3729 InsertI++;
3730
3731 llvm::stable_sort(Instrs,
3732 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3733 return Left.Offset < Right.Offset;
3734 });
3735
3736 // Make sure that we don't have any overlapping stores.
3737 int64_t CurOffset = Instrs[0].Offset;
3738 for (auto &Instr : Instrs) {
3739 if (CurOffset > Instr.Offset)
3740 return NextI;
3741 CurOffset = Instr.Offset + Instr.Size;
3742 }
3743
3744 // Find contiguous runs of tagged memory and emit shorter instruction
3745 // sequences for them when possible.
3746 TagStoreEdit TSE(MBB, FirstZeroData);
3747 std::optional<int64_t> EndOffset;
3748 for (auto &Instr : Instrs) {
3749 if (EndOffset && *EndOffset != Instr.Offset) {
3750 // Found a gap.
3751 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3752 TSE.clear();
3753 }
3754
3755 TSE.addInstruction(Instr);
3756 EndOffset = Instr.Offset + Instr.Size;
3757 }
3758
3759 const MachineFunction *MF = MBB->getParent();
3760 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3761 TSE.emitCode(
3762 InsertI, TFI, /*TryMergeSPUpdate = */
3763 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3764
3765 return InsertI;
3766}
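// The net effect of the scan above (a sketch): two STGi instructions tagging
// adjacent 16-byte slots are rewritten as one ST2Gi covering the combined
// 32-byte range (addressed as base register plus offset once frame indices
// are resolved), and longer adjacent runs collapse into a single STGloop.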
3767} // namespace
3768
3769void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3770 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3771 if (StackTaggingMergeSetTag)
3772 for (auto &BB : MF)
3773 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();)
3774 II = tryMergeAdjacentSTG(II, this, RS);
3775}
3776
3777/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3778/// before the update. This is easily retrieved as it is exactly the offset
3779/// that is set in processFunctionBeforeFrameFinalized.
3780StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3781 const MachineFunction &MF, int FI, Register &FrameReg,
3782 bool IgnoreSPUpdates) const {
3783 const MachineFrameInfo &MFI = MF.getFrameInfo();
3784 if (IgnoreSPUpdates) {
3785 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3786 << MFI.getObjectOffset(FI) << "\n");
3787 FrameReg = AArch64::SP;
3788 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3789 }
3790
3791 // Go to common code if we cannot provide sp + offset.
3792 if (MFI.hasVarSizedObjects() ||
3793 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
3794 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
3795 return getFrameIndexReference(MF, FI, FrameReg);
3796
3797 FrameReg = AArch64::SP;
3798 return getStackOffset(MF, MFI.getObjectOffset(FI));
3799}
3800
3801/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3802/// the parent's frame pointer
3803unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3804 const MachineFunction &MF) const {
3805 return 0;
3806}
3807
3808/// Funclets only need to account for space for the callee saved registers,
3809/// as the locals are accounted for in the parent's stack frame.
3810unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3811 const MachineFunction &MF) const {
3812 // This is the size of the pushed CSRs.
3813 unsigned CSSize =
3814 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3815 // This is the amount of stack a funclet needs to allocate.
3816 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3817 getStackAlign());
3818}
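// For example: with 64 bytes of pushed callee saves and a maximum call frame
// of 40 bytes, a funclet allocates alignTo(64 + 40, 16) = 112 bytes.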
3819
3820namespace {
3821struct FrameObject {
3822 bool IsValid = false;
3823 // Index of the object in MFI.
3824 int ObjectIndex = 0;
3825 // Group ID this object belongs to.
3826 int GroupIndex = -1;
3827 // This object should be placed first (closest to SP).
3828 bool ObjectFirst = false;
3829 // This object's group (which always contains the object with
3830 // ObjectFirst==true) should be placed first.
3831 bool GroupFirst = false;
3832};
3833
3834class GroupBuilder {
3835 SmallVector<int, 8> CurrentMembers;
3836 int NextGroupIndex = 0;
3837 std::vector<FrameObject> &Objects;
3838
3839public:
3840 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3841 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3842 void EndCurrentGroup() {
3843 if (CurrentMembers.size() > 1) {
3844 // Create a new group with the current member list. This might remove them
3845 // from their pre-existing groups. That's OK, dealing with overlapping
3846 // groups is too hard and unlikely to make a difference.
3847 LLVM_DEBUG(dbgs() << "group:");
3848 for (int Index : CurrentMembers) {
3849 Objects[Index].GroupIndex = NextGroupIndex;
3850 LLVM_DEBUG(dbgs() << " " << Index);
3851 }
3852 LLVM_DEBUG(dbgs() << "\n");
3853 NextGroupIndex++;
3854 }
3855 CurrentMembers.clear();
3856 }
3857};
3858
3859bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3860 // Objects at a lower index are closer to FP; objects at a higher index are
3861 // closer to SP.
3862 //
3863 // For consistency in our comparison, all invalid objects are placed
3864 // at the end. This also allows us to stop walking when we hit the
3865 // first invalid item after it's all sorted.
3866 //
3867 // The "first" object goes first (closest to SP), followed by the members of
3868 // the "first" group.
3869 //
3870 // The rest are sorted by the group index to keep the groups together.
3871 // Higher numbered groups are more likely to be around longer (i.e. untagged
3872 // in the function epilogue and not at some earlier point). Place them closer
3873 // to SP.
3874 //
3875 // If all else equal, sort by the object index to keep the objects in the
3876 // original order.
3877 return std::make_tuple(!A.IsValid, A.ObjectFirst, A.GroupFirst, A.GroupIndex,
3878 A.ObjectIndex) <
3879 std::make_tuple(!B.IsValid, B.ObjectFirst, B.GroupFirst, B.GroupIndex,
3880 B.ObjectIndex);
3881}
3882} // namespace
3883
3884void AArch64FrameLowering::orderFrameObjects(
3885 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3886 if (!OrderFrameObjects || ObjectsToAllocate.empty())
3887 return;
3888
3889 const MachineFrameInfo &MFI = MF.getFrameInfo();
3890 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3891 for (auto &Obj : ObjectsToAllocate) {
3892 FrameObjects[Obj].IsValid = true;
3893 FrameObjects[Obj].ObjectIndex = Obj;
3894 }
3895
3896 // Identify stack slots that are tagged at the same time.
3897 GroupBuilder GB(FrameObjects);
3898 for (auto &MBB : MF) {
3899 for (auto &MI : MBB) {
3900 if (MI.isDebugInstr())
3901 continue;
3902 int OpIndex;
3903 switch (MI.getOpcode()) {
3904 case AArch64::STGloop:
3905 case AArch64::STZGloop:
3906 OpIndex = 3;
3907 break;
3908 case AArch64::STGi:
3909 case AArch64::STZGi:
3910 case AArch64::ST2Gi:
3911 case AArch64::STZ2Gi:
3912 OpIndex = 1;
3913 break;
3914 default:
3915 OpIndex = -1;
3916 }
3917
3918 int TaggedFI = -1;
3919 if (OpIndex >= 0) {
3920 const MachineOperand &MO = MI.getOperand(OpIndex);
3921 if (MO.isFI()) {
3922 int FI = MO.getIndex();
3923 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3924 FrameObjects[FI].IsValid)
3925 TaggedFI = FI;
3926 }
3927 }
3928
3929 // If this is a stack tagging instruction for a slot that is not part of a
3930 // group yet, either start a new group or add it to the current one.
3931 if (TaggedFI >= 0)
3932 GB.AddMember(TaggedFI);
3933 else
3934 GB.EndCurrentGroup();
3935 }
3936 // Groups should never span multiple basic blocks.
3937 GB.EndCurrentGroup();
3938 }
3939
3940 // If the function's tagged base pointer is pinned to a stack slot, we want to
3941 // put that slot first when possible. This will likely place it at SP + 0,
3942 // and save one instruction when generating the base pointer because IRG does
3943 // not allow an immediate offset.
3944 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3945 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3946 if (TBPI) {
3947 FrameObjects[*TBPI].ObjectFirst = true;
3948 FrameObjects[*TBPI].GroupFirst = true;
3949 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3950 if (FirstGroupIndex >= 0)
3951 for (FrameObject &Object : FrameObjects)
3952 if (Object.GroupIndex == FirstGroupIndex)
3953 Object.GroupFirst = true;
3954 }
3955
3956 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3957
3958 int i = 0;
3959 for (auto &Obj : FrameObjects) {
3960 // All invalid items are sorted at the end, so it's safe to stop.
3961 if (!Obj.IsValid)
3962 break;
3963 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3964 }
3965
3966 LLVM_DEBUG(dbgs() << "Final frame order:\n"; for (auto &Obj
3967 : FrameObjects) {
3968 if (!Obj.IsValid)
3969 break;
3970 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3971 if (Obj.ObjectFirst)
3972 dbgs() << ", first";
3973 if (Obj.GroupFirst)
3974 dbgs() << ", group-first";
3975 dbgs() << "\n";
3976 });
3977}
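// Example outcome (a sketch): slots tagged by the same STG run share a group
// and stay adjacent after the sort, and the slot pinned as the tagged base
// pointer, together with its group, is ordered so that it lands closest to
// SP (ideally SP + 0, where IRG needs no extra instruction since it takes no
// immediate offset).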