1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the main
33// function body runs, after the prologue. However, it's depicted here for
34// completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | |
56// | callee-saved fp/simd/SVE regs |
57// | |
58// |-----------------------------------|
59// | |
60// | SVE stack objects |
61// | |
62// |-----------------------------------|
63// |.empty.space.to.make.part.below....|
64// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
65// |.the.standard.16-byte.alignment....| compile time; if present)
66// |-----------------------------------|
67// | |
68// | local variables of fixed size |
69// | including spill slots |
70// |-----------------------------------| <- bp(not defined by ABI,
71// |.variable-sized.local.variables....| LLVM chooses X19)
72// |.(VLAs)............................| (size of this area is unknown at
73// |...................................| compile time)
74// |-----------------------------------| <- sp
75// | | Lower address
76//
77//
78// To access data in a frame, a constant offset from one of the pointers
79// (fp, bp, sp) must be computable at compile time. The sizes of the areas
80// with a dotted background cannot be computed at compile time if those areas
81// are present, so all three of fp, bp and sp must be set up in order to
82// access all contents of the frame areas, assuming all of the frame areas
83// are non-empty.
84//
85// For most functions, some of the frame areas are empty. For those functions,
86// it may not be necessary to set up fp or bp:
87// * A base pointer is definitely needed when there are both VLAs and local
88// variables with more-than-default alignment requirements.
89// * A frame pointer is definitely needed when there are local variables with
90// more-than-default alignment requirements.
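//
// For example (an illustrative sketch, not taken from any particular source
// program): with both a VLA and an over-aligned local in the frame, the
// fixed-size local is reached at a constant offset from the base pointer,
//
//     add  x0, x19, #32        // address of the over-aligned local (bp + 32)
//
// while the VLA is reached through the pointer produced when it was
// allocated, since its offset from sp is not a compile-time constant.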
91//
92// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
93// callee-saved area, since the unwind encoding does not allow for encoding
94// this dynamically and existing tools depend on this layout. For other
95// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
96// area to allow SVE stack objects (allocated directly below the callee-saves,
97// if available) to be accessed directly from the framepointer.
98// The SVE spill/fill instructions have VL-scaled addressing modes such
99// as:
100// ldr z8, [fp, #-7 mul vl]
101// For SVE the size of the vector length (VL) is not known at compile-time, so
102// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
103// layout, we don't need to add an unscaled offset to the framepointer before
104// accessing the SVE object in the frame.
105//
106// In some cases when a base pointer is not strictly needed, it is generated
107// anyway when offsets from the frame pointer to access local variables become
108// so large that the offset can't be encoded in the immediate fields of loads
109// or stores.
110//
111// Outgoing function arguments must be at the bottom of the stack frame when
112// calling another function. If we do not have variable-sized stack objects, we
113// can allocate a "reserved call frame" area at the bottom of the local
114// variable area, large enough for all outgoing calls. If we do have VLAs, then
115// the stack pointer must be decremented and incremented around each call to
116// make space for the arguments below the VLAs.
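//
// Illustrative sketch (register choice and sizes are made up): without a
// reserved call frame, each call site brackets the call with its own SP
// adjustment:
//
//     sub  sp, sp, #32       // make room for the outgoing stack arguments
//     str  x8, [sp]          // store an outgoing stack argument
//     bl   callee
//     add  sp, sp, #32       // release the outgoing argument area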
117//
118// FIXME: also explain the redzone concept.
119//
120// An example of the prologue:
121//
122// .globl __foo
123// .align 2
124// __foo:
125// Ltmp0:
126// .cfi_startproc
127// .cfi_personality 155, ___gxx_personality_v0
128// Leh_func_begin:
129// .cfi_lsda 16, Lexception33
130//
131// stp xa,bx, [sp, -#offset]!
132// ...
133// stp x28, x27, [sp, #offset-32]
134// stp fp, lr, [sp, #offset-16]
135// add fp, sp, #offset - 16
136// sub sp, sp, #1360
137//
138// The Stack:
139// +-------------------------------------------+
140// 10000 | ........ | ........ | ........ | ........ |
141// 10004 | ........ | ........ | ........ | ........ |
142// +-------------------------------------------+
143// 10008 | ........ | ........ | ........ | ........ |
144// 1000c | ........ | ........ | ........ | ........ |
145// +===========================================+
146// 10010 | X28 Register |
147// 10014 | X28 Register |
148// +-------------------------------------------+
149// 10018 | X27 Register |
150// 1001c | X27 Register |
151// +===========================================+
152// 10020 | Frame Pointer |
153// 10024 | Frame Pointer |
154// +-------------------------------------------+
155// 10028 | Link Register |
156// 1002c | Link Register |
157// +===========================================+
158// 10030 | ........ | ........ | ........ | ........ |
159// 10034 | ........ | ........ | ........ | ........ |
160// +-------------------------------------------+
161// 10038 | ........ | ........ | ........ | ........ |
162// 1003c | ........ | ........ | ........ | ........ |
163// +-------------------------------------------+
164//
165// [sp] = 10030 :: >>initial value<<
166// sp = 10020 :: stp fp, lr, [sp, #-16]!
167// fp = sp == 10020 :: mov fp, sp
168// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
169// sp == 10010 :: >>final value<<
170//
171// The frame pointer (w29) points to address 10020. If we use an offset of
172// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
173// for w27, and -32 for w28:
174//
175// Ltmp1:
176// .cfi_def_cfa w29, 16
177// Ltmp2:
178// .cfi_offset w30, -8
179// Ltmp3:
180// .cfi_offset w29, -16
181// Ltmp4:
182// .cfi_offset w27, -24
183// Ltmp5:
184// .cfi_offset w28, -32
185//
186//===----------------------------------------------------------------------===//
187
188#include "AArch64FrameLowering.h"
189#include "AArch64InstrInfo.h"
191#include "AArch64RegisterInfo.h"
192#include "AArch64Subtarget.h"
193#include "AArch64TargetMachine.h"
196#include "llvm/ADT/ScopeExit.h"
197#include "llvm/ADT/SmallVector.h"
198#include "llvm/ADT/Statistic.h"
214#include "llvm/IR/Attributes.h"
215#include "llvm/IR/CallingConv.h"
216#include "llvm/IR/DataLayout.h"
217#include "llvm/IR/DebugLoc.h"
218#include "llvm/IR/Function.h"
219#include "llvm/MC/MCAsmInfo.h"
220#include "llvm/MC/MCDwarf.h"
222#include "llvm/Support/Debug.h"
228#include <cassert>
229#include <cstdint>
230#include <iterator>
231#include <optional>
232#include <vector>
233
234using namespace llvm;
235
236#define DEBUG_TYPE "frame-info"
237
238static cl::opt<bool> EnableRedZone("aarch64-redzone",
239 cl::desc("enable use of redzone on AArch64"),
240 cl::init(false), cl::Hidden);
241
243 "stack-tagging-merge-settag",
244 cl::desc("merge settag instruction in function epilog"), cl::init(true),
245 cl::Hidden);
246
247static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
248 cl::desc("sort stack allocations"),
249 cl::init(true), cl::Hidden);
250
252 "homogeneous-prolog-epilog", cl::Hidden,
253 cl::desc("Emit homogeneous prologue and epilogue for the size "
254 "optimization (default = off)"));
255
256STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
257
258/// Returns how much of the incoming argument stack area (in bytes) we should
259/// clean up in an epilogue. For the C calling convention this will be 0, for
260/// guaranteed tail call conventions it can be positive (a normal return or a
261/// tail call to a function that uses less stack space for arguments) or
262/// negative (for a tail call to a function that needs more stack space than us
263/// for arguments).
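///
/// For example (illustrative): under a guaranteed tail call convention, a
/// function with 32 bytes of incoming stack arguments that tail-calls a
/// function needing only 16 bytes returns +16; if the callee instead needs
/// 48 bytes, the result is -16.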
268 bool IsTailCallReturn = (MBB.end() != MBBI)
270 : false;
271
272 int64_t ArgumentPopSize = 0;
273 if (IsTailCallReturn) {
274 MachineOperand &StackAdjust = MBBI->getOperand(1);
275
276 // For a tail-call in a callee-pops-arguments environment, some or all of
277 // the stack may actually be in use for the call's arguments; this is
278 // calculated during LowerCall and consumed here...
279 ArgumentPopSize = StackAdjust.getImm();
280 } else {
281 // ... otherwise the amount to pop is *all* of the argument space,
282 // conveniently stored in the MachineFunctionInfo by
283 // LowerFormalArguments. This will, of course, be zero for the C calling
284 // convention.
285 ArgumentPopSize = AFI->getArgumentStackToRestore();
286 }
287
288 return ArgumentPopSize;
289}
290
292static bool needsWinCFI(const MachineFunction &MF);
295
296/// Returns true if homogeneous prolog or epilog code can be emitted
297/// for the size optimization. If possible, a frame helper call is injected.
298/// When an Exit block is given, this check is for the epilog.
299bool AArch64FrameLowering::homogeneousPrologEpilog(
300 MachineFunction &MF, MachineBasicBlock *Exit) const {
301 if (!MF.getFunction().hasMinSize())
302 return false;
304 return false;
305 if (EnableRedZone)
306 return false;
307
308 // TODO: Windows is not supported yet.
309 if (needsWinCFI(MF))
310 return false;
311 // TODO: SVE is not supported yet.
312 if (getSVEStackSize(MF))
313 return false;
314
315 // Bail on stack adjustment needed on return for simplicity.
316 const MachineFrameInfo &MFI = MF.getFrameInfo();
318 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
319 return false;
320 if (Exit && getArgumentStackToRestore(MF, *Exit))
321 return false;
322
323 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
324 if (AFI->hasSwiftAsyncContext())
325 return false;
326
327 // If there are an odd number of GPRs before LR and FP in the CSRs list,
328 // they will not be paired into one RegPairInfo, which is incompatible with
329 // the assumption made by the homogeneous prolog epilog pass.
330 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
331 unsigned NumGPRs = 0;
332 for (unsigned I = 0; CSRegs[I]; ++I) {
333 Register Reg = CSRegs[I];
334 if (Reg == AArch64::LR) {
335 assert(CSRegs[I + 1] == AArch64::FP);
336 if (NumGPRs % 2 != 0)
337 return false;
338 break;
339 }
340 if (AArch64::GPR64RegClass.contains(Reg))
341 ++NumGPRs;
342 }
343
344 return true;
345}
346
347/// Returns true if CSRs should be paired.
348bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
349 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
350}
351
352/// This is the biggest offset to the stack pointer we can encode in AArch64
353/// instructions (without using a separate calculation and a temp register).
354/// Note that the exceptions here are vector stores/loads, which cannot encode
355/// any displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
356static const unsigned DefaultSafeSPDisplacement = 255;
357
358/// Look at each instruction that references stack frames and return the stack
359/// size limit beyond which some of these instructions will require a scratch
360/// register during their expansion later.
362 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
363 // range. We'll end up allocating an unnecessary spill slot a lot, but
364 // realistically that's not a big deal at this stage of the game.
365 for (MachineBasicBlock &MBB : MF) {
366 for (MachineInstr &MI : MBB) {
367 if (MI.isDebugInstr() || MI.isPseudo() ||
368 MI.getOpcode() == AArch64::ADDXri ||
369 MI.getOpcode() == AArch64::ADDSXri)
370 continue;
371
372 for (const MachineOperand &MO : MI.operands()) {
373 if (!MO.isFI())
374 continue;
375
377 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
379 return 0;
380 }
381 }
382 }
384}
385
389}
390
391/// Returns the size of the fixed object area (allocated next to sp on entry).
392/// On Win64 this may include a var args area and an UnwindHelp object for EH.
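/// For example (with made-up numbers): a Win64 vararg function that saves 56
/// bytes of register varargs and contains EH funclets reserves
/// alignTo(56 + 8, 16) = 64 bytes here, plus any tail-call reserved stack.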
393static unsigned getFixedObjectSize(const MachineFunction &MF,
394 const AArch64FunctionInfo *AFI, bool IsWin64,
395 bool IsFunclet) {
396 if (!IsWin64 || IsFunclet) {
397 return AFI->getTailCallReservedStack();
398 } else {
399 if (AFI->getTailCallReservedStack() != 0 &&
401 Attribute::SwiftAsync))
402 report_fatal_error("cannot generate ABI-changing tail call for Win64");
403 // Var args are stored here in the primary function.
404 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
405 // To support EH funclets we allocate an UnwindHelp object
406 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
407 return AFI->getTailCallReservedStack() +
408 alignTo(VarArgsArea + UnwindHelpObject, 16);
409 }
410}
411
412/// Returns the size of the entire SVE stack frame (callee-saves + spills).
415 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
416}
417
419 if (!EnableRedZone)
420 return false;
421
422 // Don't use the red zone if the function explicitly asks us not to.
423 // This is typically used for kernel code.
424 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
425 const unsigned RedZoneSize =
427 if (!RedZoneSize)
428 return false;
429
430 const MachineFrameInfo &MFI = MF.getFrameInfo();
432 uint64_t NumBytes = AFI->getLocalStackSize();
433
434 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
435 getSVEStackSize(MF));
436}
437
438/// hasFP - Return true if the specified function should have a dedicated frame
439/// pointer register.
441 const MachineFrameInfo &MFI = MF.getFrameInfo();
442 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
443
444 // Win64 EH requires a frame pointer if funclets are present, as the locals
445 // are accessed off the frame pointer in both the parent function and the
446 // funclets.
447 if (MF.hasEHFunclets())
448 return true;
449 // Retain behavior of always omitting the FP for leaf functions when possible.
451 return true;
452 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
453 MFI.hasStackMap() || MFI.hasPatchPoint() ||
454 RegInfo->hasStackRealignment(MF))
455 return true;
456 // With large call frames around we may need to use FP to access the scavenging
457 // emergency spill slot.
458 //
459 // Unfortunately some calls to hasFP() like machine verifier ->
460 // getReservedReg() -> hasFP in the middle of global isel are too early
461 // to know the max call frame size. Hopefully conservatively returning "true"
462 // in those cases is fine.
463 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
464 if (!MFI.isMaxCallFrameSizeComputed() ||
466 return true;
467
468 return false;
469}
470
471/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
472/// not required, we reserve argument space for call sites in the function
473/// immediately on entry to the current function. This eliminates the need for
474/// add/sub sp brackets around call sites. Returns true if the call frame is
475/// included as part of the stack frame.
476bool
478 // The stack probing code for the dynamically allocated outgoing arguments
479 // area assumes that the stack is probed at the top - either by the prologue
480 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
481 // most recent variable-sized object allocation. Changing the condition here
482 // may need to be followed up by changes to the probe issuing logic.
483 return !MF.getFrameInfo().hasVarSizedObjects();
484}
485
489 const AArch64InstrInfo *TII =
490 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
491 const AArch64TargetLowering *TLI =
492 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
493 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
494 DebugLoc DL = I->getDebugLoc();
495 unsigned Opc = I->getOpcode();
496 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
497 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
498
499 if (!hasReservedCallFrame(MF)) {
500 int64_t Amount = I->getOperand(0).getImm();
501 Amount = alignTo(Amount, getStackAlign());
502 if (!IsDestroy)
503 Amount = -Amount;
504
505 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
506 // doesn't have to pop anything), then the first operand will be zero too so
507 // this adjustment is a no-op.
508 if (CalleePopAmount == 0) {
509 // FIXME: in-function stack adjustment for calls is limited to 24-bits
510 // because there's no guaranteed temporary register available.
511 //
512 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
513 // 1) For offset <= 12-bit, we use LSL #0
514 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
515 // LSL #0, and the other uses LSL #12.
516 //
517 // Most call frames will be allocated at the start of a function so
518 // this is OK, but it is a limitation that needs dealing with.
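 //
 // Illustrative sketch of the two-instruction form for, say, a 0x12345-byte
 // adjustment (conceptually what emitFrameOffset expands to):
 //     sub  sp, sp, #0x12, lsl #12   // subtract 0x12000
 //     sub  sp, sp, #0x345           // then subtract the low 12 bits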
519 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
520
521 if (TLI->hasInlineStackProbe(MF) &&
523 // When stack probing is enabled, the decrement of SP may need to be
524 // probed. We only need to do this if the call site needs 1024 bytes of
525 // space or more, because a region smaller than that is allowed to be
526 // unprobed at an ABI boundary. We rely on the fact that SP has been
527 // probed exactly at this point, either by the prologue or most recent
528 // dynamic allocation.
530 "non-reserved call frame without var sized objects?");
531 Register ScratchReg =
532 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
533 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
534 } else {
535 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
536 StackOffset::getFixed(Amount), TII);
537 }
538 }
539 } else if (CalleePopAmount != 0) {
540 // If the calling convention demands that the callee pops arguments from the
541 // stack, we want to add it back if we have a reserved call frame.
542 assert(CalleePopAmount < 0xffffff && "call frame too large");
543 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
544 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
545 }
546 return MBB.erase(I);
547}
548
549void AArch64FrameLowering::emitCalleeSavedGPRLocations(
552 MachineFrameInfo &MFI = MF.getFrameInfo();
553
554 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
555 if (CSI.empty())
556 return;
557
558 const TargetSubtargetInfo &STI = MF.getSubtarget();
559 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
560 const TargetInstrInfo &TII = *STI.getInstrInfo();
562
563 for (const auto &Info : CSI) {
564 if (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector)
565 continue;
566
567 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
568 unsigned DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
569
570 int64_t Offset =
571 MFI.getObjectOffset(Info.getFrameIdx()) - getOffsetOfLocalArea();
572 unsigned CFIIndex = MF.addFrameInst(
573 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
574 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
575 .addCFIIndex(CFIIndex)
577 }
578}
579
580void AArch64FrameLowering::emitCalleeSavedSVELocations(
583 MachineFrameInfo &MFI = MF.getFrameInfo();
584
585 // Add callee saved registers to move list.
586 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
587 if (CSI.empty())
588 return;
589
590 const TargetSubtargetInfo &STI = MF.getSubtarget();
591 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
592 const TargetInstrInfo &TII = *STI.getInstrInfo();
595
596 for (const auto &Info : CSI) {
597 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
598 continue;
599
600 // Not all unwinders may know about SVE registers, so assume the lowest
601 // common denominator.
602 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
603 unsigned Reg = Info.getReg();
604 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
605 continue;
606
608 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
610
611 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
612 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
613 .addCFIIndex(CFIIndex)
615 }
616}
617
621 unsigned DwarfReg) {
622 unsigned CFIIndex =
623 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
624 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
625}
626
628 MachineBasicBlock &MBB) const {
629
631 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
632 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
633 const auto &TRI =
634 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
635 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
636
637 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
638 DebugLoc DL;
639
640 // Reset the CFA to `SP + 0`.
642 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
643 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
644 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
645
646 // Flip the RA sign state.
647 if (MFI.shouldSignReturnAddress(MF)) {
649 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
650 }
651
652 // Shadow call stack uses X18, reset it.
653 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
654 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
655 TRI.getDwarfRegNum(AArch64::X18, true));
656
657 // Emit .cfi_same_value for callee-saved registers.
658 const std::vector<CalleeSavedInfo> &CSI =
660 for (const auto &Info : CSI) {
661 unsigned Reg = Info.getReg();
662 if (!TRI.regNeedsCFI(Reg, Reg))
663 continue;
664 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
665 TRI.getDwarfRegNum(Reg, true));
666 }
667}
668
671 bool SVE) {
673 MachineFrameInfo &MFI = MF.getFrameInfo();
674
675 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
676 if (CSI.empty())
677 return;
678
679 const TargetSubtargetInfo &STI = MF.getSubtarget();
680 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
681 const TargetInstrInfo &TII = *STI.getInstrInfo();
683
684 for (const auto &Info : CSI) {
685 if (SVE !=
686 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
687 continue;
688
689 unsigned Reg = Info.getReg();
690 if (SVE &&
691 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
692 continue;
693
694 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
695 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
696 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
697 .addCFIIndex(CFIIndex)
699 }
700}
701
702void AArch64FrameLowering::emitCalleeSavedGPRRestores(
705}
706
707void AArch64FrameLowering::emitCalleeSavedSVERestores(
710}
711
712// Return the maximum possible number of bytes for `Size` due to the
713// architectural limit on the size of an SVE register.
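// For example (illustrative): StackOffset::get(32, 2), i.e. 32 fixed bytes plus
// 2 scalable bytes, is bounded by 32 + 2 * 16 = 64 bytes, since one scalable
// byte can expand to at most 16 bytes at the maximum 2048-bit vector length.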
714static int64_t upperBound(StackOffset Size) {
715 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
716 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
717}
718
719void AArch64FrameLowering::allocateStackSpace(
721 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
722 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
723 bool FollowupAllocs) const {
724
725 if (!AllocSize)
726 return;
727
728 DebugLoc DL;
730 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
731 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
733 const MachineFrameInfo &MFI = MF.getFrameInfo();
734
735 const int64_t MaxAlign = MFI.getMaxAlign().value();
736 const uint64_t AndMask = ~(MaxAlign - 1);
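 // For example (illustrative): MaxAlign == 32 gives AndMask == ~uint64_t(31),
 // so applying the ANDXri below clears the low five bits of the candidate
 // stack pointer value.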
737
738 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
739 Register TargetReg = RealignmentPadding
741 : AArch64::SP;
742 // SUB Xd/SP, SP, AllocSize
743 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
744 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
745 EmitCFI, InitialOffset);
746
747 if (RealignmentPadding) {
748 // AND SP, X9, 0b11111...0000
749 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
750 .addReg(TargetReg, RegState::Kill)
753 AFI.setStackRealigned(true);
754
755 // No need for SEH instructions here; if we're realigning the stack,
756 // we've set a frame pointer and already finished the SEH prologue.
757 assert(!NeedsWinCFI);
758 }
759 return;
760 }
761
762 //
763 // Stack probing allocation.
764 //
765
766 // Fixed length allocation. If we don't need to re-align the stack and don't
767 // have SVE objects, we can use a more efficient sequence for stack probing.
768 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
770 assert(ScratchReg != AArch64::NoRegister);
771 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
772 .addDef(ScratchReg)
773 .addImm(AllocSize.getFixed())
774 .addImm(InitialOffset.getFixed())
775 .addImm(InitialOffset.getScalable());
776 // The fixed allocation may leave unprobed bytes at the top of the
777 // stack. If we have a subsequent allocation (e.g. if we have variable-sized
778 // objects), we need to issue an extra probe, so these allocations start in
779 // a known state.
780 if (FollowupAllocs) {
781 // STR XZR, [SP]
782 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
783 .addReg(AArch64::XZR)
784 .addReg(AArch64::SP)
785 .addImm(0)
787 }
788
789 return;
790 }
791
792 // Variable length allocation.
793
794 // If the (unknown) allocation size cannot exceed the probe size, decrement
795 // the stack pointer right away.
796 int64_t ProbeSize = AFI.getStackProbeSize();
797 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
798 Register ScratchReg = RealignmentPadding
800 : AArch64::SP;
801 assert(ScratchReg != AArch64::NoRegister);
802 // SUB Xd, SP, AllocSize
803 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
804 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
805 EmitCFI, InitialOffset);
806 if (RealignmentPadding) {
807 // AND SP, Xn, 0b11111...0000
808 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
809 .addReg(ScratchReg, RegState::Kill)
812 AFI.setStackRealigned(true);
813 }
814 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
816 // STR XZR, [SP]
817 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
818 .addReg(AArch64::XZR)
819 .addReg(AArch64::SP)
820 .addImm(0)
822 }
823 return;
824 }
825
826 // Emit a variable-length allocation probing loop.
827 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
828 // each of them guaranteed to adjust the stack by less than the probe size.
830 assert(TargetReg != AArch64::NoRegister);
831 // SUB Xd, SP, AllocSize
832 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
833 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
834 EmitCFI, InitialOffset);
835 if (RealignmentPadding) {
836 // AND Xn, Xn, 0b11111...0000
837 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
838 .addReg(TargetReg, RegState::Kill)
841 }
842
843 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
844 .addReg(TargetReg);
845 if (EmitCFI) {
846 // Set the CFA register back to SP.
847 unsigned Reg =
848 Subtarget.getRegisterInfo()->getDwarfRegNum(AArch64::SP, true);
849 unsigned CFIIndex =
851 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
852 .addCFIIndex(CFIIndex)
854 }
855 if (RealignmentPadding)
856 AFI.setStackRealigned(true);
857}
858
859static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
860 switch (Reg.id()) {
861 default:
862 // The called routine is expected to preserve r19-r28;
863 // r29 and r30 are used as the frame pointer and link register, respectively.
864 return 0;
865
866 // GPRs
867#define CASE(n) \
868 case AArch64::W##n: \
869 case AArch64::X##n: \
870 return AArch64::X##n
871 CASE(0);
872 CASE(1);
873 CASE(2);
874 CASE(3);
875 CASE(4);
876 CASE(5);
877 CASE(6);
878 CASE(7);
879 CASE(8);
880 CASE(9);
881 CASE(10);
882 CASE(11);
883 CASE(12);
884 CASE(13);
885 CASE(14);
886 CASE(15);
887 CASE(16);
888 CASE(17);
889 CASE(18);
890#undef CASE
891
892 // FPRs
893#define CASE(n) \
894 case AArch64::B##n: \
895 case AArch64::H##n: \
896 case AArch64::S##n: \
897 case AArch64::D##n: \
898 case AArch64::Q##n: \
899 return HasSVE ? AArch64::Z##n : AArch64::Q##n
900 CASE(0);
901 CASE(1);
902 CASE(2);
903 CASE(3);
904 CASE(4);
905 CASE(5);
906 CASE(6);
907 CASE(7);
908 CASE(8);
909 CASE(9);
910 CASE(10);
911 CASE(11);
912 CASE(12);
913 CASE(13);
914 CASE(14);
915 CASE(15);
916 CASE(16);
917 CASE(17);
918 CASE(18);
919 CASE(19);
920 CASE(20);
921 CASE(21);
922 CASE(22);
923 CASE(23);
924 CASE(24);
925 CASE(25);
926 CASE(26);
927 CASE(27);
928 CASE(28);
929 CASE(29);
930 CASE(30);
931 CASE(31);
932#undef CASE
933 }
934}
935
936void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
937 MachineBasicBlock &MBB) const {
938 // Insertion point.
940
941 // Fake a debug loc.
942 DebugLoc DL;
943 if (MBBI != MBB.end())
944 DL = MBBI->getDebugLoc();
945
946 const MachineFunction &MF = *MBB.getParent();
949
950 BitVector GPRsToZero(TRI.getNumRegs());
951 BitVector FPRsToZero(TRI.getNumRegs());
952 bool HasSVE = STI.hasSVE();
953 for (MCRegister Reg : RegsToZero.set_bits()) {
954 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
955 // For GPRs, we only care to clear out the 64-bit register.
956 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
957 GPRsToZero.set(XReg);
958 } else if (AArch64::FPR128RegClass.contains(Reg) ||
959 AArch64::FPR64RegClass.contains(Reg) ||
960 AArch64::FPR32RegClass.contains(Reg) ||
961 AArch64::FPR16RegClass.contains(Reg) ||
962 AArch64::FPR8RegClass.contains(Reg)) {
963 // For FPRs, zero the full Q register, or the Z register when SVE is available.
964 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
965 FPRsToZero.set(XReg);
966 }
967 }
968
969 const AArch64InstrInfo &TII = *STI.getInstrInfo();
970
971 // Zero out GPRs.
972 for (MCRegister Reg : GPRsToZero.set_bits())
973 TII.buildClearRegister(Reg, MBB, MBBI, DL);
974
975 // Zero out FP/vector registers.
976 for (MCRegister Reg : FPRsToZero.set_bits())
977 TII.buildClearRegister(Reg, MBB, MBBI, DL);
978
979 if (HasSVE) {
980 for (MCRegister PReg :
981 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
982 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
983 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
984 AArch64::P15}) {
985 if (RegsToZero[PReg])
986 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
987 }
988 }
989}
990
992 const MachineBasicBlock &MBB) {
993 const MachineFunction *MF = MBB.getParent();
994 LiveRegs.addLiveIns(MBB);
995 // Mark callee saved registers as used so we will not choose them.
996 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
997 for (unsigned i = 0; CSRegs[i]; ++i)
998 LiveRegs.addReg(CSRegs[i]);
999}
1000
1001// Find a scratch register that we can use at the start of the prologue to
1002// re-align the stack pointer. We avoid using callee-save registers since they
1003// may appear to be free when this is called from canUseAsPrologue (during
1004// shrink wrapping), but then no longer be free when this is called from
1005// emitPrologue.
1006//
1007// FIXME: This is a bit conservative, since in the above case we could use one
1008// of the callee-save registers as a scratch temp to re-align the stack pointer,
1009// but we would then have to make sure that we were in fact saving at least one
1010// callee-save register in the prologue, which is additional complexity that
1011// doesn't seem worth the benefit.
1013 MachineFunction *MF = MBB->getParent();
1014
1015 // If MBB is an entry block, use X9 as the scratch register
1016 if (&MF->front() == MBB)
1017 return AArch64::X9;
1018
1019 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1020 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1021 LivePhysRegs LiveRegs(TRI);
1022 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1023
1024 // Prefer X9 since it was historically used for the prologue scratch reg.
1025 const MachineRegisterInfo &MRI = MF->getRegInfo();
1026 if (LiveRegs.available(MRI, AArch64::X9))
1027 return AArch64::X9;
1028
1029 for (unsigned Reg : AArch64::GPR64RegClass) {
1030 if (LiveRegs.available(MRI, Reg))
1031 return Reg;
1032 }
1033 return AArch64::NoRegister;
1034}
1035
1037 const MachineBasicBlock &MBB) const {
1038 const MachineFunction *MF = MBB.getParent();
1039 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1040 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1041 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1042 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1044
1045 if (AFI->hasSwiftAsyncContext()) {
1046 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1047 const MachineRegisterInfo &MRI = MF->getRegInfo();
1048 LivePhysRegs LiveRegs(TRI);
1049 getLiveRegsForEntryMBB(LiveRegs, MBB);
1050 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1051 // available.
1052 if (!LiveRegs.available(MRI, AArch64::X16) ||
1053 !LiveRegs.available(MRI, AArch64::X17))
1054 return false;
1055 }
1056
1057 // Certain stack probing sequences might clobber flags, so we can't use
1058 // the block as a prologue if the flags register is a live-in.
1060 MBB.isLiveIn(AArch64::NZCV))
1061 return false;
1062
1063 // Don't need a scratch register if we're not going to re-align the stack or
1064 // emit stack probes.
1065 if (!RegInfo->hasStackRealignment(*MF) && !TLI->hasInlineStackProbe(*MF))
1066 return true;
1067 // Otherwise, we can use any block as long as it has a scratch register
1068 // available.
1069 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
1070}
1071
1073 uint64_t StackSizeInBytes) {
1074 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1076 // TODO: When implementing stack protectors, take that into account
1077 // for the probe threshold.
1078 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1079 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1080}
1081
1082static bool needsWinCFI(const MachineFunction &MF) {
1083 const Function &F = MF.getFunction();
1084 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1085 F.needsUnwindTableEntry();
1086}
1087
1088bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1089 MachineFunction &MF, uint64_t StackBumpBytes) const {
1091 const MachineFrameInfo &MFI = MF.getFrameInfo();
1092 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1093 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1094 if (homogeneousPrologEpilog(MF))
1095 return false;
1096
1097 if (AFI->getLocalStackSize() == 0)
1098 return false;
1099
1100 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1101 // (to force a stp with predecrement) to match the packed unwind format,
1102 // provided that there actually are any callee saved registers to merge the
1103 // decrement with.
1104 // This is potentially marginally slower, but allows using the packed
1105 // unwind format for functions that both have a local area and callee saved
1106 // registers. Using the packed unwind format notably reduces the size of
1107 // the unwind info.
1108 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1109 MF.getFunction().hasOptSize())
1110 return false;
1111
1112 // 512 is the maximum immediate for stp/ldp that will be used for
1113 // callee-save save/restores
1114 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1115 return false;
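 // (The combined form uses an stp/ldp pre/post-indexed immediate, hence the
 // 512-byte limit above. Illustrative sketch, sizes made up:
 //     stp  x29, x30, [sp, #-48]!    // combined: save CSRs and allocate locals
 // versus the split form:
 //     stp  x29, x30, [sp, #-16]!    // allocate just the callee-save area
 //     sub  sp, sp, #32              // then allocate the locals separately)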
1116
1117 if (MFI.hasVarSizedObjects())
1118 return false;
1119
1120 if (RegInfo->hasStackRealignment(MF))
1121 return false;
1122
1123 // This isn't strictly necessary, but it simplifies things a bit since the
1124 // current RedZone handling code assumes the SP is adjusted by the
1125 // callee-save save/restore code.
1126 if (canUseRedZone(MF))
1127 return false;
1128
1129 // When there is an SVE area on the stack, always allocate the
1130 // callee-saves and spills/locals separately.
1131 if (getSVEStackSize(MF))
1132 return false;
1133
1134 return true;
1135}
1136
1137bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1138 MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
1139 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1140 return false;
1141
1142 if (MBB.empty())
1143 return true;
1144
1145 // Disable combined SP bump if the last instruction is an MTE tag store. It
1146 // is almost always better to merge SP adjustment into those instructions.
1149 while (LastI != Begin) {
1150 --LastI;
1151 if (LastI->isTransient())
1152 continue;
1153 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1154 break;
1155 }
1156 switch (LastI->getOpcode()) {
1157 case AArch64::STGloop:
1158 case AArch64::STZGloop:
1159 case AArch64::STGi:
1160 case AArch64::STZGi:
1161 case AArch64::ST2Gi:
1162 case AArch64::STZ2Gi:
1163 return false;
1164 default:
1165 return true;
1166 }
1167 llvm_unreachable("unreachable");
1168}
1169
1170// Given a load or a store instruction, generate the appropriate SEH unwind
1171// code on Windows.
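// For example (an illustrative sketch): a prologue store such as
//     stp  x19, x20, [sp, #-32]!
// is followed by a SEH_SaveRegP_X pseudo recording the saved registers and the
// SP adjustment, which is later emitted as a ".seh_save_regp_x" directive.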
1173 const TargetInstrInfo &TII,
1174 MachineInstr::MIFlag Flag) {
1175 unsigned Opc = MBBI->getOpcode();
1177 MachineFunction &MF = *MBB->getParent();
1178 DebugLoc DL = MBBI->getDebugLoc();
1179 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1180 int Imm = MBBI->getOperand(ImmIdx).getImm();
1182 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1183 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1184
1185 switch (Opc) {
1186 default:
1187 llvm_unreachable("No SEH Opcode for this instruction");
1188 case AArch64::LDPDpost:
1189 Imm = -Imm;
1190 [[fallthrough]];
1191 case AArch64::STPDpre: {
1192 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1193 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1194 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1195 .addImm(Reg0)
1196 .addImm(Reg1)
1197 .addImm(Imm * 8)
1198 .setMIFlag(Flag);
1199 break;
1200 }
1201 case AArch64::LDPXpost:
1202 Imm = -Imm;
1203 [[fallthrough]];
1204 case AArch64::STPXpre: {
1205 Register Reg0 = MBBI->getOperand(1).getReg();
1206 Register Reg1 = MBBI->getOperand(2).getReg();
1207 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1208 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1209 .addImm(Imm * 8)
1210 .setMIFlag(Flag);
1211 else
1212 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1213 .addImm(RegInfo->getSEHRegNum(Reg0))
1214 .addImm(RegInfo->getSEHRegNum(Reg1))
1215 .addImm(Imm * 8)
1216 .setMIFlag(Flag);
1217 break;
1218 }
1219 case AArch64::LDRDpost:
1220 Imm = -Imm;
1221 [[fallthrough]];
1222 case AArch64::STRDpre: {
1223 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1224 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1225 .addImm(Reg)
1226 .addImm(Imm)
1227 .setMIFlag(Flag);
1228 break;
1229 }
1230 case AArch64::LDRXpost:
1231 Imm = -Imm;
1232 [[fallthrough]];
1233 case AArch64::STRXpre: {
1234 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1235 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1236 .addImm(Reg)
1237 .addImm(Imm)
1238 .setMIFlag(Flag);
1239 break;
1240 }
1241 case AArch64::STPDi:
1242 case AArch64::LDPDi: {
1243 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1244 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1245 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1246 .addImm(Reg0)
1247 .addImm(Reg1)
1248 .addImm(Imm * 8)
1249 .setMIFlag(Flag);
1250 break;
1251 }
1252 case AArch64::STPXi:
1253 case AArch64::LDPXi: {
1254 Register Reg0 = MBBI->getOperand(0).getReg();
1255 Register Reg1 = MBBI->getOperand(1).getReg();
1256 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1257 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1258 .addImm(Imm * 8)
1259 .setMIFlag(Flag);
1260 else
1261 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1262 .addImm(RegInfo->getSEHRegNum(Reg0))
1263 .addImm(RegInfo->getSEHRegNum(Reg1))
1264 .addImm(Imm * 8)
1265 .setMIFlag(Flag);
1266 break;
1267 }
1268 case AArch64::STRXui:
1269 case AArch64::LDRXui: {
1270 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1271 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1272 .addImm(Reg)
1273 .addImm(Imm * 8)
1274 .setMIFlag(Flag);
1275 break;
1276 }
1277 case AArch64::STRDui:
1278 case AArch64::LDRDui: {
1279 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1280 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1281 .addImm(Reg)
1282 .addImm(Imm * 8)
1283 .setMIFlag(Flag);
1284 break;
1285 }
1286 case AArch64::STPQi:
1287 case AArch64::LDPQi: {
1288 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1289 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1290 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1291 .addImm(Reg0)
1292 .addImm(Reg1)
1293 .addImm(Imm * 16)
1294 .setMIFlag(Flag);
1295 break;
1296 }
1297 case AArch64::LDPQpost:
1298 Imm = -Imm;
1299 [[fallthrough]];
1300 case AArch64::STPQpre: {
1301 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1302 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1303 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1304 .addImm(Reg0)
1305 .addImm(Reg1)
1306 .addImm(Imm * 16)
1307 .setMIFlag(Flag);
1308 break;
1309 }
1310 }
1311 auto I = MBB->insertAfter(MBBI, MIB);
1312 return I;
1313}
1314
1315// Fix up the SEH opcode associated with the save/restore instruction.
1317 unsigned LocalStackSize) {
1318 MachineOperand *ImmOpnd = nullptr;
1319 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1320 switch (MBBI->getOpcode()) {
1321 default:
1322 llvm_unreachable("Fix the offset in the SEH instruction");
1323 case AArch64::SEH_SaveFPLR:
1324 case AArch64::SEH_SaveRegP:
1325 case AArch64::SEH_SaveReg:
1326 case AArch64::SEH_SaveFRegP:
1327 case AArch64::SEH_SaveFReg:
1328 case AArch64::SEH_SaveAnyRegQP:
1329 case AArch64::SEH_SaveAnyRegQPX:
1330 ImmOpnd = &MBBI->getOperand(ImmIdx);
1331 break;
1332 }
1333 if (ImmOpnd)
1334 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1335}
1336
1337// Convert a callee-save register save/restore instruction to also do the
1338// stack pointer decrement/increment that allocates/deallocates the
1339// callee-save stack area, by converting the store/load to its
// pre/post-increment form.
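// For example (illustrative): with a 16-byte callee-save area, the prologue
// save "stp x29, x30, [sp, #0]" becomes "stp x29, x30, [sp, #-16]!", and the
// matching epilogue restore becomes "ldp x29, x30, [sp], #16".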
1342 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1343 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1345 int CFAOffset = 0) {
1346 unsigned NewOpc;
1347 switch (MBBI->getOpcode()) {
1348 default:
1349 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1350 case AArch64::STPXi:
1351 NewOpc = AArch64::STPXpre;
1352 break;
1353 case AArch64::STPDi:
1354 NewOpc = AArch64::STPDpre;
1355 break;
1356 case AArch64::STPQi:
1357 NewOpc = AArch64::STPQpre;
1358 break;
1359 case AArch64::STRXui:
1360 NewOpc = AArch64::STRXpre;
1361 break;
1362 case AArch64::STRDui:
1363 NewOpc = AArch64::STRDpre;
1364 break;
1365 case AArch64::STRQui:
1366 NewOpc = AArch64::STRQpre;
1367 break;
1368 case AArch64::LDPXi:
1369 NewOpc = AArch64::LDPXpost;
1370 break;
1371 case AArch64::LDPDi:
1372 NewOpc = AArch64::LDPDpost;
1373 break;
1374 case AArch64::LDPQi:
1375 NewOpc = AArch64::LDPQpost;
1376 break;
1377 case AArch64::LDRXui:
1378 NewOpc = AArch64::LDRXpost;
1379 break;
1380 case AArch64::LDRDui:
1381 NewOpc = AArch64::LDRDpost;
1382 break;
1383 case AArch64::LDRQui:
1384 NewOpc = AArch64::LDRQpost;
1385 break;
1386 }
1387 // Get rid of the SEH code associated with the old instruction.
1388 if (NeedsWinCFI) {
1389 auto SEH = std::next(MBBI);
1391 SEH->eraseFromParent();
1392 }
1393
1394 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1395 int64_t MinOffset, MaxOffset;
1396 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1397 NewOpc, Scale, Width, MinOffset, MaxOffset);
1398 (void)Success;
1399 assert(Success && "unknown load/store opcode");
1400
1401 // If the first store isn't right where we want SP then we can't fold the
1402 // update in so create a normal arithmetic instruction instead.
1403 MachineFunction &MF = *MBB.getParent();
1404 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1405 CSStackSizeInc < MinOffset || CSStackSizeInc > MaxOffset) {
1406 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1407 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1408 false, false, nullptr, EmitCFI,
1409 StackOffset::getFixed(CFAOffset));
1410
1411 return std::prev(MBBI);
1412 }
1413
1414 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1415 MIB.addReg(AArch64::SP, RegState::Define);
1416
1417 // Copy all operands other than the immediate offset.
1418 unsigned OpndIdx = 0;
1419 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1420 ++OpndIdx)
1421 MIB.add(MBBI->getOperand(OpndIdx));
1422
1423 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1424 "Unexpected immediate offset in first/last callee-save save/restore "
1425 "instruction!");
1426 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1427 "Unexpected base register in callee-save save/restore instruction!");
1428 assert(CSStackSizeInc % Scale == 0);
1429 MIB.addImm(CSStackSizeInc / (int)Scale);
1430
1431 MIB.setMIFlags(MBBI->getFlags());
1432 MIB.setMemRefs(MBBI->memoperands());
1433
1434 // Generate a new SEH code that corresponds to the new instruction.
1435 if (NeedsWinCFI) {
1436 *HasWinCFI = true;
1437 InsertSEH(*MIB, *TII, FrameFlag);
1438 }
1439
1440 if (EmitCFI) {
1441 unsigned CFIIndex = MF.addFrameInst(
1442 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1443 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1444 .addCFIIndex(CFIIndex)
1445 .setMIFlags(FrameFlag);
1446 }
1447
1448 return std::prev(MBB.erase(MBBI));
1449}
1450
1451// Fixup callee-save register save/restore instructions to take into account
1452// combined SP bump by adding the local stack size to the stack offsets.
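// For example (illustrative): if a 32-byte local area was folded into the
// initial SP decrement, a save written as "stp x20, x19, [sp, #16]" must be
// rewritten as "stp x20, x19, [sp, #48]" (its scaled immediate operand goes
// from 2 to 6, since STPXi uses an 8-byte scale).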
1454 uint64_t LocalStackSize,
1455 bool NeedsWinCFI,
1456 bool *HasWinCFI) {
1458 return;
1459
1460 unsigned Opc = MI.getOpcode();
1461 unsigned Scale;
1462 switch (Opc) {
1463 case AArch64::STPXi:
1464 case AArch64::STRXui:
1465 case AArch64::STPDi:
1466 case AArch64::STRDui:
1467 case AArch64::LDPXi:
1468 case AArch64::LDRXui:
1469 case AArch64::LDPDi:
1470 case AArch64::LDRDui:
1471 Scale = 8;
1472 break;
1473 case AArch64::STPQi:
1474 case AArch64::STRQui:
1475 case AArch64::LDPQi:
1476 case AArch64::LDRQui:
1477 Scale = 16;
1478 break;
1479 default:
1480 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1481 }
1482
1483 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1484 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1485 "Unexpected base register in callee-save save/restore instruction!");
1486 // Last operand is immediate offset that needs fixing.
1487 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1488 // All generated opcodes have scaled offsets.
1489 assert(LocalStackSize % Scale == 0);
1490 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1491
1492 if (NeedsWinCFI) {
1493 *HasWinCFI = true;
1494 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1495 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1497 "Expecting a SEH instruction");
1498 fixupSEHOpcode(MBBI, LocalStackSize);
1499 }
1500}
1501
1502static bool isTargetWindows(const MachineFunction &MF) {
1504}
1505
1506// Convenience function to determine whether I is an SVE callee save.
1508 switch (I->getOpcode()) {
1509 default:
1510 return false;
1511 case AArch64::PTRUE_C_B:
1512 case AArch64::LD1B_2Z_IMM:
1513 case AArch64::ST1B_2Z_IMM:
1514 case AArch64::STR_ZXI:
1515 case AArch64::STR_PXI:
1516 case AArch64::LDR_ZXI:
1517 case AArch64::LDR_PXI:
1518 return I->getFlag(MachineInstr::FrameSetup) ||
1519 I->getFlag(MachineInstr::FrameDestroy);
1520 }
1521}
1522
1524 MachineFunction &MF,
1527 const DebugLoc &DL, bool NeedsWinCFI,
1528 bool NeedsUnwindInfo) {
1529 // Shadow call stack prolog: str x30, [x18], #8
1530 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1531 .addReg(AArch64::X18, RegState::Define)
1532 .addReg(AArch64::LR)
1533 .addReg(AArch64::X18)
1534 .addImm(8)
1536
1537 // This instruction also makes x18 live-in to the entry block.
1538 MBB.addLiveIn(AArch64::X18);
1539
1540 if (NeedsWinCFI)
1541 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1543
1544 if (NeedsUnwindInfo) {
1545 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1546 // x18 when unwinding past this frame.
1547 static const char CFIInst[] = {
1548 dwarf::DW_CFA_val_expression,
1549 18, // register
1550 2, // length
1551 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1552 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1553 };
1554 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1555 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1556 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1557 .addCFIIndex(CFIIndex)
1559 }
1560}
1561
1563 MachineFunction &MF,
1566 const DebugLoc &DL) {
1567 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1568 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1569 .addReg(AArch64::X18, RegState::Define)
1570 .addReg(AArch64::LR, RegState::Define)
1571 .addReg(AArch64::X18)
1572 .addImm(-8)
1574
1576 unsigned CFIIndex =
1578 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1579 .addCFIIndex(CFIIndex)
1581 }
1582}
1583
1584// Define the current CFA rule to use the provided FP.
1587 const DebugLoc &DL, unsigned FixedObject) {
1590 const TargetInstrInfo *TII = STI.getInstrInfo();
1592
1593 const int OffsetToFirstCalleeSaveFromFP =
1596 Register FramePtr = TRI->getFrameRegister(MF);
1597 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1598 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1599 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1600 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1601 .addCFIIndex(CFIIndex)
1603}
1604
1605#ifndef NDEBUG
1606/// Collect live registers from the end of \p MI's parent up to (including) \p
1607/// MI in \p LiveRegs.
1609 LivePhysRegs &LiveRegs) {
1610
1611 MachineBasicBlock &MBB = *MI.getParent();
1612 LiveRegs.addLiveOuts(MBB);
1613 for (const MachineInstr &MI :
1614 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1615 LiveRegs.stepBackward(MI);
1616}
1617#endif
1618
1620 MachineBasicBlock &MBB) const {
1622 const MachineFrameInfo &MFI = MF.getFrameInfo();
1623 const Function &F = MF.getFunction();
1624 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1625 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1626 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1627
1628 MachineModuleInfo &MMI = MF.getMMI();
1630 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1631 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1632 bool HasFP = hasFP(MF);
1633 bool NeedsWinCFI = needsWinCFI(MF);
1634 bool HasWinCFI = false;
1635 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1636
1638#ifndef NDEBUG
1640 // Collect live registers from the end of MBB up to the start of the existing
1641 // frame setup instructions.
1642 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1643 while (NonFrameStart != End &&
1644 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1645 ++NonFrameStart;
1646
1647 LivePhysRegs LiveRegs(*TRI);
1648 if (NonFrameStart != MBB.end()) {
1649 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1650 // Ignore registers used for stack management for now.
1651 LiveRegs.removeReg(AArch64::SP);
1652 LiveRegs.removeReg(AArch64::X19);
1653 LiveRegs.removeReg(AArch64::FP);
1654 LiveRegs.removeReg(AArch64::LR);
1655 }
1656
1657 auto VerifyClobberOnExit = make_scope_exit([&]() {
1658 if (NonFrameStart == MBB.end())
1659 return;
1660 // Check if any of the newly inserted instructions clobber any of the live registers.
1661 for (MachineInstr &MI :
1662 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1663 for (auto &Op : MI.operands())
1664 if (Op.isReg() && Op.isDef())
1665 assert(!LiveRegs.contains(Op.getReg()) &&
1666 "live register clobbered by inserted prologue instructions");
1667 }
1668 });
1669#endif
1670
1671 bool IsFunclet = MBB.isEHFuncletEntry();
1672
1673 // At this point, we're going to decide whether or not the function uses a
1674 // redzone. In most cases, the function doesn't have a redzone so let's
1675 // assume that's false and set it to true in the case that there's a redzone.
1676 AFI->setHasRedZone(false);
1677
1678 // Debug location must be unknown since the first debug location is used
1679 // to determine the end of the prologue.
1680 DebugLoc DL;
1681
1682 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1683 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1684 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1685 MFnI.needsDwarfUnwindInfo(MF));
1686
1687 if (MFnI.shouldSignReturnAddress(MF)) {
1688 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1690 if (NeedsWinCFI)
1691 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1692 }
1693
1694 if (EmitCFI && MFnI.isMTETagged()) {
1695 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1697 }
1698
1699 // We signal the presence of a Swift extended frame to external tools by
1700 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1701 // ORR is sufficient; it is assumed a Swift kernel would initialize the TBI
1702 // bits so that is still true.
1703 if (HasFP && AFI->hasSwiftAsyncContext()) {
1706 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1707 // The special symbol below is absolute and has a *value* that can be
1708 // combined with the frame pointer to signal an extended frame.
1709 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1710 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1712 if (NeedsWinCFI) {
1713 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1715 HasWinCFI = true;
1716 }
1717 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1718 .addUse(AArch64::FP)
1719 .addUse(AArch64::X16)
1720 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1721 if (NeedsWinCFI) {
1722 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1724 HasWinCFI = true;
1725 }
1726 break;
1727 }
1728 [[fallthrough]];
1729
1731 // ORR x29, x29, #0x1000_0000_0000_0000
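 // (0x1100 below is the encoded form of that logical immediate: a single set
 // bit in a 64-bit element, rotated into bit 60.)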
1732 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1733 .addUse(AArch64::FP)
1734 .addImm(0x1100)
1736 if (NeedsWinCFI) {
1737 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1739 HasWinCFI = true;
1740 }
1741 break;
1742
1744 break;
1745 }
1746 }
1747
1748 // All calls are tail calls in GHC calling conv, and functions have no
1749 // prologue/epilogue.
1751 return;
1752
1753 // Set tagged base pointer to the requested stack slot.
1754 // Ideally it should match SP value after prologue.
1755 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1756 if (TBPI)
1758 else
1760
1761 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1762
1763 // getStackSize() includes all the locals in its size calculation. We don't
1764 // include these locals when computing the stack size of a funclet, as they
1765 // are allocated in the parent's stack frame and accessed via the frame
1766 // pointer from the funclet. We only save the callee saved registers in the
1767 // funclet, which are really the callee saved registers of the parent
1768 // function, including the funclet.
1769 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1770 : MFI.getStackSize();
1771 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1772 assert(!HasFP && "unexpected function without stack frame but with FP");
1773 assert(!SVEStackSize &&
1774 "unexpected function without stack frame but with SVE objects");
1775 // All of the stack allocation is for locals.
1776 AFI->setLocalStackSize(NumBytes);
1777 if (!NumBytes)
1778 return;
1779 // REDZONE: If the stack size is less than 128 bytes, we don't need
1780 // to actually allocate.
1781 if (canUseRedZone(MF)) {
1782 AFI->setHasRedZone(true);
1783 ++NumRedZoneFunctions;
1784 } else {
1785 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1786 StackOffset::getFixed(-NumBytes), TII,
1787 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1788 if (EmitCFI) {
1789 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1790 MCSymbol *FrameLabel = MMI.getContext().createTempSymbol();
1791 // Encode the stack size of the leaf function.
1792 unsigned CFIIndex = MF.addFrameInst(
1793 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1794 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1795 .addCFIIndex(CFIIndex)
1797 }
1798 }
1799
1800 if (NeedsWinCFI) {
1801 HasWinCFI = true;
1802 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1804 }
1805
1806 return;
1807 }
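  // A rough sketch of the two leaf-function shapes handled above, assuming
  // 32 bytes of locals, no callee saves and no SVE:
  //   With the red zone:    no SP adjustment; locals live below SP in the
  //                         128-byte red zone, e.g. at [sp, #-8], [sp, #-16].
  //   Without the red zone: sub sp, sp, #32    // FrameSetup
  //                         ...
  //                         add sp, sp, #32    // FrameDestroy (epilogue)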
1808
1809 bool IsWin64 =
1811 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1812
1813 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1814 // All of the remaining stack allocations are for locals.
1815 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1816 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1817 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1818 if (CombineSPBump) {
1819 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1820 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1821 StackOffset::getFixed(-NumBytes), TII,
1822 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1823 EmitAsyncCFI);
1824 NumBytes = 0;
1825 } else if (HomPrologEpilog) {
1826 // Stack has been already adjusted.
1827 NumBytes -= PrologueSaveSize;
1828 } else if (PrologueSaveSize != 0) {
1830 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1831 EmitAsyncCFI);
1832 NumBytes -= PrologueSaveSize;
1833 }
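  // Illustrative shapes of the two allocation strategies above, assuming a
  // 16-byte frame record plus 16 bytes of locals and no SVE:
  //   CombineSPBump:                     Separate bumps:
  //     sub sp, sp, #32                    stp x29, x30, [sp, #-16]!
  //     stp x29, x30, [sp, #16]            ...
  //                                        sub sp, sp, #16   // locals, later
  // Folding the local allocation into one SP update saves an instruction when
  // the combined size still fits the addressing-mode limits.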
1834 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1835
1836 // Move past the saves of the callee-saved registers, fixing up the offsets
1837 // and pre-inc if we decided to combine the callee-save and local stack
1838 // pointer bump above.
1839 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1841 if (CombineSPBump)
1843 NeedsWinCFI, &HasWinCFI);
1844 ++MBBI;
1845 }
1846
1847 // For funclets the FP belongs to the containing function.
1848 if (!IsFunclet && HasFP) {
1849 // Only set up FP if we actually need to.
1850 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1851
1852 if (CombineSPBump)
1853 FPOffset += AFI->getLocalStackSize();
1854
1855 if (AFI->hasSwiftAsyncContext()) {
1856 // Before we update the live FP we have to ensure there's a valid (or
1857 // null) asynchronous context in its slot just before FP in the frame
1858 // record, so store it now.
1859 const auto &Attrs = MF.getFunction().getAttributes();
1860 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1861 if (HaveInitialContext)
1862 MBB.addLiveIn(AArch64::X22);
1863 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1864 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1865 .addUse(Reg)
1866 .addUse(AArch64::SP)
1867 .addImm(FPOffset - 8)
1869 if (NeedsWinCFI) {
1870 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
1871 // to multiple instructions, should be mutually-exclusive.
1872 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
1873 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1875 HasWinCFI = true;
1876 }
1877 }
1878
1879 if (HomPrologEpilog) {
1880 auto Prolog = MBBI;
1881 --Prolog;
1882 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1883 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1884 } else {
1885 // Issue sub fp, sp, FPOffset or
1886 // mov fp,sp when FPOffset is zero.
1887 // Note: All stores of callee-saved registers are marked as "FrameSetup".
1888 // This code marks the instruction(s) that set the FP also.
1889 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
1890 StackOffset::getFixed(FPOffset), TII,
1891 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1892 if (NeedsWinCFI && HasWinCFI) {
1893 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1895 // After setting up the FP, the rest of the prolog doesn't need to be
1896 // included in the SEH unwind info.
1897 NeedsWinCFI = false;
1898 }
1899 }
1900 if (EmitAsyncCFI)
1901 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
1902 }
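  // A rough sketch of the frame-record setup above (assuming the frame record
  // sits 16 bytes above SP after the callee-save stores):
  //   add x29, sp, #16     // FPOffset != 0
  //   mov x29, sp          // emitted instead when FPOffset is zero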
1903
1904 // Now emit the moves for whatever callee saved regs we have (including FP,
1905 // LR if those are saved). Frame instructions for SVE registers are emitted
1906 // later, after the instructions that actually save the SVE regs.
1907 if (EmitAsyncCFI)
1908 emitCalleeSavedGPRLocations(MBB, MBBI);
1909
1910 // Alignment is required for the parent frame, not the funclet
1911 const bool NeedsRealignment =
1912 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
1913 const int64_t RealignmentPadding =
1914 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
1915 ? MFI.getMaxAlign().value() - 16
1916 : 0;
1917
1918 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
1919 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
1920 if (NeedsWinCFI) {
1921 HasWinCFI = true;
1922 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
1923 // exceed this amount. We need to move at most 2^24 - 1 into x15.
1924 // This is at most two instructions, MOVZ followed by MOVK.
1925 // TODO: Fix to use multiple stack alloc unwind codes for stacks
1926 // exceeding 256MB in size.
1927 if (NumBytes >= (1 << 28))
1928 report_fatal_error("Stack size cannot exceed 256MB for stack "
1929 "unwinding purposes");
1930
1931 uint32_t LowNumWords = NumWords & 0xFFFF;
1932 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
1933 .addImm(LowNumWords)
1936 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1938 if ((NumWords & 0xFFFF0000) != 0) {
1939 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
1940 .addReg(AArch64::X15)
1941 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
1944 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1946 }
1947 } else {
1948 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
1949 .addImm(NumWords)
1951 }
1952
1953 const char* ChkStk = Subtarget.getChkStkName();
1954 switch (MF.getTarget().getCodeModel()) {
1955 case CodeModel::Tiny:
1956 case CodeModel::Small:
1957 case CodeModel::Medium:
1958 case CodeModel::Kernel:
1959 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
1960 .addExternalSymbol(ChkStk)
1961 .addReg(AArch64::X15, RegState::Implicit)
1966 if (NeedsWinCFI) {
1967 HasWinCFI = true;
1968 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1970 }
1971 break;
1972 case CodeModel::Large:
1973 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
1974 .addReg(AArch64::X16, RegState::Define)
1975 .addExternalSymbol(ChkStk)
1976 .addExternalSymbol(ChkStk)
1978 if (NeedsWinCFI) {
1979 HasWinCFI = true;
1980 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1982 }
1983
1984 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
1985 .addReg(AArch64::X16, RegState::Kill)
1991 if (NeedsWinCFI) {
1992 HasWinCFI = true;
1993 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1995 }
1996 break;
1997 }
1998
1999 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2000 .addReg(AArch64::SP, RegState::Kill)
2001 .addReg(AArch64::X15, RegState::Kill)
2004 if (NeedsWinCFI) {
2005 HasWinCFI = true;
2006 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2007 .addImm(NumBytes)
2009 }
2010 NumBytes = 0;
2011
2012 if (RealignmentPadding > 0) {
2013 if (RealignmentPadding >= 4096) {
2014 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2015 .addReg(AArch64::X16, RegState::Define)
2016 .addImm(RealignmentPadding)
2018 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2019 .addReg(AArch64::SP)
2020 .addReg(AArch64::X16, RegState::Kill)
2023 } else {
2024 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2025 .addReg(AArch64::SP)
2026 .addImm(RealignmentPadding)
2027 .addImm(0)
2029 }
2030
2031 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
2032 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2033 .addReg(AArch64::X15, RegState::Kill)
2035 AFI->setStackRealigned(true);
2036
2037 // No need for SEH instructions here; if we're realigning the stack,
2038 // we've set a frame pointer and already finished the SEH prologue.
2039 assert(!NeedsWinCFI);
2040 }
2041 }
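  // Sketch of the Windows stack-probe sequence built above, assuming a 1 MiB
  // frame (NumWords = 0x10000), the small code model and no realignment:
  //   mov  x15, #0x0                // MOVZXi: low 16 bits of NumWords
  //   movk x15, #0x1, lsl #16       // MOVKXi: high 16 bits
  //   bl   __chkstk                 // ChkStk symbol from getChkStkName()
  //   sub  sp, sp, x15, lsl #4      // SUBXrx64: perform the allocation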
2042
2043 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2044 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
2045
2046 // Process the SVE callee-saves to determine what space needs to be
2047 // allocated.
2048 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2049 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2050 << "\n");
2051 // Find callee save instructions in frame.
2052 CalleeSavesBegin = MBBI;
2053 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2055 ++MBBI;
2056 CalleeSavesEnd = MBBI;
2057
2058 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2059 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2060 }
2061
2062 // Allocate space for the callee saves (if any).
2063 StackOffset CFAOffset =
2064 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2065 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2066 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2067 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2068 MFI.hasVarSizedObjects() || LocalsSize);
2069 CFAOffset += SVECalleeSavesSize;
2070
2071 if (EmitAsyncCFI)
2072 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2073
2074 // Allocate space for the rest of the frame including SVE locals. Align the
2075 // stack as necessary.
2076 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2077 "Cannot use redzone with stack realignment");
2078 if (!canUseRedZone(MF)) {
2079 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2080 // the correct value here, as NumBytes also includes padding bytes,
2081 // which shouldn't be counted here.
2082 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2083 SVELocalsSize + StackOffset::getFixed(NumBytes),
2084 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2085 CFAOffset, MFI.hasVarSizedObjects());
2086 }
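  // Roughly, for a function with both SVE callee saves and SVE locals, the
  // allocations above come out in this order (all sizes illustrative):
  //   addvl sp, sp, #-2              // SVE callee-save area
  //   str   z8, [sp]                 // ...SVE callee-save spills...
  //   str   z9, [sp, #1, mul vl]
  //   addvl sp, sp, #-3              // SVE locals
  //   sub   sp, sp, #48              // remaining fixed-size locals (NumBytes)
  // keeping the scalable and fixed-size areas separately addressable.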
2087
2088 // If we need a base pointer, set it up here. It's whatever the value of the
2089 // stack pointer is at this point. Any variable size objects will be allocated
2090 // after this, so we can still use the base pointer to reference locals.
2091 //
2092 // FIXME: Clarify FrameSetup flags here.
2093 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2094 // needed.
2095 // For funclets the BP belongs to the containing function.
2096 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2097 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2098 false);
2099 if (NeedsWinCFI) {
2100 HasWinCFI = true;
2101 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2103 }
2104 }
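  // The base-pointer setup above is a plain register copy, e.g.
  //   mov x19, sp
  // so fixed-size locals stay reachable even after VLAs move SP.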
2105
2106 // The very last FrameSetup instruction indicates the end of prologue. Emit a
2107 // SEH opcode indicating the prologue end.
2108 if (NeedsWinCFI && HasWinCFI) {
2109 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2111 }
2112
2113 // SEH funclets are passed the frame pointer in X1. If the parent
2114 // function uses the base register, then the base register is used
2115 // directly, and is not retrieved from X1.
2116 if (IsFunclet && F.hasPersonalityFn()) {
2117 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2118 if (isAsynchronousEHPersonality(Per)) {
2119 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2120 .addReg(AArch64::X1)
2122 MBB.addLiveIn(AArch64::X1);
2123 }
2124 }
2125
2126 if (EmitCFI && !EmitAsyncCFI) {
2127 if (HasFP) {
2128 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2129 } else {
2130 StackOffset TotalSize =
2131 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2132 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
2133 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
2134 /*LastAdjustmentWasScalable=*/false));
2135 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2136 .addCFIIndex(CFIIndex)
2138 }
2139 emitCalleeSavedGPRLocations(MBB, MBBI);
2140 emitCalleeSavedSVELocations(MBB, MBBI);
2141 }
2142}
2143
2145 switch (MI.getOpcode()) {
2146 default:
2147 return false;
2148 case AArch64::CATCHRET:
2149 case AArch64::CLEANUPRET:
2150 return true;
2151 }
2152}
2153
2155 MachineBasicBlock &MBB) const {
2157 MachineFrameInfo &MFI = MF.getFrameInfo();
2159 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2160 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2161 DebugLoc DL;
2162 bool NeedsWinCFI = needsWinCFI(MF);
2163 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2164 bool HasWinCFI = false;
2165 bool IsFunclet = false;
2166
2167 if (MBB.end() != MBBI) {
2168 DL = MBBI->getDebugLoc();
2169 IsFunclet = isFuncletReturnInstr(*MBBI);
2170 }
2171
2172 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2173
2174 auto FinishingTouches = make_scope_exit([&]() {
2175 if (AFI->shouldSignReturnAddress(MF)) {
2176 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2177 TII->get(AArch64::PAUTH_EPILOGUE))
2178 .setMIFlag(MachineInstr::FrameDestroy);
2179 if (NeedsWinCFI)
2180 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
2181 }
2184 if (EmitCFI)
2185 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2186 if (HasWinCFI) {
2188 TII->get(AArch64::SEH_EpilogEnd))
2190 if (!MF.hasWinCFI())
2191 MF.setHasWinCFI(true);
2192 }
2193 if (NeedsWinCFI) {
2194 assert(EpilogStartI != MBB.end());
2195 if (!HasWinCFI)
2196 MBB.erase(EpilogStartI);
2197 }
2198 });
2199
2200 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2201 : MFI.getStackSize();
2202
2203 // All calls are tail calls in GHC calling conv, and functions have no
2204 // prologue/epilogue.
2206 return;
2207
2208 // How much of the stack used by incoming arguments this function is expected
2209 // to restore in this particular epilogue.
2210 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2211 bool IsWin64 =
2212 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2213 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2214
2215 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2216 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2217 // We cannot rely on the local stack size set in emitPrologue if the function
2218 // has funclets, as funclets have different local stack size requirements, and
2219 // the current value set in emitPrologue may be that of the containing
2220 // function.
2221 if (MF.hasEHFunclets())
2222 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2223 if (homogeneousPrologEpilog(MF, &MBB)) {
2224 assert(!NeedsWinCFI);
2225 auto LastPopI = MBB.getFirstTerminator();
2226 if (LastPopI != MBB.begin()) {
2227 auto HomogeneousEpilog = std::prev(LastPopI);
2228 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2229 LastPopI = HomogeneousEpilog;
2230 }
2231
2232 // Adjust local stack
2233 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2235 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2236
2237 // SP has been already adjusted while restoring callee save regs.
2238 // We've already bailed out of the case where SP is adjusted for arguments.
2239 assert(AfterCSRPopSize == 0);
2240 return;
2241 }
2242 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2243 // Assume we can't combine the last pop with the sp restore.
2244
2245 bool CombineAfterCSRBump = false;
2246 if (!CombineSPBump && PrologueSaveSize != 0) {
2248 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2250 Pop = std::prev(Pop);
2251 // Converting the last ldp to a post-index ldp is valid only if the last
2252 // ldp's offset is 0.
2253 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2254 // If the offset is 0 and the AfterCSR pop is not actually trying to
2255 // allocate more stack for arguments (in space that an untimely interrupt
2256 // may clobber), convert it to a post-index ldp.
2257 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2259 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2260 MachineInstr::FrameDestroy, PrologueSaveSize);
2261 } else {
2262 // If not, make sure to emit an add after the last ldp.
2263 // We're doing this by transferring the size to be restored from the
2264 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2265 // pops.
2266 AfterCSRPopSize += PrologueSaveSize;
2267 CombineAfterCSRBump = true;
2268 }
2269 }
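  // Sketch of the two epilogue shapes chosen above (sizes illustrative):
  //   Last pop at offset 0 -> fold the deallocation into a post-index load:
  //     ldp x29, x30, [sp], #16
  //   Otherwise -> keep the pop as-is and restore SP with a separate add
  //   emitted after all the CSR restores:
  //     ldp x29, x30, [sp, #16]
  //     ...
  //     add sp, sp, #32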
2270
2271 // Move past the restores of the callee-saved registers.
2272 // If we plan on combining the sp bump of the local stack size and the callee
2273 // save stack size, we might need to adjust the CSR save and restore offsets.
2276 while (LastPopI != Begin) {
2277 --LastPopI;
2278 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2279 IsSVECalleeSave(LastPopI)) {
2280 ++LastPopI;
2281 break;
2282 } else if (CombineSPBump)
2284 NeedsWinCFI, &HasWinCFI);
2285 }
2286
2287 if (NeedsWinCFI) {
2288 // Note that there are cases where we insert SEH opcodes in the
2289 // epilogue when we had no SEH opcodes in the prologue. For
2290 // example, when there is no stack frame but there are stack
2291 // arguments. Insert the SEH_EpilogStart and remove it later if we
2292 // didn't emit any SEH opcodes, to avoid generating WinCFI for
2293 // functions that don't need it.
2294 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2296 EpilogStartI = LastPopI;
2297 --EpilogStartI;
2298 }
2299
2300 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2303 // Avoid the reload as it is GOT relative, and instead fall back to the
2304 // hardcoded value below. This allows a mismatch between the OS and
2305 // application without immediately terminating on the difference.
2306 [[fallthrough]];
2308 // We need to reset FP to its untagged state on return. Bit 60 is
2309 // currently used to show the presence of an extended frame.
2310
2311 // BIC x29, x29, #0x1000_0000_0000_0000
2312 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2313 AArch64::FP)
2314 .addUse(AArch64::FP)
2315 .addImm(0x10fe)
2317 if (NeedsWinCFI) {
2318 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2320 HasWinCFI = true;
2321 }
2322 break;
2323
2325 break;
2326 }
2327 }
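  // Decoding note (illustrative): the ANDXri immediate 0x10fe encodes the
  // 64-bit logical immediate 0xefffffffffffffff (all bits except bit 60), so
  // the instruction above behaves as
  //   bic x29, x29, #0x1000000000000000
  // clearing the Swift extended-frame marker before the return.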
2328
2329 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2330
2331 // If there is a single SP update, insert it before the ret and we're done.
2332 if (CombineSPBump) {
2333 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2334
2335 // When we are about to restore the CSRs, the CFA register is SP again.
2336 if (EmitCFI && hasFP(MF)) {
2337 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2338 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2339 unsigned CFIIndex =
2340 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2341 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2342 .addCFIIndex(CFIIndex)
2344 }
2345
2346 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2347 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2348 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2349 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2350 return;
2351 }
2352
2353 NumBytes -= PrologueSaveSize;
2354 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2355
2356 // Process the SVE callee-saves to determine what space needs to be
2357 // deallocated.
2358 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2359 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2360 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2361 RestoreBegin = std::prev(RestoreEnd);
2362 while (RestoreBegin != MBB.begin() &&
2363 IsSVECalleeSave(std::prev(RestoreBegin)))
2364 --RestoreBegin;
2365
2366 assert(IsSVECalleeSave(RestoreBegin) &&
2367 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2368
2369 StackOffset CalleeSavedSizeAsOffset =
2370 StackOffset::getScalable(CalleeSavedSize);
2371 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2372 DeallocateAfter = CalleeSavedSizeAsOffset;
2373 }
2374
2375 // Deallocate the SVE area.
2376 if (SVEStackSize) {
2377 // If we have stack realignment or variable sized objects on the stack,
2378 // restore the stack pointer from the frame pointer prior to SVE CSR
2379 // restoration.
2380 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2381 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2382 // Set SP to start of SVE callee-save area from which they can
2383 // be reloaded. The code below will deallocate the stack space
2384 // by moving FP -> SP.
2385 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2386 StackOffset::getScalable(-CalleeSavedSize), TII,
2388 }
2389 } else {
2390 if (AFI->getSVECalleeSavedStackSize()) {
2391 // Deallocate the non-SVE locals first before we can deallocate (and
2392 // restore callee saves) from the SVE area.
2394 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2396 false, false, nullptr, EmitCFI && !hasFP(MF),
2397 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2398 NumBytes = 0;
2399 }
2400
2401 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2402 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2403 false, nullptr, EmitCFI && !hasFP(MF),
2404 SVEStackSize +
2405 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2406
2407 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2408 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2409 false, nullptr, EmitCFI && !hasFP(MF),
2410 DeallocateAfter +
2411 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2412 }
2413 if (EmitCFI)
2414 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2415 }
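  // A rough sketch of the SVE deallocation above when SP is still valid (no
  // realignment or variable-sized objects), with illustrative sizes:
  //   add   sp, sp, #48              // non-SVE locals (NumBytes)
  //   addvl sp, sp, #3               // DeallocateBefore: SVE locals
  //   ldr   z9, [sp, #1, mul vl]     // ...SVE callee-save reloads...
  //   ldr   z8, [sp]
  //   addvl sp, sp, #2               // DeallocateAfter: SVE callee-save area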
2416
2417 if (!hasFP(MF)) {
2418 bool RedZone = canUseRedZone(MF);
2419 // If this was a redzone leaf function, we don't need to restore the
2420 // stack pointer (but we may need to pop stack args for fastcc).
2421 if (RedZone && AfterCSRPopSize == 0)
2422 return;
2423
2424 // Pop the local variables off the stack. If there are no callee-saved
2425 // registers, it means we are actually positioned at the terminator and can
2426 // combine stack increment for the locals and the stack increment for
2427 // callee-popped arguments into (possibly) a single instruction and be done.
2428 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2429 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2430 if (NoCalleeSaveRestore)
2431 StackRestoreBytes += AfterCSRPopSize;
2432
2434 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2435 StackOffset::getFixed(StackRestoreBytes), TII,
2436 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2437 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2438
2439 // If we were able to combine the local stack pop with the argument pop,
2440 // then we're done.
2441 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2442 return;
2443 }
2444
2445 NumBytes = 0;
2446 }
2447
2448 // Restore the original stack pointer.
2449 // FIXME: Rather than doing the math here, we should instead just use
2450 // non-post-indexed loads for the restores if we aren't actually going to
2451 // be able to save any instructions.
2452 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2454 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2456 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2457 } else if (NumBytes)
2458 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2459 StackOffset::getFixed(NumBytes), TII,
2460 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2461
2462 // When we are about to restore the CSRs, the CFA register is SP again.
2463 if (EmitCFI && hasFP(MF)) {
2464 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2465 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2466 unsigned CFIIndex = MF.addFrameInst(
2467 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2468 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2469 .addCFIIndex(CFIIndex)
2471 }
2472
2473 // This must be placed after the callee-save restore code because that code
2474 // assumes the SP is at the same location as it was after the callee-save save
2475 // code in the prologue.
2476 if (AfterCSRPopSize) {
2477 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2478 "interrupt may have clobbered");
2479
2481 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2483 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2484 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2485 }
2486}
2487
2490 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2491}
2492
2493/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2494/// debug info. It's the same as what we use for resolving the code-gen
2495/// references for now. FIXME: This can go wrong when references are
2496/// SP-relative and simple call frames aren't used.
2499 Register &FrameReg) const {
2501 MF, FI, FrameReg,
2502 /*PreferFP=*/
2503 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress),
2504 /*ForSimm=*/false);
2505}
2506
2509 int FI) const {
2511}
2512
2514 int64_t ObjectOffset) {
2515 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2516 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2517 bool IsWin64 =
2518 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2519 unsigned FixedObject =
2520 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2521 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2522 int64_t FPAdjust =
2523 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2524 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2525}
2526
2528 int64_t ObjectOffset) {
2529 const auto &MFI = MF.getFrameInfo();
2530 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2531}
2532
2533 // TODO: This function currently does not work for scalable vectors.
2535 int FI) const {
2536 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2538 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2539 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2540 ? getFPOffset(MF, ObjectOffset).getFixed()
2541 : getStackOffset(MF, ObjectOffset).getFixed();
2542}
2543
2545 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2546 bool ForSimm) const {
2547 const auto &MFI = MF.getFrameInfo();
2548 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2549 bool isFixed = MFI.isFixedObjectIndex(FI);
2550 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2551 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2552 PreferFP, ForSimm);
2553}
2554
2556 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2557 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2558 const auto &MFI = MF.getFrameInfo();
2559 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2561 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2562 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2563
2564 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2565 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2566 bool isCSR =
2567 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2568
2569 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2570
2571 // Use frame pointer to reference fixed objects. Use it for locals if
2572 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2573 // reliable as a base). Make sure useFPForScavengingIndex() does the
2574 // right thing for the emergency spill slot.
2575 bool UseFP = false;
2576 if (AFI->hasStackFrame() && !isSVE) {
2577 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2578 // there are scalable (SVE) objects in between the FP and the fixed-sized
2579 // objects.
2580 PreferFP &= !SVEStackSize;
2581
2582 // Note: Keeping the following as multiple 'if' statements rather than
2583 // merging to a single expression for readability.
2584 //
2585 // Argument access should always use the FP.
2586 if (isFixed) {
2587 UseFP = hasFP(MF);
2588 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2589 // References to the CSR area must use FP if we're re-aligning the stack
2590 // since the dynamically-sized alignment padding is between the SP/BP and
2591 // the CSR area.
2592 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2593 UseFP = true;
2594 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2595 // If the FPOffset is negative and we're producing a signed immediate, we
2596 // have to keep in mind that the available offset range for negative
2597 // offsets is smaller than for positive ones. If an offset is available
2598 // via the FP and the SP, use whichever is closest.
2599 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2600 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2601
2602 if (MFI.hasVarSizedObjects()) {
2603 // If we have variable sized objects, we can use either FP or BP, as the
2604 // SP offset is unknown. We can use the base pointer if we have one and
2605 // FP is not preferred. If not, we're stuck with using FP.
2606 bool CanUseBP = RegInfo->hasBasePointer(MF);
2607 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2608 UseFP = PreferFP;
2609 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2610 UseFP = true;
2611 // else we can use BP and FP, but the offset from FP won't fit.
2612 // That will make us scavenge registers which we can probably avoid by
2613 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2614 } else if (FPOffset >= 0) {
2615 // Use SP or FP, whichever gives us the best chance of the offset
2616 // being in range for direct access. If the FPOffset is positive,
2617 // that'll always be best, as the SP will be even further away.
2618 UseFP = true;
2619 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2620 // Funclets access the locals contained in the parent's stack frame
2621 // via the frame pointer, so we have to use the FP in the parent
2622 // function.
2623 (void) Subtarget;
2624 assert(
2625 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv()) &&
2626 "Funclets should only be present on Win64");
2627 UseFP = true;
2628 } else {
2629 // We have the choice between FP and (SP or BP).
2630 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2631 UseFP = true;
2632 }
2633 }
2634 }
2635
2636 assert(
2637 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2638 "In the presence of dynamic stack pointer realignment, "
2639 "non-argument/CSR objects cannot be accessed through the frame pointer");
2640
2641 if (isSVE) {
2642 StackOffset FPOffset =
2644 StackOffset SPOffset =
2645 SVEStackSize +
2646 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2647 ObjectOffset);
2648 // Always use the FP for SVE spills if available and beneficial.
2649 if (hasFP(MF) && (SPOffset.getFixed() ||
2650 FPOffset.getScalable() < SPOffset.getScalable() ||
2651 RegInfo->hasStackRealignment(MF))) {
2652 FrameReg = RegInfo->getFrameRegister(MF);
2653 return FPOffset;
2654 }
2655
2656 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2657 : (unsigned)AArch64::SP;
2658 return SPOffset;
2659 }
2660
2661 StackOffset ScalableOffset = {};
2662 if (UseFP && !(isFixed || isCSR))
2663 ScalableOffset = -SVEStackSize;
2664 if (!UseFP && (isFixed || isCSR))
2665 ScalableOffset = SVEStackSize;
2666
2667 if (UseFP) {
2668 FrameReg = RegInfo->getFrameRegister(MF);
2669 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2670 }
2671
2672 // Use the base pointer if we have one.
2673 if (RegInfo->hasBasePointer(MF))
2674 FrameReg = RegInfo->getBaseRegister();
2675 else {
2676 assert(!MFI.hasVarSizedObjects() &&
2677 "Can't use SP when we have var sized objects.");
2678 FrameReg = AArch64::SP;
2679 // If we're using the red zone for this function, the SP won't actually
2680 // be adjusted, so the offsets will be negative. They're also all
2681 // within range of the signed 9-bit immediate instructions.
2682 if (canUseRedZone(MF))
2683 Offset -= AFI->getLocalStackSize();
2684 }
2685
2686 return StackOffset::getFixed(Offset) + ScalableOffset;
2687}
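// Worked example of the resolution above (illustrative): for a fixed
// (incoming-argument) object at ObjectOffset 8, with a 16-byte callee-save
// area whose frame record sits at its base, no Win64 varargs area and no SVE,
// the FP path yields FrameReg = x29 and
//   ObjectOffset + FixedObject + FPAdjust = 8 + 0 + 16 = 24
// so the argument is addressed as [x29, #24].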
2688
2689static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2690 // Do not set a kill flag on values that are also marked as live-in. This
2691 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2692 // callee saved registers.
2693 // Omitting the kill flags is conservatively correct even if the live-in
2694 // is not used after all.
2695 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2696 return getKillRegState(!IsLiveIn);
2697}
2698
2700 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2702 return Subtarget.isTargetMachO() &&
2703 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2704 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2706}
2707
2708static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2709 bool NeedsWinCFI, bool IsFirst,
2710 const TargetRegisterInfo *TRI) {
2711 // If we are generating register pairs for a Windows function that requires
2712 // EH support, then pair consecutive registers only. There are no unwind
2713 // opcodes for saves/restores of non-consecutive register pairs.
2714 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2715 // save_lrpair.
2716 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2717
2718 if (Reg2 == AArch64::FP)
2719 return true;
2720 if (!NeedsWinCFI)
2721 return false;
2722 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2723 return false;
2724 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2725 // opcode. If this is the first register pair, it would end up with a
2726 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2727 // if LR is paired with something other than the first register.
2728 // The save_lrpair opcode requires the first register to be an odd one.
2729 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2730 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2731 return false;
2732 return true;
2733}
2734
2735/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2736/// WindowsCFI requires that only consecutive registers can be paired.
2737/// LR and FP need to be allocated together when the frame needs to save
2738/// the frame-record. This means any other register pairing with LR is invalid.
2739static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2740 bool UsesWinAAPCS, bool NeedsWinCFI,
2741 bool NeedsFrameRecord, bool IsFirst,
2742 const TargetRegisterInfo *TRI) {
2743 if (UsesWinAAPCS)
2744 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2745 TRI);
2746
2747 // If we need to store the frame record, don't pair any register
2748 // with LR other than FP.
2749 if (NeedsFrameRecord)
2750 return Reg2 == AArch64::LR;
2751
2752 return false;
2753}
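// A few illustrative outcomes of the pairing rules above:
//   (x19, x20) with WinCFI              -> paired (consecutive, save_regp)
//   (x19, x21) with WinCFI              -> not paired (no unwind opcode)
//   (x21, lr)  as the very first pair   -> not paired (no save_lrpair_x)
//   (x22, lr)  when a frame record is needed -> not paired (LR pairs with FP)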
2754
2755namespace {
2756
2757struct RegPairInfo {
2758 unsigned Reg1 = AArch64::NoRegister;
2759 unsigned Reg2 = AArch64::NoRegister;
2760 int FrameIdx;
2761 int Offset;
2762 enum RegType { GPR, FPR64, FPR128, PPR, ZPR } Type;
2763
2764 RegPairInfo() = default;
2765
2766 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2767
2768 unsigned getScale() const {
2769 switch (Type) {
2770 case PPR:
2771 return 2;
2772 case GPR:
2773 case FPR64:
2774 return 8;
2775 case ZPR:
2776 case FPR128:
2777 return 16;
2778 }
2779 llvm_unreachable("Unsupported type");
2780 }
2781
2782 bool isScalable() const { return Type == PPR || Type == ZPR; }
2783};
2784
2785} // end anonymous namespace
2786
2787unsigned findFreePredicateReg(BitVector &SavedRegs) {
2788 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2789 if (SavedRegs.test(PReg)) {
2790 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2791 return PNReg;
2792 }
2793 }
2794 return AArch64::NoRegister;
2795}
2796
2800 bool NeedsFrameRecord) {
2801
2802 if (CSI.empty())
2803 return;
2804
2805 bool IsWindows = isTargetWindows(MF);
2806 bool NeedsWinCFI = needsWinCFI(MF);
2808 MachineFrameInfo &MFI = MF.getFrameInfo();
2810 unsigned Count = CSI.size();
2811 (void)CC;
2812 // MachO's compact unwind format relies on all registers being stored in
2813 // pairs.
2816 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2817 "Odd number of callee-saved regs to spill!");
2818 int ByteOffset = AFI->getCalleeSavedStackSize();
2819 int StackFillDir = -1;
2820 int RegInc = 1;
2821 unsigned FirstReg = 0;
2822 if (NeedsWinCFI) {
2823 // For WinCFI, fill the stack from the bottom up.
2824 ByteOffset = 0;
2825 StackFillDir = 1;
2826 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2827 // backwards, to pair up registers starting from lower numbered registers.
2828 RegInc = -1;
2829 FirstReg = Count - 1;
2830 }
2831 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2832 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2833
2834 // When iterating backwards, the loop condition relies on unsigned wraparound.
2835 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2836 RegPairInfo RPI;
2837 RPI.Reg1 = CSI[i].getReg();
2838
2839 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
2840 RPI.Type = RegPairInfo::GPR;
2841 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
2842 RPI.Type = RegPairInfo::FPR64;
2843 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
2844 RPI.Type = RegPairInfo::FPR128;
2845 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
2846 RPI.Type = RegPairInfo::ZPR;
2847 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
2848 RPI.Type = RegPairInfo::PPR;
2849 else
2850 llvm_unreachable("Unsupported register class.");
2851
2852 // Add the next reg to the pair if it is in the same register class.
2853 if (unsigned(i + RegInc) < Count) {
2854 Register NextReg = CSI[i + RegInc].getReg();
2855 bool IsFirst = i == FirstReg;
2856 switch (RPI.Type) {
2857 case RegPairInfo::GPR:
2858 if (AArch64::GPR64RegClass.contains(NextReg) &&
2859 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2860 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2861 TRI))
2862 RPI.Reg2 = NextReg;
2863 break;
2864 case RegPairInfo::FPR64:
2865 if (AArch64::FPR64RegClass.contains(NextReg) &&
2866 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2867 IsFirst, TRI))
2868 RPI.Reg2 = NextReg;
2869 break;
2870 case RegPairInfo::FPR128:
2871 if (AArch64::FPR128RegClass.contains(NextReg))
2872 RPI.Reg2 = NextReg;
2873 break;
2874 case RegPairInfo::PPR:
2875 break;
2876 case RegPairInfo::ZPR:
2877 if (AFI->getPredicateRegForFillSpill() != 0)
2878 if (((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1))
2879 RPI.Reg2 = NextReg;
2880 break;
2881 }
2882 }
2883
2884 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2885 // list to come in sorted by frame index so that we can issue the store
2886 // pair instructions directly. Assert if we see anything otherwise.
2887 //
2888 // The order of the registers in the list is controlled by
2889 // getCalleeSavedRegs(), so they will always be in-order, as well.
2890 assert((!RPI.isPaired() ||
2891 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
2892 "Out of order callee saved regs!");
2893
2894 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
2895 RPI.Reg1 == AArch64::LR) &&
2896 "FrameRecord must be allocated together with LR");
2897
2898 // Windows AAPCS has FP and LR reversed.
2899 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
2900 RPI.Reg2 == AArch64::LR) &&
2901 "FrameRecord must be allocated together with LR");
2902
2903 // MachO's compact unwind format relies on all registers being stored in
2904 // adjacent register pairs.
2908 (RPI.isPaired() &&
2909 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
2910 RPI.Reg1 + 1 == RPI.Reg2))) &&
2911 "Callee-save registers not saved as adjacent register pair!");
2912
2913 RPI.FrameIdx = CSI[i].getFrameIdx();
2914 if (NeedsWinCFI &&
2915 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
2916 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
2917 int Scale = RPI.getScale();
2918
2919 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2920 assert(OffsetPre % Scale == 0);
2921
2922 if (RPI.isScalable())
2923 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2924 else
2925 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2926
2927 // Swift's async context is directly before FP, so allocate an extra
2928 // 8 bytes for it.
2929 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2930 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2931 (IsWindows && RPI.Reg2 == AArch64::LR)))
2932 ByteOffset += StackFillDir * 8;
2933
2934 // Round up size of non-pair to pair size if we need to pad the
2935 // callee-save area to ensure 16-byte alignment.
2936 if (NeedGapToAlignStack && !NeedsWinCFI &&
2937 !RPI.isScalable() && RPI.Type != RegPairInfo::FPR128 &&
2938 !RPI.isPaired() && ByteOffset % 16 != 0) {
2939 ByteOffset += 8 * StackFillDir;
2940 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
2941 // A stack frame with a gap looks like this, bottom up:
2942 // d9, d8. x21, gap, x20, x19.
2943 // Set extra alignment on the x21 object to create the gap above it.
2944 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
2945 NeedGapToAlignStack = false;
2946 }
2947
2948 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2949 assert(OffsetPost % Scale == 0);
2950 // If filling top down (default), we want the offset after incrementing it.
2951 // If filling bottom up (WinCFI) we need the original offset.
2952 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
2953
2954 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
2955 // Swift context can directly precede FP.
2956 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2957 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2958 (IsWindows && RPI.Reg2 == AArch64::LR)))
2959 Offset += 8;
2960 RPI.Offset = Offset / Scale;
2961
2962 assert(((!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
2963 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
2964 "Offset out of bounds for LDP/STP immediate");
2965
2966 // Save the offset to frame record so that the FP register can point to the
2967 // innermost frame record (spilled FP and LR registers).
2968 if (NeedsFrameRecord && ((!IsWindows && RPI.Reg1 == AArch64::LR &&
2969 RPI.Reg2 == AArch64::FP) ||
2970 (IsWindows && RPI.Reg1 == AArch64::FP &&
2971 RPI.Reg2 == AArch64::LR)))
2973
2974 RegPairs.push_back(RPI);
2975 if (RPI.isPaired())
2976 i += RegInc;
2977 }
2978 if (NeedsWinCFI) {
2979 // If we need an alignment gap in the stack, align the topmost stack
2980 // object. A stack frame with a gap looks like this, bottom up:
2981 // x19, d8. d9, gap.
2982 // Set extra alignment on the topmost stack object (the first element in
2983 // CSI, which goes top down), to create the gap above it.
2984 if (AFI->hasCalleeSaveStackFreeSpace())
2985 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
2986 // We iterated bottom up over the registers; flip RegPairs back to top
2987 // down order.
2988 std::reverse(RegPairs.begin(), RegPairs.end());
2989 }
2990}
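// Worked example of the offset arithmetic above (illustrative, non-WinCFI):
// with a 32-byte GPR callee-save area, the first pair visited sees
//   OffsetPre = 32, ByteOffset 32 -> 16, RPI.Offset = 16 / 8 = 2
// and the second pair sees
//   OffsetPre = 16, ByteOffset 16 -> 0,  RPI.Offset = 0
// matching the "addImm(+2)" / "addImm(+0)" scaled STP immediates in the
// spill sequence emitted below.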
2991
2995 MachineFunction &MF = *MBB.getParent();
2997 bool NeedsWinCFI = needsWinCFI(MF);
2998 DebugLoc DL;
3000
3001 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3002
3003 const MachineRegisterInfo &MRI = MF.getRegInfo();
3004 if (homogeneousPrologEpilog(MF)) {
3005 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
3007
3008 for (auto &RPI : RegPairs) {
3009 MIB.addReg(RPI.Reg1);
3010 MIB.addReg(RPI.Reg2);
3011
3012 // Update register live in.
3013 if (!MRI.isReserved(RPI.Reg1))
3014 MBB.addLiveIn(RPI.Reg1);
3015 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
3016 MBB.addLiveIn(RPI.Reg2);
3017 }
3018 return true;
3019 }
3020 bool PTrueCreated = false;
3021 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
3022 unsigned Reg1 = RPI.Reg1;
3023 unsigned Reg2 = RPI.Reg2;
3024 unsigned StrOpc;
3025
3026 // Issue sequence of spills for cs regs. The first spill may be converted
3027 // to a pre-decrement store later by emitPrologue if the callee-save stack
3028 // area allocation can't be combined with the local stack area allocation.
3029 // For example:
3030 // stp x22, x21, [sp, #0] // addImm(+0)
3031 // stp x20, x19, [sp, #16] // addImm(+2)
3032 // stp fp, lr, [sp, #32] // addImm(+4)
3033 // Rationale: This sequence saves uop updates compared to a sequence of
3034 // pre-increment spills like stp xi,xj,[sp,#-16]!
3035 // Note: Similar rationale and sequence for restores in epilog.
3036 unsigned Size;
3037 Align Alignment;
3038 switch (RPI.Type) {
3039 case RegPairInfo::GPR:
3040 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
3041 Size = 8;
3042 Alignment = Align(8);
3043 break;
3044 case RegPairInfo::FPR64:
3045 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
3046 Size = 8;
3047 Alignment = Align(8);
3048 break;
3049 case RegPairInfo::FPR128:
3050 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
3051 Size = 16;
3052 Alignment = Align(16);
3053 break;
3054 case RegPairInfo::ZPR:
3055 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
3056 Size = 16;
3057 Alignment = Align(16);
3058 break;
3059 case RegPairInfo::PPR:
3060 StrOpc = AArch64::STR_PXI;
3061 Size = 2;
3062 Alignment = Align(2);
3063 break;
3064 }
3065 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
3066 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3067 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3068 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3069 dbgs() << ")\n");
3070
3071 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
3072 "Windows unwinding requires a consecutive (FP,LR) pair");
3073 // Windows unwind codes require consecutive registers if registers are
3074 // paired. Make the switch here, so that the code below will save (x,x+1)
3075 // and not (x+1,x).
3076 unsigned FrameIdxReg1 = RPI.FrameIdx;
3077 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3078 if (NeedsWinCFI && RPI.isPaired()) {
3079 std::swap(Reg1, Reg2);
3080 std::swap(FrameIdxReg1, FrameIdxReg2);
3081 }
3082
3083 if (RPI.isPaired() && RPI.isScalable()) {
3084 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3087 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3088 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3089 "Expects SVE2.1 or SME2 target and a predicate register");
3090#ifdef EXPENSIVE_CHECKS
3091 auto IsPPR = [](const RegPairInfo &c) {
3092 return c.Reg1 == RegPairInfo::PPR;
3093 };
3094 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3095 auto IsZPR = [](const RegPairInfo &c) {
3096 return c.Type == RegPairInfo::ZPR;
3097 };
3098 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3099 assert(!(PPRBegin < ZPRBegin) &&
3100 "Expected callee save predicate to be handled first");
3101#endif
3102 if (!PTrueCreated) {
3103 PTrueCreated = true;
3104 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3106 }
3107 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3108 if (!MRI.isReserved(Reg1))
3109 MBB.addLiveIn(Reg1);
3110 if (!MRI.isReserved(Reg2))
3111 MBB.addLiveIn(Reg2);
3112 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
3114 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3115 MachineMemOperand::MOStore, Size, Alignment));
3116 MIB.addReg(PnReg);
3117 MIB.addReg(AArch64::SP)
3118 .addImm(RPI.Offset) // [sp, #offset*scale],
3119 // where factor*scale is implicit
3122 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3123 MachineMemOperand::MOStore, Size, Alignment));
3124 if (NeedsWinCFI)
3126 } else { // The code when the pair of ZReg is not present
3127 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3128 if (!MRI.isReserved(Reg1))
3129 MBB.addLiveIn(Reg1);
3130 if (RPI.isPaired()) {
3131 if (!MRI.isReserved(Reg2))
3132 MBB.addLiveIn(Reg2);
3133 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
3135 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3136 MachineMemOperand::MOStore, Size, Alignment));
3137 }
3138 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
3139 .addReg(AArch64::SP)
3140 .addImm(RPI.Offset) // [sp, #offset*scale],
3141 // where factor*scale is implicit
3144 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3145 MachineMemOperand::MOStore, Size, Alignment));
3146 if (NeedsWinCFI)
3148 }
3149 // Update the StackIDs of the SVE stack slots.
3150 MachineFrameInfo &MFI = MF.getFrameInfo();
3151 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
3152 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
3153 if (RPI.isPaired())
3154 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
3155 }
3156 }
3157 return true;
3158}
3159
3163 MachineFunction &MF = *MBB.getParent();
3165 DebugLoc DL;
3167 bool NeedsWinCFI = needsWinCFI(MF);
3168
3169 if (MBBI != MBB.end())
3170 DL = MBBI->getDebugLoc();
3171
3172 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3173 if (homogeneousPrologEpilog(MF, &MBB)) {
3174 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3176 for (auto &RPI : RegPairs) {
3177 MIB.addReg(RPI.Reg1, RegState::Define);
3178 MIB.addReg(RPI.Reg2, RegState::Define);
3179 }
3180 return true;
3181 }
3182
3184 // For performance reasons, restore SVE registers in increasing order
3184 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3185 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3186 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3187 std::reverse(PPRBegin, PPREnd);
3188 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3189 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3190 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3191 std::reverse(ZPRBegin, ZPREnd);
3192
3193 bool PTrueCreated = false;
3194 for (const RegPairInfo &RPI : RegPairs) {
3195 unsigned Reg1 = RPI.Reg1;
3196 unsigned Reg2 = RPI.Reg2;
3197
3198 // Issue sequence of restores for cs regs. The last restore may be converted
3199 // to a post-increment load later by emitEpilogue if the callee-save stack
3200 // area allocation can't be combined with the local stack area allocation.
3201 // For example:
3202 // ldp fp, lr, [sp, #32] // addImm(+4)
3203 // ldp x20, x19, [sp, #16] // addImm(+2)
3204 // ldp x22, x21, [sp, #0] // addImm(+0)
3205 // Note: see comment in spillCalleeSavedRegisters()
3206 unsigned LdrOpc;
3207 unsigned Size;
3208 Align Alignment;
3209 switch (RPI.Type) {
3210 case RegPairInfo::GPR:
3211 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3212 Size = 8;
3213 Alignment = Align(8);
3214 break;
3215 case RegPairInfo::FPR64:
3216 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3217 Size = 8;
3218 Alignment = Align(8);
3219 break;
3220 case RegPairInfo::FPR128:
3221 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3222 Size = 16;
3223 Alignment = Align(16);
3224 break;
3225 case RegPairInfo::ZPR:
3226 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3227 Size = 16;
3228 Alignment = Align(16);
3229 break;
3230 case RegPairInfo::PPR:
3231 LdrOpc = AArch64::LDR_PXI;
3232 Size = 2;
3233 Alignment = Align(2);
3234 break;
3235 }
3236 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3237 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3238 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3239 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3240 dbgs() << ")\n");
3241
3242 // Windows unwind codes require consecutive registers if registers are
3243 // paired. Make the switch here, so that the code below will restore (x,x+1)
3244 // and not (x+1,x).
3245 unsigned FrameIdxReg1 = RPI.FrameIdx;
3246 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3247 if (NeedsWinCFI && RPI.isPaired()) {
3248 std::swap(Reg1, Reg2);
3249 std::swap(FrameIdxReg1, FrameIdxReg2);
3250 }
3251
3253 if (RPI.isPaired() && RPI.isScalable()) {
3254 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3256 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3257 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3258 "Expects SVE2.1 or SME2 target and a predicate register");
3259#ifdef EXPENSIVE_CHECKS
3260 assert(!(PPRBegin < ZPRBegin) &&
3261 "Expected callee save predicate to be handled first");
3262#endif
3263 if (!PTrueCreated) {
3264 PTrueCreated = true;
3265 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3267 }
3268 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3269 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3270 getDefRegState(true));
3272 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3273 MachineMemOperand::MOLoad, Size, Alignment));
3274 MIB.addReg(PnReg);
3275 MIB.addReg(AArch64::SP)
3276 .addImm(RPI.Offset) // [sp, #offset*scale]
3277 // where factor*scale is implicit
3280 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3281 MachineMemOperand::MOLoad, Size, Alignment));
3282 if (NeedsWinCFI)
3284 } else {
3285 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3286 if (RPI.isPaired()) {
3287 MIB.addReg(Reg2, getDefRegState(true));
3289 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3290 MachineMemOperand::MOLoad, Size, Alignment));
3291 }
3292 MIB.addReg(Reg1, getDefRegState(true));
3293 MIB.addReg(AArch64::SP)
3294 .addImm(RPI.Offset) // [sp, #offset*scale]
3295 // where factor*scale is implicit
3298 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3299 MachineMemOperand::MOLoad, Size, Alignment));
3300 if (NeedsWinCFI)
3302 }
3303 }
3304 return true;
3305}
3306
3308 BitVector &SavedRegs,
3309 RegScavenger *RS) const {
3310 // All calls are tail calls in GHC calling conv, and functions have no
3311 // prologue/epilogue.
3313 return;
3314
3316 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3318 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3320 unsigned UnspilledCSGPR = AArch64::NoRegister;
3321 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3322
3323 MachineFrameInfo &MFI = MF.getFrameInfo();
3324 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3325
3326 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3327 ? RegInfo->getBaseRegister()
3328 : (unsigned)AArch64::NoRegister;
3329
3330 unsigned ExtraCSSpill = 0;
3331 bool HasUnpairedGPR64 = false;
3332 bool HasPairZReg = false;
3333 // Figure out which callee-saved registers to save/restore.
3334 for (unsigned i = 0; CSRegs[i]; ++i) {
3335 const unsigned Reg = CSRegs[i];
3336
3337 // Add the base pointer register to SavedRegs if it is callee-save.
3338 if (Reg == BasePointerReg)
3339 SavedRegs.set(Reg);
3340
3341 bool RegUsed = SavedRegs.test(Reg);
3342 unsigned PairedReg = AArch64::NoRegister;
3343 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3344 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3345 AArch64::FPR128RegClass.contains(Reg)) {
3346 // Compensate for odd numbers of GP CSRs.
3347 // For now, all the known cases of odd number of CSRs are of GPRs.
3348 if (HasUnpairedGPR64)
3349 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3350 else
3351 PairedReg = CSRegs[i ^ 1];
3352 }
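// Note (illustrative): CSRegs[i ^ 1] pairs neighbouring even/odd indices
// (0<->1, 2<->3, ...); once HasUnpairedGPR64 has been set, the alternative
// expression above shifts the pairing by one entry, i.e. (1,2), (3,4), ...,
// so the remaining GPRs still pair with each other.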
3353
3354 // If the function requires all the GP registers to save (SavedRegs),
3355 // and there are an odd number of GP CSRs at the same time (CSRegs),
3356 // PairedReg could be in a different register class from Reg, which would
3357 // lead to a FPR (usually D8) accidentally being marked saved.
3358 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3359 PairedReg = AArch64::NoRegister;
3360 HasUnpairedGPR64 = true;
3361 }
3362 assert(PairedReg == AArch64::NoRegister ||
3363 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3364 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3365 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3366
3367 if (!RegUsed) {
3368 if (AArch64::GPR64RegClass.contains(Reg) &&
3369 !RegInfo->isReservedReg(MF, Reg)) {
3370 UnspilledCSGPR = Reg;
3371 UnspilledCSGPRPaired = PairedReg;
3372 }
3373 continue;
3374 }
3375
3376 // MachO's compact unwind format relies on all registers being stored in
3377 // pairs.
3378 // FIXME: the usual format is actually better if unwinding isn't needed.
3379 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3380 !SavedRegs.test(PairedReg)) {
3381 SavedRegs.set(PairedReg);
3382 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3383 !RegInfo->isReservedReg(MF, PairedReg))
3384 ExtraCSSpill = PairedReg;
3385 }
3386 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
3387 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3388 SavedRegs.test(CSRegs[i ^ 1]));
3389 }
3390
3391 if (HasPairZReg && (Subtarget.hasSVE2p1() || Subtarget.hasSME2())) {
3393 // Find a suitable predicate register for the multi-vector spill/fill
3394 // instructions.
3395 unsigned PnReg = findFreePredicateReg(SavedRegs);
3396 if (PnReg != AArch64::NoRegister)
3397 AFI->setPredicateRegForFillSpill(PnReg);
3398 // If no free callee-save has been found assign one.
3399 if (!AFI->getPredicateRegForFillSpill() &&
3400 MF.getFunction().getCallingConv() ==
3401 CallingConv::AArch64_SVE_VectorCall) {
3402 SavedRegs.set(AArch64::P8);
3403 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3404 }
3405
3406 assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) &&
3407 "Predicate cannot be a reserved register");
3408 }
3409
3410 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3411 !Subtarget.isTargetWindows()) {
3412 // For Windows calling convention on a non-windows OS, where X18 is treated
3413 // as reserved, back up X18 when entering non-windows code (marked with the
3414 // Windows calling convention) and restore when returning regardless of
3415 // whether the individual function uses it - it might call other functions
3416 // that clobber it.
3417 SavedRegs.set(AArch64::X18);
3418 }
3419
3420 // Calculates the callee saved stack size.
3421 unsigned CSStackSize = 0;
3422 unsigned SVECSStackSize = 0;
3423 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3424 const MachineRegisterInfo &MRI = MF.getRegInfo();
3425 for (unsigned Reg : SavedRegs.set_bits()) {
3426 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3427 if (AArch64::PPRRegClass.contains(Reg) ||
3428 AArch64::ZPRRegClass.contains(Reg))
3429 SVECSStackSize += RegSize;
3430 else
3431 CSStackSize += RegSize;
3432 }
3433
3434 // Save number of saved regs, so we can easily update CSStackSize later.
3435 unsigned NumSavedRegs = SavedRegs.count();
3436
3437 // The frame record needs to be created by saving the appropriate registers
3438 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3439 if (hasFP(MF) ||
3440 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3441 SavedRegs.set(AArch64::FP);
3442 SavedRegs.set(AArch64::LR);
3443 }
3444
3445 LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3446 for (unsigned Reg
3447 : SavedRegs.set_bits()) dbgs()
3448 << ' ' << printReg(Reg, RegInfo);
3449 dbgs() << "\n";);
3450
3451 // If any callee-saved registers are used, the frame cannot be eliminated.
3452 int64_t SVEStackSize =
3453 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3454 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3455
3456 // The CSR spill slots have not been allocated yet, so estimateStackSize
3457 // won't include them.
3458 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3459
3460 // We may address some of the stack above the canonical frame address, either
3461 // for our own arguments or during a call. Include that in calculating whether
3462 // we have complicated addressing concerns.
3463 int64_t CalleeStackUsed = 0;
3464 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3465 int64_t FixedOff = MFI.getObjectOffset(I);
3466 if (FixedOff > CalleeStackUsed) CalleeStackUsed = FixedOff;
3467 }
3468
3469 // Conservatively always assume BigStack when there are SVE spills.
3470 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3471 CalleeStackUsed) > EstimatedStackSizeLimit;
3472 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3473 AFI->setHasStackFrame(true);
3474
3475 // Estimate if we might need to scavenge a register at some point in order
3476 // to materialize a stack offset. If so, either spill one additional
3477 // callee-saved register or reserve a special spill slot to facilitate
3478 // register scavenging. If we already spilled an extra callee-saved register
3479 // above to keep the number of spills even, we don't need to do anything else
3480 // here.
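// Worked example (illustrative; the 32760-byte figure is an architectural
// property of the unsigned scaled load/store encoding, not taken from this
// file): "ldr x0, [sp, #imm]" can reach at most 4095 * 8 = 32760 bytes, so
// once the estimated frame exceeds the limit from estimateRSStackSizeLimit,
// a scratch register is needed to materialize larger offsets.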
3481 if (BigStack) {
3482 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3483 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3484 << " to get a scratch register.\n");
3485 SavedRegs.set(UnspilledCSGPR);
3486 ExtraCSSpill = UnspilledCSGPR;
3487
3488 // MachO's compact unwind format relies on all registers being stored in
3489 // pairs, so if we need to spill one extra for BigStack, then we need to
3490 // store the pair.
3491 if (producePairRegisters(MF)) {
3492 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3493 // Failed to make a pair for compact unwind format, revert spilling.
3494 if (produceCompactUnwindFrame(MF)) {
3495 SavedRegs.reset(UnspilledCSGPR);
3496 ExtraCSSpill = AArch64::NoRegister;
3497 }
3498 } else
3499 SavedRegs.set(UnspilledCSGPRPaired);
3500 }
3501 }
3502
3503 // If we didn't find an extra callee-saved register to spill, create
3504 // an emergency spill slot.
3505 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3506 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3507 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3508 unsigned Size = TRI->getSpillSize(RC);
3509 Align Alignment = TRI->getSpillAlign(RC);
3510 int FI = MFI.CreateStackObject(Size, Alignment, false);
3511 RS->addScavengingFrameIndex(FI);
3512 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3513 << " as the emergency spill slot.\n");
3514 }
3515 }
3516
3517 // Adding the size of additional 64bit GPR saves.
3518 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3519
3520 // A Swift asynchronous context extends the frame record with a pointer
3521 // directly before FP.
3522 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3523 CSStackSize += 8;
3524
3525 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3526 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3527 << EstimatedStackSize + AlignedCSStackSize
3528 << " bytes.\n");
3529
3531 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3532 "Should not invalidate callee saved info");
3533
3534 // Round up to register pair alignment to avoid additional SP adjustment
3535 // instructions.
3536 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3537 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3538 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3539}
3540
3541bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3542 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3543 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3544 unsigned &MaxCSFrameIndex) const {
3545 bool NeedsWinCFI = needsWinCFI(MF);
3546 // To match the canonical windows frame layout, reverse the list of
3547 // callee saved registers to get them laid out by PrologEpilogInserter
3548 // in the right order. (PrologEpilogInserter allocates stack objects top
3549 // down. Windows canonical prologs store higher numbered registers at
3550 // the top, thus have the CSI array start from the highest registers.)
3551 if (NeedsWinCFI)
3552 std::reverse(CSI.begin(), CSI.end());
3553
3554 if (CSI.empty())
3555 return true; // Early exit if no callee saved registers are modified!
3556
3557 // Now that we know which registers need to be saved and restored, allocate
3558 // stack slots for them.
3559 MachineFrameInfo &MFI = MF.getFrameInfo();
3560 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3561
3562 bool UsesWinAAPCS = isTargetWindows(MF);
3563 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3564 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3565 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3566 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3567 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3568 }
3569
3570 for (auto &CS : CSI) {
3571 Register Reg = CS.getReg();
3572 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3573
3574 unsigned Size = RegInfo->getSpillSize(*RC);
3575 Align Alignment(RegInfo->getSpillAlign(*RC));
3576 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3577 CS.setFrameIdx(FrameIdx);
3578
3579 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3580 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3581
3582 // Grab 8 bytes below FP for the extended asynchronous frame info.
3583 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3584 Reg == AArch64::FP) {
3585 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3586 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3587 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3588 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3589 }
3590 }
3591 return true;
3592}
3593
3594bool AArch64FrameLowering::enableStackSlotScavenging(
3595 const MachineFunction &MF) const {
3596 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3597 // If the function has streaming-mode changes, don't scavenge a
3598 // spillslot in the callee-save area, as that might require an
3599 // 'addvl' in the streaming-mode-changing call-sequence when the
3600 // function doesn't use a FP.
3601 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
3602 return false;
3603 return AFI->hasCalleeSaveStackFreeSpace();
3604}
3605
3606/// returns true if there are any SVE callee saves.
3607static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3608 int &Min, int &Max) {
3609 Min = std::numeric_limits<int>::max();
3610 Max = std::numeric_limits<int>::min();
3611
3612 if (!MFI.isCalleeSavedInfoValid())
3613 return false;
3614
3615 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3616 for (auto &CS : CSI) {
3617 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3618 AArch64::PPRRegClass.contains(CS.getReg())) {
3619 assert((Max == std::numeric_limits<int>::min() ||
3620 Max + 1 == CS.getFrameIdx()) &&
3621 "SVE CalleeSaves are not consecutive");
3622
3623 Min = std::min(Min, CS.getFrameIdx());
3624 Max = std::max(Max, CS.getFrameIdx());
3625 }
3626 }
3627 return Min != std::numeric_limits<int>::max();
3628}
3629
3630// Process all the SVE stack objects and determine offsets for each
3631// object. If AssignOffsets is true, the offsets get assigned.
3632// Fills in the first and last callee-saved frame indices into
3633// Min/MaxCSFrameIndex, respectively.
3634// Returns the size of the stack.
3635static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3636 int &MinCSFrameIndex,
3637 int &MaxCSFrameIndex,
3638 bool AssignOffsets) {
3639#ifndef NDEBUG
3640 // First process all fixed stack objects.
3641 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3642 assert(MFI.getStackID(I) != TargetStackID::ScalableVector &&
3643 "SVE vectors should never be passed on the stack by value, only by "
3644 "reference.");
3645#endif
3646
3647 auto Assign = [&MFI](int FI, int64_t Offset) {
3648 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
3649 MFI.setObjectOffset(FI, Offset);
3650 };
3651
3652 int64_t Offset = 0;
3653
3654 // Then process all callee saved slots.
3655 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
3656 // Assign offsets to the callee save slots.
3657 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
3658 Offset += MFI.getObjectSize(I);
3659 Offset = alignTo(Offset, MFI.getObjectAlign(I));
3660 if (AssignOffsets)
3661 Assign(I, -Offset);
3662 }
3663 }
3664
3665 // Ensure that the Callee-save area is aligned to 16bytes.
3666 Offset = alignTo(Offset, Align(16U));
3667
3668 // Create a buffer of SVE objects to allocate and sort it.
3669 SmallVector<int, 8> ObjectsToAllocate;
3670 // If we have a stack protector, and we've previously decided that we have SVE
3671 // objects on the stack and thus need it to go in the SVE stack area, then it
3672 // needs to go first.
3673 int StackProtectorFI = -1;
3674 if (MFI.hasStackProtectorIndex()) {
3675 StackProtectorFI = MFI.getStackProtectorIndex();
3676 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
3677 ObjectsToAllocate.push_back(StackProtectorFI);
3678 }
3679 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
3680 unsigned StackID = MFI.getStackID(I);
3681 if (StackID != TargetStackID::ScalableVector)
3682 continue;
3683 if (I == StackProtectorFI)
3684 continue;
3685 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
3686 continue;
3687 if (MFI.isDeadObjectIndex(I))
3688 continue;
3689
3690 ObjectsToAllocate.push_back(I);
3691 }
3692
3693 // Allocate all SVE locals and spills
3694 for (unsigned FI : ObjectsToAllocate) {
3695 Align Alignment = MFI.getObjectAlign(FI);
3696 // FIXME: Given that the length of SVE vectors is not necessarily a power of
3697 // two, we'd need to align every object dynamically at runtime if the
3698 // alignment is larger than 16. This is not yet supported.
3699 if (Alignment > Align(16))
3700 report_fatal_error(
3701 "Alignment of scalable vectors > 16 bytes is not yet supported");
3702
3703 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
3704 if (AssignOffsets)
3705 Assign(FI, -Offset);
3706 }
3707
3708 return Offset;
3709}
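// Worked example (illustrative, not from the original source): two
// consecutive Z-register callee-save slots of 16 (scalable) bytes each are
// assigned offsets -16 and -32 by the loop above, i.e. the SVE callee-save
// area grows downwards from the top of the SVE stack region.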
3710
3711int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
3712 MachineFrameInfo &MFI) const {
3713 int MinCSFrameIndex, MaxCSFrameIndex;
3714 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
3715}
3716
3717int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
3718 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
3719 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
3720 true);
3721}
3722
3723void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3724 MachineFunction &MF, RegScavenger *RS) const {
3725 MachineFrameInfo &MFI = MF.getFrameInfo();
3726
3727 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
3728 "Upwards growing stack unsupported");
3729
3730 int MinCSFrameIndex, MaxCSFrameIndex;
3731 int64_t SVEStackSize =
3732 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3733
3734 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3735 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3736 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3737
3738 // If this function isn't doing Win64-style C++ EH, we don't need to do
3739 // anything.
3740 if (!MF.hasEHFunclets())
3741 return;
3742 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3743 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3744
3745 MachineBasicBlock &MBB = MF.front();
3746 auto MBBI = MBB.begin();
3747 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3748 ++MBBI;
3749
3750 // Create an UnwindHelp object.
3751 // The UnwindHelp object is allocated at the start of the fixed object area
3752 int64_t FixedObject =
3753 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
3754 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
3755 /*SPOffset*/ -FixedObject,
3756 /*IsImmutable=*/false);
3757 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3758
3759 // We need to store -2 into the UnwindHelp object at the start of the
3760 // function.
3761 DebugLoc DL;
3762 RS->enterBasicBlockEnd(MBB);
3763 RS->backward(MBBI);
3764 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3765 assert(DstReg && "There must be a free register after frame setup");
3766 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3767 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3768 .addReg(DstReg, getKillRegState(true))
3769 .addFrameIndex(UnwindHelpFI)
3770 .addImm(0);
3771}
3772
3773namespace {
3774struct TagStoreInstr {
3775 MachineInstr *MI;
3776 int64_t Offset, Size;
3777 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3778 : MI(MI), Offset(Offset), Size(Size) {}
3779};
3780
3781class TagStoreEdit {
3782 MachineFunction *MF;
3783 MachineBasicBlock *MBB;
3784 MachineRegisterInfo *MRI;
3785 // Tag store instructions that are being replaced.
3786 SmallVector<TagStoreInstr, 8> TagStores;
3787 // Combined memref arguments of the above instructions.
3788 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3789
3790 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3791 // FrameRegOffset + Size) with the address tag of SP.
3792 Register FrameReg;
3793 StackOffset FrameRegOffset;
3794 int64_t Size;
3795 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3796 // end.
3797 std::optional<int64_t> FrameRegUpdate;
3798 // MIFlags for any FrameReg updating instructions.
3799 unsigned FrameRegUpdateFlags;
3800
3801 // Use zeroing instruction variants.
3802 bool ZeroData;
3803 DebugLoc DL;
3804
3805 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3806 void emitLoop(MachineBasicBlock::iterator InsertI);
3807
3808public:
3809 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3810 : MBB(MBB), ZeroData(ZeroData) {
3811 MF = MBB->getParent();
3812 MRI = &MF->getRegInfo();
3813 }
3814 // Add an instruction to be replaced. Instructions must be added in the
3815 // ascending order of Offset, and have to be adjacent.
3816 void addInstruction(TagStoreInstr I) {
3817 assert((TagStores.empty() ||
3818 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3819 "Non-adjacent tag store instructions.");
3820 TagStores.push_back(I);
3821 }
3822 void clear() { TagStores.clear(); }
3823 // Emit equivalent code at the given location, and erase the current set of
3824 // instructions. May skip if the replacement is not profitable. May invalidate
3825 // the input iterator and replace it with a valid one.
3826 void emitCode(MachineBasicBlock::iterator &InsertI,
3827 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3828};
3829
3830void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3831 const AArch64InstrInfo *TII =
3832 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3833
3834 const int64_t kMinOffset = -256 * 16;
3835 const int64_t kMaxOffset = 255 * 16;
3836
3837 Register BaseReg = FrameReg;
3838 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3839 if (BaseRegOffsetBytes < kMinOffset ||
3840 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3841 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
3842 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3843 // is required for the offset of ST2G.
3844 BaseRegOffsetBytes % 16 != 0) {
3845 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3846 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3847 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3848 BaseReg = ScratchReg;
3849 BaseRegOffsetBytes = 0;
3850 }
3851
3852 MachineInstr *LastI = nullptr;
3853 while (Size) {
3854 int64_t InstrSize = (Size > 16) ? 32 : 16;
3855 unsigned Opcode =
3856 InstrSize == 16
3857 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3858 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3859 assert(BaseRegOffsetBytes % 16 == 0);
3860 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3861 .addReg(AArch64::SP)
3862 .addReg(BaseReg)
3863 .addImm(BaseRegOffsetBytes / 16)
3864 .setMemRefs(CombinedMemRefs);
3865 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3866 // final SP adjustment in the epilogue.
3867 if (BaseRegOffsetBytes == 0)
3868 LastI = I;
3869 BaseRegOffsetBytes += InstrSize;
3870 Size -= InstrSize;
3871 }
3872
3873 if (LastI)
3874 MBB->splice(InsertI, MBB, LastI);
3875}
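// Worked example (illustrative, not from the original source): a 48-byte
// region starting at a 16-byte aligned offset is covered by one ST2G
// (32 bytes) followed by one STG (16 bytes); whichever store lands at
// [BaseReg, #0] is spliced last so a following SP adjustment in the
// epilogue has a chance to be folded into it.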
3876
3877void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3878 const AArch64InstrInfo *TII =
3879 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3880
3881 Register BaseReg = FrameRegUpdate
3882 ? FrameReg
3883 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3884 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3885
3886 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3887
3888 int64_t LoopSize = Size;
3889 // If the loop size is not a multiple of 32, split off one 16-byte store at
3890 // the end to fold BaseReg update into.
3891 if (FrameRegUpdate && *FrameRegUpdate)
3892 LoopSize -= LoopSize % 32;
3893 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3894 TII->get(ZeroData ? AArch64::STZGloop_wback
3895 : AArch64::STGloop_wback))
3896 .addDef(SizeReg)
3897 .addDef(BaseReg)
3898 .addImm(LoopSize)
3899 .addReg(BaseReg)
3900 .setMemRefs(CombinedMemRefs);
3901 if (FrameRegUpdate)
3902 LoopI->setFlags(FrameRegUpdateFlags);
3903
3904 int64_t ExtraBaseRegUpdate =
3905 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3906 if (LoopSize < Size) {
3907 assert(FrameRegUpdate);
3908 assert(Size - LoopSize == 16);
3909 // Tag 16 more bytes at BaseReg and update BaseReg.
3910 BuildMI(*MBB, InsertI, DL,
3911 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3912 .addDef(BaseReg)
3913 .addReg(BaseReg)
3914 .addReg(BaseReg)
3915 .addImm(1 + ExtraBaseRegUpdate / 16)
3916 .setMemRefs(CombinedMemRefs)
3917 .setMIFlags(FrameRegUpdateFlags);
3918 } else if (ExtraBaseRegUpdate) {
3919 // Update BaseReg.
3920 BuildMI(
3921 *MBB, InsertI, DL,
3922 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3923 .addDef(BaseReg)
3924 .addReg(BaseReg)
3925 .addImm(std::abs(ExtraBaseRegUpdate))
3926 .addImm(0)
3927 .setMIFlags(FrameRegUpdateFlags);
3928 }
3929}
3930
3931// Check if *II is a register update that can be merged into STGloop that ends
3932// at (Reg + Size). RemainingOffset is the required adjustment to Reg after the
3933// end of the loop.
3934bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3935 int64_t Size, int64_t *TotalOffset) {
3936 MachineInstr &MI = *II;
3937 if ((MI.getOpcode() == AArch64::ADDXri ||
3938 MI.getOpcode() == AArch64::SUBXri) &&
3939 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3940 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3941 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3942 if (MI.getOpcode() == AArch64::SUBXri)
3943 Offset = -Offset;
3944 int64_t AbsPostOffset = std::abs(Offset - Size);
3945 const int64_t kMaxOffset =
3946 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
3947 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
3948 *TotalOffset = Offset;
3949 return true;
3950 }
3951 }
3952 return false;
3953}
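// Worked example (illustrative, not from the original source): if an
// STGloop tags Size = 64 bytes and is immediately followed by
// "add sp, sp, #64", then Offset == Size, the residual post-update is 0
// (<= 0xFFF and 16-byte aligned), so the add can be folded into the loop's
// write-back form with *TotalOffset = 64.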
3954
3955void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3956 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3957 MemRefs.clear();
3958 for (auto &TS : TSE) {
3959 MachineInstr *MI = TS.MI;
3960 // An instruction without memory operands may access anything. Be
3961 // conservative and return an empty list.
3962 if (MI->memoperands_empty()) {
3963 MemRefs.clear();
3964 return;
3965 }
3966 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3967 }
3968}
3969
3970void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3971 const AArch64FrameLowering *TFI,
3972 bool TryMergeSPUpdate) {
3973 if (TagStores.empty())
3974 return;
3975 TagStoreInstr &FirstTagStore = TagStores[0];
3976 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3977 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3978 DL = TagStores[0].MI->getDebugLoc();
3979
3980 Register Reg;
3981 FrameRegOffset = TFI->resolveFrameOffsetReference(
3982 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
3983 /*PreferFP=*/false, /*ForSimm=*/true);
3984 FrameReg = Reg;
3985 FrameRegUpdate = std::nullopt;
3986
3987 mergeMemRefs(TagStores, CombinedMemRefs);
3988
3989 LLVM_DEBUG(dbgs() << "Replacing adjacent STG instructions:\n";
3990 for (const auto &Instr
3991 : TagStores) { dbgs() << " " << *Instr.MI; });
3992
3993 // Size threshold where a loop becomes shorter than a linear sequence of
3994 // tagging instructions.
3995 const int kSetTagLoopThreshold = 176;
3996 if (Size < kSetTagLoopThreshold) {
3997 if (TagStores.size() < 2)
3998 return;
3999 emitUnrolled(InsertI);
4000 } else {
4001 MachineInstr *UpdateInstr = nullptr;
4002 int64_t TotalOffset = 0;
4003 if (TryMergeSPUpdate) {
4004 // See if we can merge base register update into the STGloop.
4005 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
4006 // but STGloop is way too unusual for that, and also it only
4007 // realistically happens in function epilogue. Also, STGloop is expanded
4008 // before that pass.
4009 if (InsertI != MBB->end() &&
4010 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4011 &TotalOffset)) {
4012 UpdateInstr = &*InsertI++;
4013 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4014 << *UpdateInstr);
4015 }
4016 }
4017
4018 if (!UpdateInstr && TagStores.size() < 2)
4019 return;
4020
4021 if (UpdateInstr) {
4022 FrameRegUpdate = TotalOffset;
4023 FrameRegUpdateFlags = UpdateInstr->getFlags();
4024 }
4025 emitLoop(InsertI);
4026 if (UpdateInstr)
4027 UpdateInstr->eraseFromParent();
4028 }
4029
4030 for (auto &TS : TagStores)
4031 TS.MI->eraseFromParent();
4032}
4033
4034bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4035 int64_t &Size, bool &ZeroData) {
4036 MachineFunction &MF = *MI.getParent()->getParent();
4037 const MachineFrameInfo &MFI = MF.getFrameInfo();
4038
4039 unsigned Opcode = MI.getOpcode();
4040 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4041 Opcode == AArch64::STZ2Gi);
4042
4043 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4044 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4045 return false;
4046 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4047 return false;
4048 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4049 Size = MI.getOperand(2).getImm();
4050 return true;
4051 }
4052
4053 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4054 Size = 16;
4055 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4056 Size = 32;
4057 else
4058 return false;
4059
4060 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4061 return false;
4062
4063 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4064 16 * MI.getOperand(2).getImm();
4065 return true;
4066}
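// Worked example (illustrative, not from the original source): for
// "STGi $sp, fi#N, #2" the function reports Size = 16 and
// Offset = ObjectOffset(fi#N) + 2 * 16, i.e. the 16-byte granule that lies
// 32 bytes past the frame object's base offset.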
4067
4068// Detect a run of memory tagging instructions for adjacent stack frame slots,
4069// and replace them with a shorter instruction sequence:
4070// * replace STG + STG with ST2G
4071// * replace STGloop + STGloop with STGloop
4072// This code needs to run when stack slot offsets are already known, but before
4073// FrameIndex operands in STG instructions are eliminated.
4074MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
4075 const AArch64FrameLowering *TFI,
4076 RegScavenger *RS) {
4077 bool FirstZeroData;
4078 int64_t Size, Offset;
4079 MachineInstr &MI = *II;
4080 MachineBasicBlock *MBB = MI.getParent();
4081 MachineBasicBlock::iterator NextI = ++II;
4082 if (&MI == &MBB->instr_back())
4083 return II;
4084 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4085 return II;
4086
4087 SmallVector<TagStoreInstr, 8> Instrs;
4088 Instrs.emplace_back(&MI, Offset, Size);
4089
4090 constexpr int kScanLimit = 10;
4091 int Count = 0;
4092 for (MachineBasicBlock::iterator E = MBB->end();
4093 NextI != E && Count < kScanLimit; ++NextI) {
4094 MachineInstr &MI = *NextI;
4095 bool ZeroData;
4096 int64_t Size, Offset;
4097 // Collect instructions that update memory tags with a FrameIndex operand
4098 // and (when applicable) constant size, and whose output registers are dead
4099 // (the latter is almost always the case in practice). Since these
4100 // instructions effectively have no inputs or outputs, we are free to skip
4101 // any non-aliasing instructions in between without tracking used registers.
4102 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4103 if (ZeroData != FirstZeroData)
4104 break;
4105 Instrs.emplace_back(&MI, Offset, Size);
4106 continue;
4107 }
4108
4109 // Only count non-transient, non-tagging instructions toward the scan
4110 // limit.
4111 if (!MI.isTransient())
4112 ++Count;
4113
4114 // Just in case, stop before the epilogue code starts.
4115 if (MI.getFlag(MachineInstr::FrameSetup) ||
4116 MI.getFlag(MachineInstr::FrameDestroy))
4117 break;
4118
4119 // Reject anything that may alias the collected instructions.
4120 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
4121 break;
4122 }
4123
4124 // New code will be inserted after the last tagging instruction we've found.
4125 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4126
4127 // All the gathered stack tag instructions are merged and placed after
4128 // last tag store in the list. The check should be made if the nzcv
4129 // flag is live at the point where we are trying to insert. Otherwise
4130 // the nzcv flag might get clobbered if any stg loops are present.
4131
4132 // FIXME : This approach of bailing out from merge is conservative in
4133 // some ways like even if stg loops are not present after merge the
4134 // insert list, this liveness check is done (which is not needed).
4135 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
4136 LiveRegs.addLiveOuts(*MBB);
4137 for (auto I = MBB->rbegin();; ++I) {
4138 MachineInstr &MI = *I;
4139 if (MI == InsertI)
4140 break;
4141 LiveRegs.stepBackward(*I);
4142 }
4143 InsertI++;
4144 if (LiveRegs.contains(AArch64::NZCV))
4145 return InsertI;
4146
4147 llvm::stable_sort(Instrs,
4148 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4149 return Left.Offset < Right.Offset;
4150 });
4151
4152 // Make sure that we don't have any overlapping stores.
4153 int64_t CurOffset = Instrs[0].Offset;
4154 for (auto &Instr : Instrs) {
4155 if (CurOffset > Instr.Offset)
4156 return NextI;
4157 CurOffset = Instr.Offset + Instr.Size;
4158 }
4159
4160 // Find contiguous runs of tagged memory and emit shorter instruction
4161 // sequences for them when possible.
4162 TagStoreEdit TSE(MBB, FirstZeroData);
4163 std::optional<int64_t> EndOffset;
4164 for (auto &Instr : Instrs) {
4165 if (EndOffset && *EndOffset != Instr.Offset) {
4166 // Found a gap.
4167 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4168 TSE.clear();
4169 }
4170
4171 TSE.addInstruction(Instr);
4172 EndOffset = Instr.Offset + Instr.Size;
4173 }
4174
4175 const MachineFunction *MF = MBB->getParent();
4176 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4177 TSE.emitCode(
4178 InsertI, TFI, /*TryMergeSPUpdate = */
4179 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
4180
4181 return InsertI;
4182}
4183} // namespace
4184
4185void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
4186 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4187 if (StackTaggingMergeSetTag)
4188 for (auto &BB : MF)
4189 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();)
4190 II = tryMergeAdjacentSTG(II, this, RS);
4191}
4192
4193/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
4194/// before the update. This is easily retrieved as it is exactly the offset
4195/// that is set in processFunctionBeforeFrameFinalized.
4196StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
4197 const MachineFunction &MF, int FI, Register &FrameReg,
4198 bool IgnoreSPUpdates) const {
4199 const MachineFrameInfo &MFI = MF.getFrameInfo();
4200 if (IgnoreSPUpdates) {
4201 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
4202 << MFI.getObjectOffset(FI) << "\n");
4203 FrameReg = AArch64::SP;
4204 return StackOffset::getFixed(MFI.getObjectOffset(FI));
4205 }
4206
4207 // Go to common code if we cannot provide sp + offset.
4208 if (MFI.hasVarSizedObjects() ||
4209 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
4210 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
4211 return getFrameIndexReference(MF, FI, FrameReg);
4212
4213 FrameReg = AArch64::SP;
4214 return getStackOffset(MF, MFI.getObjectOffset(FI));
4215}
4216
4217/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
4218/// the parent's frame pointer
4219unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
4220 const MachineFunction &MF) const {
4221 return 0;
4222}
4223
4224/// Funclets only need to account for space for the callee saved registers,
4225/// as the locals are accounted for in the parent's stack frame.
4226unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
4227 const MachineFunction &MF) const {
4228 // This is the size of the pushed CSRs.
4229 unsigned CSSize =
4230 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
4231 // This is the amount of stack a funclet needs to allocate.
4232 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
4233 getStackAlign());
4234}
4235
4236namespace {
4237struct FrameObject {
4238 bool IsValid = false;
4239 // Index of the object in MFI.
4240 int ObjectIndex = 0;
4241 // Group ID this object belongs to.
4242 int GroupIndex = -1;
4243 // This object should be placed first (closest to SP).
4244 bool ObjectFirst = false;
4245 // This object's group (which always contains the object with
4246 // ObjectFirst==true) should be placed first.
4247 bool GroupFirst = false;
4248};
4249
4250class GroupBuilder {
4251 SmallVector<int, 8> CurrentMembers;
4252 int NextGroupIndex = 0;
4253 std::vector<FrameObject> &Objects;
4254
4255public:
4256 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
4257 void AddMember(int Index) { CurrentMembers.push_back(Index); }
4258 void EndCurrentGroup() {
4259 if (CurrentMembers.size() > 1) {
4260 // Create a new group with the current member list. This might remove them
4261 // from their pre-existing groups. That's OK, dealing with overlapping
4262 // groups is too hard and unlikely to make a difference.
4263 LLVM_DEBUG(dbgs() << "group:");
4264 for (int Index : CurrentMembers) {
4265 Objects[Index].GroupIndex = NextGroupIndex;
4266 LLVM_DEBUG(dbgs() << " " << Index);
4267 }
4268 LLVM_DEBUG(dbgs() << "\n");
4269 NextGroupIndex++;
4270 }
4271 CurrentMembers.clear();
4272 }
4273};
4274
4275bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
4276 // Objects at a lower index are closer to FP; objects at a higher index are
4277 // closer to SP.
4278 //
4279 // For consistency in our comparison, all invalid objects are placed
4280 // at the end. This also allows us to stop walking when we hit the
4281 // first invalid item after it's all sorted.
4282 //
4283 // The "first" object goes first (closest to SP), followed by the members of
4284 // the "first" group.
4285 //
4286 // The rest are sorted by the group index to keep the groups together.
4287 // Higher numbered groups are more likely to be around longer (i.e. untagged
4288 // in the function epilogue and not at some earlier point). Place them closer
4289 // to SP.
4290 //
4291 // If all else equal, sort by the object index to keep the objects in the
4292 // original order.
4293 return std::make_tuple(!A.IsValid, A.ObjectFirst, A.GroupFirst, A.GroupIndex,
4294 A.ObjectIndex) <
4295 std::make_tuple(!B.IsValid, B.ObjectFirst, B.GroupFirst, B.GroupIndex,
4296 B.ObjectIndex);
4297}
4298} // namespace
4299
4301 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4302 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4303 return;
4304
4305 const MachineFrameInfo &MFI = MF.getFrameInfo();
4306 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4307 for (auto &Obj : ObjectsToAllocate) {
4308 FrameObjects[Obj].IsValid = true;
4309 FrameObjects[Obj].ObjectIndex = Obj;
4310 }
4311
4312 // Identify stack slots that are tagged at the same time.
4313 GroupBuilder GB(FrameObjects);
4314 for (auto &MBB : MF) {
4315 for (auto &MI : MBB) {
4316 if (MI.isDebugInstr())
4317 continue;
4318 int OpIndex;
4319 switch (MI.getOpcode()) {
4320 case AArch64::STGloop:
4321 case AArch64::STZGloop:
4322 OpIndex = 3;
4323 break;
4324 case AArch64::STGi:
4325 case AArch64::STZGi:
4326 case AArch64::ST2Gi:
4327 case AArch64::STZ2Gi:
4328 OpIndex = 1;
4329 break;
4330 default:
4331 OpIndex = -1;
4332 }
4333
4334 int TaggedFI = -1;
4335 if (OpIndex >= 0) {
4336 const MachineOperand &MO = MI.getOperand(OpIndex);
4337 if (MO.isFI()) {
4338 int FI = MO.getIndex();
4339 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4340 FrameObjects[FI].IsValid)
4341 TaggedFI = FI;
4342 }
4343 }
4344
4345 // If this is a stack tagging instruction for a slot that is not part of a
4346 // group yet, either start a new group or add it to the current one.
4347 if (TaggedFI >= 0)
4348 GB.AddMember(TaggedFI);
4349 else
4350 GB.EndCurrentGroup();
4351 }
4352 // Groups should never span multiple basic blocks.
4353 GB.EndCurrentGroup();
4354 }
4355
4356 // If the function's tagged base pointer is pinned to a stack slot, we want to
4357 // put that slot first when possible. This will likely place it at SP + 0,
4358 // and save one instruction when generating the base pointer because IRG does
4359// not allow an immediate offset.
4360 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
4361 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4362 if (TBPI) {
4363 FrameObjects[*TBPI].ObjectFirst = true;
4364 FrameObjects[*TBPI].GroupFirst = true;
4365 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4366 if (FirstGroupIndex >= 0)
4367 for (FrameObject &Object : FrameObjects)
4368 if (Object.GroupIndex == FirstGroupIndex)
4369 Object.GroupFirst = true;
4370 }
4371
4372 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4373
4374 int i = 0;
4375 for (auto &Obj : FrameObjects) {
4376 // All invalid items are sorted at the end, so it's safe to stop.
4377 if (!Obj.IsValid)
4378 break;
4379 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4380 }
4381
4382 LLVM_DEBUG(dbgs() << "Final frame order:\n"; for (auto &Obj
4383 : FrameObjects) {
4384 if (!Obj.IsValid)
4385 break;
4386 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4387 if (Obj.ObjectFirst)
4388 dbgs() << ", first";
4389 if (Obj.GroupFirst)
4390 dbgs() << ", group-first";
4391 dbgs() << "\n";
4392 });
4393}
4394
4395/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4396/// least every ProbeSize bytes. Returns an iterator of the first instruction
4397/// after the loop. The difference between SP and TargetReg must be an exact
4398/// multiple of ProbeSize.
4399MachineBasicBlock::iterator
4400AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4401 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4402 Register TargetReg) const {
4403 MachineBasicBlock &MBB = *MBBI->getParent();
4404 MachineFunction &MF = *MBB.getParent();
4405 const AArch64InstrInfo *TII =
4406 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4407 DebugLoc DL = MBB.findDebugLoc(MBBI);
4408
4409 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4410 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4411 MF.insert(MBBInsertPoint, LoopMBB);
4412 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4413 MF.insert(MBBInsertPoint, ExitMBB);
4414
4415 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4416 // in SUB).
4417 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4418 StackOffset::getFixed(-ProbeSize), TII,
4419 MachineInstr::FrameSetup);
4420 // STR XZR, [SP]
4421 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4422 .addReg(AArch64::XZR)
4423 .addReg(AArch64::SP)
4424 .addImm(0)
4425 .setMIFlags(MachineInstr::FrameSetup);
4426 // CMP SP, TargetReg
4427 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4428 AArch64::XZR)
4429 .addReg(AArch64::SP)
4430 .addReg(TargetReg)
4431 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
4432 .setMIFlags(MachineInstr::FrameSetup);
4433 // B.CC Loop
4434 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4435 .addImm(AArch64CC::NE)
4436 .addMBB(LoopMBB)
4437 .setMIFlags(MachineInstr::FrameSetup);
4438
4439 LoopMBB->addSuccessor(ExitMBB);
4440 LoopMBB->addSuccessor(LoopMBB);
4441 // Synthesize the exit MBB.
4442 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4443 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
4444 MBB.addSuccessor(LoopMBB);
4445 // Update liveins.
4446 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4447
4448 return ExitMBB->begin();
4449}
4450
4451void AArch64FrameLowering::inlineStackProbeFixed(
4452 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4453 StackOffset CFAOffset) const {
4454 MachineBasicBlock *MBB = MBBI->getParent();
4455 MachineFunction &MF = *MBB->getParent();
4456 const AArch64InstrInfo *TII =
4457 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4458 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4459 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4460 bool HasFP = hasFP(MF);
4461
4462 DebugLoc DL;
4463 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
4464 int64_t NumBlocks = FrameSize / ProbeSize;
4465 int64_t ResidualSize = FrameSize % ProbeSize;
4466
4467 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
4468 << NumBlocks << " blocks of " << ProbeSize
4469 << " bytes, plus " << ResidualSize << " bytes\n");
4470
4471 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
4472 // ordinary loop.
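// Worked example (illustrative; assumes the default 4096-byte probe size
// and that the unroll limit below is small): FrameSize = 70000 gives
// NumBlocks = 17 and ResidualSize = 368, so the blocks are allocated with
// the probing loop and a final 368-byte SP decrement handles the residual.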
4473 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
4474 for (int i = 0; i < NumBlocks; ++i) {
4475 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
4476 // encodable in a SUB).
4477 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4478 StackOffset::getFixed(-ProbeSize), TII,
4479 MachineInstr::FrameSetup, false, false, nullptr,
4480 EmitAsyncCFI && !HasFP, CFAOffset);
4481 CFAOffset += StackOffset::getFixed(ProbeSize);
4482 // STR XZR, [SP]
4483 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4484 .addReg(AArch64::XZR)
4485 .addReg(AArch64::SP)
4486 .addImm(0)
4487 .setMIFlags(MachineInstr::FrameSetup);
4488 }
4489 } else if (NumBlocks != 0) {
4490 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
4491 // encodable in ADD). ScratchReg may temporarily become the CFA register.
4492 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
4493 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
4494 MachineInstr::FrameSetup, false, false, nullptr,
4495 EmitAsyncCFI && !HasFP, CFAOffset);
4496 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
4497 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
4498 MBB = MBBI->getParent();
4499 if (EmitAsyncCFI && !HasFP) {
4500 // Set the CFA register back to SP.
4501 const AArch64RegisterInfo &RegInfo =
4502 *MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
4503 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
4504 unsigned CFIIndex =
4505 MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
4506 BuildMI(*MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
4507 .addCFIIndex(CFIIndex)
4508 .setMIFlags(MachineInstr::FrameSetup);
4509 }
4510 }
4511
4512 if (ResidualSize != 0) {
4513 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
4514 // in SUB).
4515 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4516 StackOffset::getFixed(-ResidualSize), TII,
4517 MachineInstr::FrameSetup, false, false, nullptr,
4518 EmitAsyncCFI && !HasFP, CFAOffset);
4519 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
4520 // STR XZR, [SP]
4521 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4522 .addReg(AArch64::XZR)
4523 .addReg(AArch64::SP)
4524 .addImm(0)
4525 .setMIFlags(MachineInstr::FrameSetup);
4526 }
4527 }
4528}
4529
4530void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
4531 MachineBasicBlock &MBB) const {
4532 // Get the instructions that need to be replaced. We emit at most two of
4533 // these. Remember them in order to avoid complications coming from the need
4534 // to traverse the block while potentially creating more blocks.
4535 SmallVector<MachineInstr *, 4> ToReplace;
4536 for (MachineInstr &MI : MBB)
4537 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
4538 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
4539 ToReplace.push_back(&MI);
4540
4541 for (MachineInstr *MI : ToReplace) {
4542 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
4543 Register ScratchReg = MI->getOperand(0).getReg();
4544 int64_t FrameSize = MI->getOperand(1).getImm();
4545 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
4546 MI->getOperand(3).getImm());
4547 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
4548 CFAOffset);
4549 } else {
4550 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
4551 "Stack probe pseudo-instruction expected");
4552 const AArch64InstrInfo *TII =
4553 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
4554 Register TargetReg = MI->getOperand(0).getReg();
4555 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
4556 }
4557 MI->eraseFromParent();
4558 }
4559}
unsigned const MachineRegisterInfo * MRI
#define Success
for(const MachineOperand &MO :llvm::drop_begin(OldMI.operands(), Desc.getNumOperands()))
static int64_t getArgumentStackToRestore(MachineFunction &MF, MachineBasicBlock &MBB)
Returns how much of the incoming argument stack area (in bytes) we should clean up in an epilogue.
static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL)
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static void emitCalleeSavedRestores(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, bool SVE)
static void computeCalleeSaveRegisterPairs(MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static void emitDefineCFAWithFP(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned FixedObject)
static bool needsWinCFI(const MachineFunction &MF)
static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertPt, unsigned DwarfReg)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
static bool produceCompactUnwindFrame(MachineFunction &MF)
static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex, bool AssignOffsets)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool windowsRequiresStackProbe(MachineFunction &MF, uint64_t StackSizeInBytes)
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI, uint64_t LocalStackSize, bool NeedsWinCFI, bool *HasWinCFI)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc, bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI, MachineInstr::MIFlag FrameFlag=MachineInstr::FrameSetup, int CFAOffset=0)
static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI, unsigned LocalStackSize)
static StackOffset getSVEStackSize(const MachineFunction &MF)
Returns the size of the entire SVE stackframe (calleesaves + spills).
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI, const TargetInstrInfo &TII, MachineInstr::MIFlag Flag)
static Register findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB)
static void getLivePhysRegsUpTo(MachineInstr &MI, const TargetRegisterInfo &TRI, LivePhysRegs &LiveRegs)
Collect live registers from the end of MI's parent up to (including) MI in LiveRegs.
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool IsSVECalleeSave(MachineBasicBlock::iterator I)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
unsigned findFreePredicateReg(BitVector &SavedRegs)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static StackOffset getFPOffset(const MachineFunction &MF, int64_t ObjectOffset)
static bool isTargetWindows(const MachineFunction &MF)
static StackOffset getStackOffset(const MachineFunction &MF, int64_t ObjectOffset)
static int64_t upperBound(StackOffset Size)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static bool isFuncletReturnInstr(const MachineInstr &MI)
static void emitShadowCallStackPrologue(const TargetInstrInfo &TII, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, bool NeedsWinCFI, bool NeedsUnwindInfo)
static unsigned getFixedObjectSize(const MachineFunction &MF, const AArch64FunctionInfo *AFI, bool IsWin64, bool IsFunclet)
Returns the size of the fixed object area (allocated next to sp on entry) On Win64 this may include a...
unsigned RegSize
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
static const int kSetTagLoopThreshold
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
Analysis containing CSE Info
Definition: CSEInfo.cpp:27
static void clear(coro::Shape &Shape)
Definition: Coroutines.cpp:148
#define LLVM_DEBUG(X)
Definition: Debug.h:101
uint64_t Size
bool End
Definition: ELF_riscv.cpp:480
static const HTTPClientCleanup Cleanup
Definition: HTTPClient.cpp:42
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition: MD5.cpp:55
#define I(x, y, z)
Definition: MD5.cpp:58
unsigned const TargetRegisterInfo * TRI
if(VerifyEach)
This file declares the machine register scavenger class.
assert(ImpDefSCC.getReg()==AMDGPU::SCC &&ImpDefSCC.isDef())
unsigned OpIndex
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
This file defines the 'Statistic' class, which is designed to be an easy way to expose various metric...
#define STATISTIC(VARNAME, DESC)
Definition: Statistic.h:167
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition: Value.cpp:469
static const unsigned FramePtr
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFP(const MachineFunction &MF) const override
hasFP - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
bool enableCFIFixup(MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon fucntion entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF) const
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
void setPredicateRegForFillSpill(unsigned Reg)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setTaggedBasePointerOffset(unsigned Offset)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
void setMinMaxSVECSFrameIndex(int Min, int Max)
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isSEHInstruction(const MachineInstr &MI)
Return true if the instructions is a SEH instruciton used for unwinding on Windows.
bool isReservedReg(const MachineFunction &MF, MCRegister Reg) const
bool hasBasePointer(const MachineFunction &MF) const
bool cannotEliminateFrame(const MachineFunction &MF) const
const AArch64RegisterInfo * getRegisterInfo() const override
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
const Triple & getTargetTriple() const
bool isCallingConvWin64(CallingConv::ID CC) const
const char * getChkStkName() const
bool swiftAsyncContextIsDynamicallySet() const
Return whether FrameLowering should always set the "extended frame present" bit in FP,...
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this function.
unsigned getRedZoneSize(const Function &F) const
bool supportSwiftError() const override
Return true if the target supports swifterror attribute.
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition: ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition: ArrayRef.h:165
bool empty() const
empty - Check if the array is empty.
Definition: ArrayRef.h:160
bool hasAttrSomewhere(Attribute::AttrKind Kind, unsigned *Index=nullptr) const
Return true if the specified attribute is set for at least one parameter or for the return value.
bool test(unsigned Idx) const
Definition: BitVector.h:461
BitVector & reset()
Definition: BitVector.h:392
size_type count() const
count - Returns the number of bits which are set.
Definition: BitVector.h:162
BitVector & set()
Definition: BitVector.h:351
iterator_range< const_set_bits_iterator > set_bits() const
Definition: BitVector.h:140
A debug info location.
Definition: DebugLoc.h:33
bool hasOptSize() const
Optimize this function for size (-Os) or minimum size (-Oz).
Definition: Function.h:685
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition: Function.h:682
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition: Function.h:264
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition: Function.h:340
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition: Function.cpp:675
void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I, const DebugLoc &DL, MCRegister DestReg, MCRegister SrcReg, bool KillSrc) const override
Emit instructions to copy a pair of physical registers.
A set of physical registers with utility functions to track liveness when walking backward/forward th...
Definition: LivePhysRegs.h:52
bool available(const MachineRegisterInfo &MRI, MCPhysReg Reg) const
Returns true if register Reg and no aliasing register is in the set.
void stepBackward(const MachineInstr &MI)
Simulates liveness when stepping backwards over an instruction (bundle).
void removeReg(MCPhysReg Reg)
Removes a physical register, all its sub-registers, and all its super-registers from the set.
Definition: LivePhysRegs.h:92
void addLiveIns(const MachineBasicBlock &MBB)
Adds all live-in registers of basic block MBB.
void addLiveOuts(const MachineBasicBlock &MBB)
Adds all live-out registers of basic block MBB.
void addReg(MCPhysReg Reg)
Adds a physical register and all its sub-registers to the set.
Definition: LivePhysRegs.h:83
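A minimal sketch of the backward-scan pattern these members support, e.g. when epilogue code looks for a dead general-purpose register to use as a scratch register; the helper name and the choice of register class are illustrative, not taken from this file:
static MCRegister findDeadGPR64(MachineBasicBlock &MBB,
                                const TargetRegisterInfo &TRI,
                                const MachineRegisterInfo &MRI) {
  LivePhysRegs LiveRegs(TRI);
  LiveRegs.addLiveOuts(MBB);            // registers live on exit from MBB
  for (const MachineInstr &MI : reverse(MBB))
    LiveRegs.stepBackward(MI);          // roll liveness back to block entry
  for (MCPhysReg Reg : AArch64::GPR64RegClass)
    if (LiveRegs.available(MRI, Reg))   // Reg and all aliases are free
      return Reg;
  return AArch64::NoRegister;           // nothing usable found
}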
bool usesWindowsCFI() const
Definition: MCAsmInfo.h:799
static MCCFIInstruction createDefCfaRegister(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_def_cfa_register modifies a rule for computing CFA.
Definition: MCDwarf.h:548
static MCCFIInstruction createOffset(MCSymbol *L, unsigned Register, int Offset, SMLoc Loc={})
.cfi_offset Previous value of Register is saved at offset Offset from CFA.
Definition: MCDwarf.h:583
static MCCFIInstruction cfiDefCfaOffset(MCSymbol *L, int Offset, SMLoc Loc={})
.cfi_def_cfa_offset modifies a rule for computing CFA.
Definition: MCDwarf.h:556
static MCCFIInstruction createRestore(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_restore says that the rule for Register is now the same as it was at the beginning of the functi...
Definition: MCDwarf.h:616
static MCCFIInstruction createNegateRAState(MCSymbol *L, SMLoc Loc={})
.cfi_negate_ra_state AArch64 negate RA state.
Definition: MCDwarf.h:609
static MCCFIInstruction cfiDefCfa(MCSymbol *L, unsigned Register, int Offset, SMLoc Loc={})
.cfi_def_cfa defines a rule for computing CFA as: take address from Register and add Offset to it.
Definition: MCDwarf.h:541
static MCCFIInstruction createEscape(MCSymbol *L, StringRef Vals, SMLoc Loc={}, StringRef Comment="")
.cfi_escape Allows the user to add arbitrary bytes to the unwind info.
Definition: MCDwarf.h:647
static MCCFIInstruction createSameValue(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_same_value Current value of Register is the same as in the previous frame.
Definition: MCDwarf.h:630
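A minimal sketch, assuming MF, MBB, MBBI and DL are in scope in a prologue-emission context, of how one of these factory methods is typically attached to the instruction stream as a CFI directive; the 16-byte offset is purely illustrative:
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
// Record the directive on the MachineFunction and reference it by index.
unsigned CFIIndex =
    MF.addFrameInst(MCCFIInstruction::cfiDefCfaOffset(nullptr, 16));
BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
    .addCFIIndex(CFIIndex)
    .setMIFlags(MachineInstr::FrameSetup);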
MCSymbol * createTempSymbol()
Create a temporary symbol with a unique name.
Definition: MCContext.cpp:322
Describe properties that are true of each instruction in the target description file.
Definition: MCInstrDesc.h:198
Wrapper class representing physical registers. Should be passed by value.
Definition: MCRegister.h:33
MCSymbol - Instances of this class represent a symbol name in the MC file, and MCSymbols are created ...
Definition: MCSymbol.h:40
void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
instr_iterator instr_begin()
const BasicBlock * getBasicBlock() const
Return the LLVM basic block that this instance corresponded to originally.
bool isLiveIn(MCPhysReg Reg, LaneBitmask LaneMask=LaneBitmask::getAll()) const
Return true if the specified register is in the live in set.
bool isEHFuncletEntry() const
Returns true if this is the entry block of an EH funclet.
iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
MachineInstr & instr_back()
void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
DebugLoc findDebugLoc(instr_iterator MBBI)
Find the next valid DebugLoc starting at MBBI, skipping any debug instructions.
iterator getLastNonDebugInstr(bool SkipPseudoOp=true)
Returns an iterator to the last non-debug instruction in the basic block, or end().
instr_iterator instr_end()
void addLiveIn(MCRegister PhysReg, LaneBitmask LaneMask=LaneBitmask::getAll())
Adds the specified register as a live in.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
instr_iterator erase(instr_iterator I)
Remove an instruction from the instruction list and delete it.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
uint64_t getStackSize() const
Return the number of bytes that must be allocated to hold all of the fixed size frame objects.
int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
Align getMaxAlign() const
Return the alignment in bytes that this function must be aligned to, which is greater than the defaul...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
int getStackProtectorIndex() const
Return the index for the stack protector object.
uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee saved info vector for the current function.
unsigned getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
uint8_t getStackID(int ObjectIdx) const
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
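A minimal sketch of the usual spill-slot flow through this interface (size and alignment are illustrative): a slot is created early, and its final offset is only meaningful once frame layout has been finalized.
MachineFrameInfo &MFI = MF.getFrameInfo();
int FI = MFI.CreateStackObject(/*Size=*/8, Align(8), /*isSpillSlot=*/true);
// ... later, after offsets have been assigned:
int64_t Off = MFI.getObjectOffset(FI);  // offset from the incoming SP
uint8_t ID  = MFI.getStackID(FI);       // non-zero for specially placed slots, e.g. SVE objects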
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
unsigned addFrameInst(const MCCFIInstruction &Inst)
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
const LLVMTargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
MachineModuleInfo & getMMI() const
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & addCFIIndex(unsigned CFIIndex) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & add(const MachineOperand &MO) const
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addUse(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register use operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
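A minimal sketch of the builder chaining these methods provide, forming a single SP-relative store during frame setup; the opcode, register and scaled offset are chosen purely for illustration:
BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
    .addReg(AArch64::X19, getKillRegState(true)) // value being spilled
    .addReg(AArch64::SP)                         // base register
    .addImm(2)                                   // offset 16, scaled by the 8-byte store size
    .setMIFlag(MachineInstr::FrameSetup);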
Representation of each machine instruction.
Definition: MachineInstr.h:69
void setFlags(unsigned flags)
Definition: MachineInstr.h:404
void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
uint32_t getFlags() const
Return the MI flags bitvector.
Definition: MachineInstr.h:386
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
This class contains meta information specific to a module.
const MCContext & getContext() const
MachineOperand class - Representation of each machine instruction operand.
void setImm(int64_t immVal)
int64_t getImm() const
static MachineOperand CreateImm(int64_t Val)
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
bool isLiveIn(Register Reg) const
const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition: ArrayRef.h:307
void enterBasicBlockEnd(MachineBasicBlock &MBB)
Start tracking liveness from the end of basic block MBB.
Register FindUnusedReg(const TargetRegisterClass *RC) const
Find an unused register of the specified register class.
void backward()
Update internal register state and move MBB iterator backwards.
void addScavengingFrameIndex(int FI)
Add a scavenging frame index.
Wrapper class representing virtual and physical registers.
Definition: Register.h:19
bool empty() const
Definition: SmallVector.h:94
size_t size() const
Definition: SmallVector.h:91
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
Definition: SmallVector.h:586
reference emplace_back(ArgTypes &&... Args)
Definition: SmallVector.h:950
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
Definition: SmallVector.h:696
void push_back(const T &Elt)
Definition: SmallVector.h:426
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Definition: SmallVector.h:1209
StackOffset holds a fixed and a scalable offset in bytes.
Definition: TypeSize.h:33
int64_t getFixed() const
Returns the fixed component of the stack.
Definition: TypeSize.h:49
int64_t getScalable() const
Returns the scalable component of the stack.
Definition: TypeSize.h:52
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition: TypeSize.h:44
static StackOffset getScalable(int64_t Scalable)
Definition: TypeSize.h:43
static StackOffset getFixed(int64_t Fixed)
Definition: TypeSize.h:42
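A minimal sketch of how the fixed and scalable components combine when describing AArch64 frame offsets that mix ordinary bytes with SVE (vscale-scaled) bytes; the byte counts are illustrative:
StackOffset CalleeSaves = StackOffset::getFixed(64);    // 64 bytes
StackOffset SVEArea     = StackOffset::getScalable(32); // 32 * vscale bytes
StackOffset Total       = CalleeSaves + SVEArea;        // components stay separate
int64_t FixedBytes    = Total.getFixed();               // 64
int64_t ScalableBytes = Total.getScalable();            // 32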
StringRef - Represent a constant reference to a string, i.e.
Definition: StringRef.h:50
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
TargetOptions Options
CodeModel::Model getCodeModel() const
Returns the code model.
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
SwiftAsyncFramePointerMode SwiftAsyncFramePointer
Control when and how the Swift async frame pointer bit should be set.
bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
const TargetRegisterClass * getMinimalPhysRegClass(MCRegister Reg, MVT VT=MVT::Other) const
Returns the Register Class of a physical register of the given type, picking the most sub register cl...
Align getSpillAlign(const TargetRegisterClass &RC) const
Return the minimum required alignment in bytes for a spill slot for a register of this class.
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
unsigned getSpillSize(const TargetRegisterClass &RC) const
Return the size in bytes of the stack slot allocated to hold a spilled copy of a register from class ...
TargetSubtargetInfo - Generic base class for all target subtargets.
virtual const TargetRegisterInfo * getRegisterInfo() const
getRegisterInfo - If register information is available, return it.
virtual const TargetInstrInfo * getInstrInfo() const
StringRef getArchName() const
Get the architecture (first) component of the triple.
Definition: Triple.cpp:1299
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition: TypeSize.h:342
The instances of the Type class are immutable: once they are created, they are never changed.
Definition: Type.h:45
self_iterator getIterator()
Definition: ilist_node.h:109
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ MO_GOT
MO_GOT - This flag indicates that a symbol operand represents the address of the GOT entry for the sy...
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
static uint64_t encodeLogicalImmediate(uint64_t imm, unsigned regSize)
encodeLogicalImmediate - Return the encoded immediate value for a logical immediate instruction of th...
static unsigned getShifterImm(AArch64_AM::ShiftExtendType ST, unsigned Imm)
getShifterImm - Encode the shift type and amount: imm: 6-bit shift amount shifter: 000 ==> lsl 001 ==...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
Definition: CallingConv.h:224
@ PreserveMost
Used for runtime calls that preserves most registers.
Definition: CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition: CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition: CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserves (almost) all registers.
Definition: CallingConv.h:66
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
Definition: CallingConv.h:159
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition: CallingConv.h:87
@ Implicit
Not emitted register (e.g. carry, or temporary result).
@ Dead
Unused definition.
@ Define
Register definition.
@ Kill
The last use of a register.
Reg
All possible values of the reg field in the ModR/M byte.
initializer< Ty > init(const Ty &Val)
Definition: CommandLine.h:450
NodeAddr< InstrNode * > Instr
Definition: RDFGraph.h:389
This is an optimization pass for GlobalISel generic memory operations.
Definition: AddressRanges.h:18
@ Offset
Definition: DWP.cpp:456
void stable_sort(R &&Range)
Definition: STLExtras.h:1995
MCCFIInstruction createDefCFA(const TargetRegisterInfo &TRI, unsigned FrameReg, unsigned Reg, const StackOffset &Offset, bool LastAdjustmentWasScalable=true)
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition: ScopeExit.h:59
MCCFIInstruction createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg, const StackOffset &OffsetFromDefCFA)
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
unsigned getBLRCallOpcode(const MachineFunction &MF)
Return opcode to be used for indirect calls.
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto reverse(ContainerTy &&C)
Definition: STLExtras.h:419
@ Always
Always set the bit.
@ DeploymentBased
Determine whether to set the bit statically or dynamically based on the deployment target.
raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition: Debug.cpp:163
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
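A minimal sketch of a typical call, assuming MBB, MBBI, DL and a const TargetInstrInfo *TII are in scope, as used to allocate 64 bytes of stack in a prologue (the amount is illustrative); the helper materialises however many SUB/ADD instructions the offset requires:
emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                StackOffset::getFixed(-64), TII,
                MachineInstr::FrameSetup);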
void report_fatal_error(Error Err, bool gen_crash_diag=true)
Report a serious error, calling any installed error handler.
Definition: Error.cpp:156
EHPersonality classifyEHPersonality(const Value *Pers)
See if the given exception handling personality function is one that we understand.
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition: Alignment.h:155
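For example, rounding a 40-byte local area up to the 16-byte stack alignment:
uint64_t Padded = alignTo(40, Align(16)); // yields 48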
bool isAsynchronousEHPersonality(EHPersonality Pers)
Returns true if this personality function catches asynchronous exceptions.
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
Definition: LivePhysRegs.h:215
Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition: BitVector.h:860
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition: Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition: Alignment.h:85
Description of the encoding of one expression Op.
static MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.