1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body runs, after the prologue. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | |
56// | callee-saved fp/simd/SVE regs |
57// | |
58// |-----------------------------------|
59// | |
60// | SVE stack objects |
61// | |
62// |-----------------------------------|
63// |.empty.space.to.make.part.below....|
64// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
65// |.the.standard.16-byte.alignment....| compile time; if present)
66// |-----------------------------------|
67// | |
68// | local variables of fixed size |
69// | including spill slots |
70// |-----------------------------------| <- bp(not defined by ABI,
71// |.variable-sized.local.variables....| LLVM chooses X19)
72// |.(VLAs)............................| (size of this area is unknown at
73// |...................................| compile time)
74// |-----------------------------------| <- sp
75// | | Lower address
76//
77//
78// To access data in a frame, a constant offset from one of the pointers
79// (fp, bp, sp) must be computable at compile time. The sizes of the areas
80// with a dotted background cannot be computed at compile time if those
81// areas are present, so all three of fp, bp and sp must be set up to be
82// able to access all of the frame areas, assuming every frame area is
83// non-empty.
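// For example (an illustrative sketch, not a fixed rule): with every area
// present, the callee-saves are reachable at a constant offset from fp, the
// fixed-size locals at a constant offset from bp, and the outgoing call
// arguments at a constant offset from sp.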
84//
85// For most functions, some of the frame areas are empty. For those functions,
86// it may not be necessary to set up fp or bp:
87// * A base pointer is definitely needed when there are both VLAs and local
88// variables with more-than-default alignment requirements.
89// * A frame pointer is definitely needed when there are local variables with
90// more-than-default alignment requirements.
91//
92// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
93// callee-saved area, since the unwind encoding does not allow for encoding
94// this dynamically and existing tools depend on this layout. For other
95// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
96// area to allow SVE stack objects (allocated directly below the callee-saves,
97// if available) to be accessed directly from the framepointer.
98// The SVE spill/fill instructions have VL-scaled addressing modes such
99// as:
100// ldr z8, [fp, #-7 mul vl]
101// For SVE the size of the vector length (VL) is not known at compile-time, so
102// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
103// layout, we don't need to add an unscaled offset to the framepointer before
104// accessing the SVE object in the frame.
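// As an illustrative sketch of why this layout helps (hypothetical offsets):
// with the frame record at the bottom of the GPR callee-saves, the fill is a
// single instruction,
//   ldr z8, [fp, #-7, mul vl]
// whereas with the frame record at the top, an unscaled adjustment past the
// GPR callee-saved area would be needed first, e.g.:
//   sub x8, fp, #48
//   ldr z8, [x8, #-7, mul vl]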
105//
106// In some cases when a base pointer is not strictly needed, it is generated
107// anyway when offsets from the frame pointer to access local variables become
108// so large that the offset can't be encoded in the immediate fields of loads
109// or stores.
110//
111// Outgoing function arguments must be at the bottom of the stack frame when
112// calling another function. If we do not have variable-sized stack objects, we
113// can allocate a "reserved call frame" area at the bottom of the local
114// variable area, large enough for all outgoing calls. If we do have VLAs, then
115// the stack pointer must be decremented and incremented around each call to
116// make space for the arguments below the VLAs.
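// A sketch of the two cases (illustrative sizes): with a reserved call frame,
// outgoing arguments are simply stored at [sp, #0 .. #N]; with VLAs present,
// each call site instead brackets the call:
//   sub sp, sp, #32     // make room for this call's stack arguments
//   bl  callee
//   add sp, sp, #32     // release it again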
117//
118// FIXME: also explain the redzone concept.
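// (A brief sketch of the red zone, assuming the usual 128 bytes below sp: a
// leaf function whose locals fit in that area may address them at negative
// offsets from sp without ever adjusting sp, e.g.
//   stur w0, [sp, #-4]   // spill into the red zone; no sub/add sp needed
//   ldur w0, [sp, #-4]
//   ret
// so the prologue and epilogue disappear entirely.)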
119//
120// An example of the prologue:
121//
122// .globl __foo
123// .align 2
124// __foo:
125// Ltmp0:
126// .cfi_startproc
127// .cfi_personality 155, ___gxx_personality_v0
128// Leh_func_begin:
129// .cfi_lsda 16, Lexception33
130//
131// stp xa, xb, [sp, #-offset]!
132// ...
133// stp x28, x27, [sp, #offset-32]
134// stp fp, lr, [sp, #offset-16]
135// add fp, sp, #offset - 16
136// sub sp, sp, #1360
137//
138// The Stack:
139// +-------------------------------------------+
140// 10000 | ........ | ........ | ........ | ........ |
141// 10004 | ........ | ........ | ........ | ........ |
142// +-------------------------------------------+
143// 10008 | ........ | ........ | ........ | ........ |
144// 1000c | ........ | ........ | ........ | ........ |
145// +===========================================+
146// 10010 | X28 Register |
147// 10014 | X28 Register |
148// +-------------------------------------------+
149// 10018 | X27 Register |
150// 1001c | X27 Register |
151// +===========================================+
152// 10020 | Frame Pointer |
153// 10024 | Frame Pointer |
154// +-------------------------------------------+
155// 10028 | Link Register |
156// 1002c | Link Register |
157// +===========================================+
158// 10030 | ........ | ........ | ........ | ........ |
159// 10034 | ........ | ........ | ........ | ........ |
160// +-------------------------------------------+
161// 10038 | ........ | ........ | ........ | ........ |
162// 1003c | ........ | ........ | ........ | ........ |
163// +-------------------------------------------+
164//
165// [sp] = 10030 :: >>initial value<<
166// sp = 10020 :: stp fp, lr, [sp, #-16]!
167// fp = sp == 10020 :: mov fp, sp
168// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
169// sp == 10010 :: >>final value<<
170//
171// The frame pointer (w29) points to address 10020. If we use an offset of
172// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
173// for w27, and -32 for w28:
174//
175// Ltmp1:
176// .cfi_def_cfa w29, 16
177// Ltmp2:
178// .cfi_offset w30, -8
179// Ltmp3:
180// .cfi_offset w29, -16
181// Ltmp4:
182// .cfi_offset w27, -24
183// Ltmp5:
184// .cfi_offset w28, -32
185//
186//===----------------------------------------------------------------------===//
187
188#include "AArch64FrameLowering.h"
189#include "AArch64InstrInfo.h"
191#include "AArch64RegisterInfo.h"
192#include "AArch64Subtarget.h"
193#include "AArch64TargetMachine.h"
196#include "llvm/ADT/ScopeExit.h"
197#include "llvm/ADT/SmallVector.h"
198#include "llvm/ADT/Statistic.h"
214#include "llvm/IR/Attributes.h"
215#include "llvm/IR/CallingConv.h"
216#include "llvm/IR/DataLayout.h"
217#include "llvm/IR/DebugLoc.h"
218#include "llvm/IR/Function.h"
219#include "llvm/MC/MCAsmInfo.h"
220#include "llvm/MC/MCDwarf.h"
222#include "llvm/Support/Debug.h"
228#include <cassert>
229#include <cstdint>
230#include <iterator>
231#include <optional>
232#include <vector>
233
234using namespace llvm;
235
236#define DEBUG_TYPE "frame-info"
237
238static cl::opt<bool> EnableRedZone("aarch64-redzone",
239 cl::desc("enable use of redzone on AArch64"),
240 cl::init(false), cl::Hidden);
241
242static cl::opt<bool>
243 ReverseCSRRestoreSeq("reverse-csr-restore-seq",
244 cl::desc("reverse the CSR restore sequence"),
245 cl::init(false), cl::Hidden);
246
248 "stack-tagging-merge-settag",
249 cl::desc("merge settag instruction in function epilog"), cl::init(true),
250 cl::Hidden);
251
252static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
253 cl::desc("sort stack allocations"),
254 cl::init(true), cl::Hidden);
255
257 "homogeneous-prolog-epilog", cl::Hidden,
258 cl::desc("Emit homogeneous prologue and epilogue for the size "
259 "optimization (default = off)"));
260
261STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
262
263/// Returns how much of the incoming argument stack area (in bytes) we should
264/// clean up in an epilogue. For the C calling convention this will be 0, for
265/// guaranteed tail call conventions it can be positive (a normal return or a
266/// tail call to a function that uses less stack space for arguments) or
267/// negative (for a tail call to a function that needs more stack space than us
268/// for arguments).
269static int64_t getArgumentStackToRestore(MachineFunction &MF,
270 MachineBasicBlock &MBB) {
271 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
272 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
273 bool IsTailCallReturn = (MBB.end() != MBBI)
274 ? AArch64InstrInfo::isTailCallReturnInst(*MBBI)
275 : false;
276
277 int64_t ArgumentPopSize = 0;
278 if (IsTailCallReturn) {
279 MachineOperand &StackAdjust = MBBI->getOperand(1);
280
281 // For a tail-call in a callee-pops-arguments environment, some or all of
282 // the stack may actually be in use for the call's arguments, this is
283 // calculated during LowerCall and consumed here...
284 ArgumentPopSize = StackAdjust.getImm();
285 } else {
286 // ... otherwise the amount to pop is *all* of the argument space,
287 // conveniently stored in the MachineFunctionInfo by
288 // LowerFormalArguments. This will, of course, be zero for the C calling
289 // convention.
290 ArgumentPopSize = AFI->getArgumentStackToRestore();
291 }
292
293 return ArgumentPopSize;
294}
295
296static bool produceCompactUnwindFrame(MachineFunction &MF);
297static bool needsWinCFI(const MachineFunction &MF);
298static StackOffset getSVEStackSize(const MachineFunction &MF);
300
301/// Returns true if a homogeneous prolog or epilog code can be emitted
302/// for the size optimization. If possible, a frame helper call is injected.
303/// When Exit block is given, this check is for epilog.
304bool AArch64FrameLowering::homogeneousPrologEpilog(
305 MachineFunction &MF, MachineBasicBlock *Exit) const {
306 if (!MF.getFunction().hasMinSize())
307 return false;
308 if (!EnableHomogeneousPrologEpilog)
309 return false;
310 if (ReverseCSRRestoreSeq)
311 return false;
312 if (EnableRedZone)
313 return false;
314
315 // TODO: Windows is not supported yet.
316 if (needsWinCFI(MF))
317 return false;
318 // TODO: SVE is not supported yet.
319 if (getSVEStackSize(MF))
320 return false;
321
322 // Bail on stack adjustment needed on return for simplicity.
323 const MachineFrameInfo &MFI = MF.getFrameInfo();
324 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
325 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
326 return false;
327 if (Exit && getArgumentStackToRestore(MF, *Exit))
328 return false;
329
330 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
331 if (AFI->hasSwiftAsyncContext())
332 return false;
333
334 // If there are an odd number of GPRs before LR and FP in the CSRs list,
335 // they will not be paired into one RegPairInfo, which is incompatible with
336 // the assumption made by the homogeneous prolog epilog pass.
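 // For instance (illustrative), a CSR order of { x19, x20, x21, lr, fp }
 // leaves x21 unpaired before lr/fp, so we bail out below.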
337 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
338 unsigned NumGPRs = 0;
339 for (unsigned I = 0; CSRegs[I]; ++I) {
340 Register Reg = CSRegs[I];
341 if (Reg == AArch64::LR) {
342 assert(CSRegs[I + 1] == AArch64::FP);
343 if (NumGPRs % 2 != 0)
344 return false;
345 break;
346 }
347 if (AArch64::GPR64RegClass.contains(Reg))
348 ++NumGPRs;
349 }
350
351 return true;
352}
353
354/// Returns true if CSRs should be paired.
355bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
356 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
357}
358
359/// This is the biggest offset to the stack pointer we can encode in aarch64
360/// instructions (without using a separate calculation and a temp register).
361/// Note that the exception here are vector stores/loads which cannot encode any
362/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
363static const unsigned DefaultSafeSPDisplacement = 255;
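// (Illustrative: an unscaled ldur/stur immediate spans [-256, 255], so e.g.
// "ldur x0, [sp, #255]" encodes directly, while a larger displacement needs
// the offset materialized into a scratch register first.)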
364
365/// Look at each instruction that references stack frames and return the stack
366/// size limit beyond which some of these instructions will require a scratch
367/// register during their expansion later.
368static unsigned estimateRSStackSizeLimit(MachineFunction &MF) {
369 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
370 // range. We'll end up allocating an unnecessary spill slot a lot, but
371 // realistically that's not a big deal at this stage of the game.
372 for (MachineBasicBlock &MBB : MF) {
373 for (MachineInstr &MI : MBB) {
374 if (MI.isDebugInstr() || MI.isPseudo() ||
375 MI.getOpcode() == AArch64::ADDXri ||
376 MI.getOpcode() == AArch64::ADDSXri)
377 continue;
378
379 for (const MachineOperand &MO : MI.operands()) {
380 if (!MO.isFI())
381 continue;
382
383 StackOffset Offset;
384 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
385 AArch64FrameOffsetCannotUpdate)
386 return 0;
387 }
388 }
389 }
390 return DefaultSafeSPDisplacement;
391}
392
393TargetStackID::Value
394AArch64FrameLowering::getStackIDForScalableVectors() const {
395 return TargetStackID::ScalableVector;
396}
397
398/// Returns the size of the fixed object area (allocated next to sp on entry)
399/// On Win64 this may include a var args area and an UnwindHelp object for EH.
400static unsigned getFixedObjectSize(const MachineFunction &MF,
401 const AArch64FunctionInfo *AFI, bool IsWin64,
402 bool IsFunclet) {
403 if (!IsWin64 || IsFunclet) {
404 return AFI->getTailCallReservedStack();
405 } else {
406 if (AFI->getTailCallReservedStack() != 0)
407 report_fatal_error("cannot generate ABI-changing tail call for Win64");
408 // Var args are stored here in the primary function.
409 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
410 // To support EH funclets we allocate an UnwindHelp object
411 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
412 return alignTo(VarArgsArea + UnwindHelpObject, 16);
413 }
414}
415
416/// Returns the size of the entire SVE stackframe (calleesaves + spills).
417static StackOffset getSVEStackSize(const MachineFunction &MF) {
418 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
419 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
420}
421
422bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
423 if (!EnableRedZone)
424 return false;
425
426 // Don't use the red zone if the function explicitly asks us not to.
427 // This is typically used for kernel code.
428 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
429 const unsigned RedZoneSize =
430 Subtarget.getTargetLowering()->getRedZoneSize(MF.getFunction());
431 if (!RedZoneSize)
432 return false;
433
434 const MachineFrameInfo &MFI = MF.getFrameInfo();
435 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
436 uint64_t NumBytes = AFI->getLocalStackSize();
437
438 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
439 getSVEStackSize(MF));
440}
441
442/// hasFP - Return true if the specified function should have a dedicated frame
443/// pointer register.
444bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {
445 const MachineFrameInfo &MFI = MF.getFrameInfo();
446 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
447
448 // Win64 EH requires a frame pointer if funclets are present, as the locals
449 // are accessed off the frame pointer in both the parent function and the
450 // funclets.
451 if (MF.hasEHFunclets())
452 return true;
453 // Retain behavior of always omitting the FP for leaf functions when possible.
454 if (MF.getTarget().Options.DisableFramePointerElim(MF))
455 return true;
456 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
457 MFI.hasStackMap() || MFI.hasPatchPoint() ||
458 RegInfo->hasStackRealignment(MF))
459 return true;
460 // With large callframes around we may need to use FP to access the scavenging
461 // emergency spillslot.
462 //
463 // Unfortunately some calls to hasFP() like machine verifier ->
464 // getReservedReg() -> hasFP in the middle of global isel are too early
465 // to know the max call frame size. Hopefully conservatively returning "true"
466 // in those cases is fine.
467 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
468 if (!MFI.isMaxCallFrameSizeComputed() ||
468 MFI.getMaxCallFrameSize() >
469 DefaultSafeSPDisplacement)
470 return true;
471
472 return false;
473}
474
475/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
476/// not required, we reserve argument space for call sites in the function
477/// immediately on entry to the current function. This eliminates the need for
478/// add/sub sp brackets around call sites. Returns true if the call frame is
479/// included as part of the stack frame.
480bool
482 return !MF.getFrameInfo().hasVarSizedObjects();
483}
484
485MachineBasicBlock::iterator AArch64FrameLowering::eliminateCallFramePseudoInstr(
486 MachineFunction &MF, MachineBasicBlock &MBB,
487 MachineBasicBlock::iterator I) const {
488 const AArch64InstrInfo *TII =
489 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
490 DebugLoc DL = I->getDebugLoc();
491 unsigned Opc = I->getOpcode();
492 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
493 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
494
495 if (!hasReservedCallFrame(MF)) {
496 int64_t Amount = I->getOperand(0).getImm();
497 Amount = alignTo(Amount, getStackAlign());
498 if (!IsDestroy)
499 Amount = -Amount;
500
501 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
502 // doesn't have to pop anything), then the first operand will be zero too so
503 // this adjustment is a no-op.
504 if (CalleePopAmount == 0) {
505 // FIXME: in-function stack adjustment for calls is limited to 24-bits
506 // because there's no guaranteed temporary register available.
507 //
508 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
509 // 1) For offset <= 12-bit, we use LSL #0
510 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
511 // LSL #0, and the other uses LSL #12.
512 //
513 // Most call frames will be allocated at the start of a function so
514 // this is OK, but it is a limitation that needs dealing with.
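 // For example (illustrative values), a 0x12345-byte decrement would split
 // into:
 //   sub sp, sp, #0x12, lsl #12   // 0x12000
 //   sub sp, sp, #0x345           // remaining 0x345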
515 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
516 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
517 StackOffset::getFixed(Amount), TII);
518 }
519 } else if (CalleePopAmount != 0) {
520 // If the calling convention demands that the callee pops arguments from the
521 // stack, we want to add it back if we have a reserved call frame.
522 assert(CalleePopAmount < 0xffffff && "call frame too large");
523 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
524 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
525 }
526 return MBB.erase(I);
527}
528
529void AArch64FrameLowering::emitCalleeSavedGPRLocations(
530 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
531 MachineFunction &MF = *MBB.getParent();
532 MachineFrameInfo &MFI = MF.getFrameInfo();
533
534 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
535 if (CSI.empty())
536 return;
537
538 const TargetSubtargetInfo &STI = MF.getSubtarget();
539 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
540 const TargetInstrInfo &TII = *STI.getInstrInfo();
541 DebugLoc DL = MBB.findDebugLoc(MBBI);
542
543 for (const auto &Info : CSI) {
544 if (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector)
545 continue;
546
547 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
548 unsigned DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
549
550 int64_t Offset =
551 MFI.getObjectOffset(Info.getFrameIdx()) - getOffsetOfLocalArea();
552 unsigned CFIIndex = MF.addFrameInst(
553 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
554 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
555 .addCFIIndex(CFIIndex)
556 .setMIFlags(MachineInstr::FrameSetup);
557 }
558}
559
560void AArch64FrameLowering::emitCalleeSavedSVELocations(
561 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
562 MachineFunction &MF = *MBB.getParent();
563 MachineFrameInfo &MFI = MF.getFrameInfo();
564
565 // Add callee saved registers to move list.
566 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
567 if (CSI.empty())
568 return;
569
570 const TargetSubtargetInfo &STI = MF.getSubtarget();
571 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
572 const TargetInstrInfo &TII = *STI.getInstrInfo();
573 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
574 DebugLoc DL = MBB.findDebugLoc(MBBI);
575
576 for (const auto &Info : CSI) {
577 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
578 continue;
579
580 // Not all unwinders may know about SVE registers, so assume the lowest
581 // common denominator.
582 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
583 unsigned Reg = Info.getReg();
584 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
585 continue;
586
587 StackOffset Offset =
588 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
589 StackOffset::getFixed(AFI.getCalleeSavedStackSize(MFI));
590
591 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
592 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
593 .addCFIIndex(CFIIndex)
594 .setMIFlags(MachineInstr::FrameSetup);
595 }
596}
597
598static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF,
599 MachineBasicBlock &MBB,
600 MachineBasicBlock::iterator InsertPt,
601 unsigned DwarfReg) {
602 unsigned CFIIndex =
603 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
604 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
605}
606
607void AArch64FrameLowering::resetCFIToInitialState(
608 MachineBasicBlock &MBB) const {
609
610 MachineFunction &MF = *MBB.getParent();
611 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
612 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
613 const auto &TRI =
614 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
615 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
616
617 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
618 DebugLoc DL;
619
620 // Reset the CFA to `SP + 0`.
621 MachineBasicBlock::iterator InsertPt = MBB.begin();
622 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
623 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
624 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
625
626 // Flip the RA sign state.
627 if (MFI.shouldSignReturnAddress(MF)) {
628 CFIIndex = MF.addFrameInst(MCCFIInstruction::createNegateRAState(nullptr));
629 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
630 }
631
632 // Shadow call stack uses X18, reset it.
633 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
634 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
635 TRI.getDwarfRegNum(AArch64::X18, true));
636
637 // Emit .cfi_same_value for callee-saved registers.
638 const std::vector<CalleeSavedInfo> &CSI =
639 MF.getFrameInfo().getCalleeSavedInfo();
640 for (const auto &Info : CSI) {
641 unsigned Reg = Info.getReg();
642 if (!TRI.regNeedsCFI(Reg, Reg))
643 continue;
644 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
645 TRI.getDwarfRegNum(Reg, true));
646 }
647}
648
649static void emitCalleeSavedRestores(MachineBasicBlock &MBB,
650 MachineBasicBlock::iterator MBBI,
651 bool SVE) {
652 MachineFunction &MF = *MBB.getParent();
653 MachineFrameInfo &MFI = MF.getFrameInfo();
654
655 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
656 if (CSI.empty())
657 return;
658
659 const TargetSubtargetInfo &STI = MF.getSubtarget();
660 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
661 const TargetInstrInfo &TII = *STI.getInstrInfo();
662 DebugLoc DL = MBB.findDebugLoc(MBBI);
663
664 for (const auto &Info : CSI) {
665 if (SVE !=
666 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
667 continue;
668
669 unsigned Reg = Info.getReg();
670 if (SVE &&
671 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
672 continue;
673
674 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
675 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
676 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
677 .addCFIIndex(CFIIndex)
678 .setMIFlags(MachineInstr::FrameDestroy);
679 }
680}
681
682void AArch64FrameLowering::emitCalleeSavedGPRRestores(
683 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
684 emitCalleeSavedRestores(MBB, MBBI, false);
685}
686
687void AArch64FrameLowering::emitCalleeSavedSVERestores(
688 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
689 emitCalleeSavedRestores(MBB, MBBI, true);
690}
691
692void AArch64FrameLowering::allocateStackSpace(
693 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
694 bool NeedsRealignment, StackOffset AllocSize, bool NeedsWinCFI,
695 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset) const {
696
697 if (!AllocSize)
698 return;
699
700 DebugLoc DL;
701 MachineFunction &MF = *MBB.getParent();
702 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
703 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
704 AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
705 const MachineFrameInfo &MFI = MF.getFrameInfo();
706
707 Register TargetReg =
708 NeedsRealignment ? findScratchNonCalleeSaveRegister(&MBB) : AArch64::SP;
709 // SUB Xd/SP, SP, AllocSize
710 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
711 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
712 EmitCFI, InitialOffset);
713
714 if (NeedsRealignment) {
715 const int64_t MaxAlign = MFI.getMaxAlign().value();
716 const uint64_t AndMask = ~(MaxAlign - 1);
717 // AND SP, Xd, 0b11111...0000
718 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
719 .addReg(TargetReg, RegState::Kill)
720 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64))
721 .setMIFlags(MachineInstr::FrameSetup);
722 AFI.setStackRealigned(true);
723
724 // No need for SEH instructions here; if we're realigning the stack,
725 // we've set a frame pointer and already finished the SEH prologue.
726 assert(!NeedsWinCFI);
727 }
728}
729
730static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
731 switch (Reg.id()) {
732 default:
733 // The called routine is expected to preserve r19-r28
734 // r29 and r30 are used as frame pointer and link register resp.
735 return 0;
736
737 // GPRs
738#define CASE(n) \
739 case AArch64::W##n: \
740 case AArch64::X##n: \
741 return AArch64::X##n
742 CASE(0);
743 CASE(1);
744 CASE(2);
745 CASE(3);
746 CASE(4);
747 CASE(5);
748 CASE(6);
749 CASE(7);
750 CASE(8);
751 CASE(9);
752 CASE(10);
753 CASE(11);
754 CASE(12);
755 CASE(13);
756 CASE(14);
757 CASE(15);
758 CASE(16);
759 CASE(17);
760 CASE(18);
761#undef CASE
762
763 // FPRs
764#define CASE(n) \
765 case AArch64::B##n: \
766 case AArch64::H##n: \
767 case AArch64::S##n: \
768 case AArch64::D##n: \
769 case AArch64::Q##n: \
770 return HasSVE ? AArch64::Z##n : AArch64::Q##n
771 CASE(0);
772 CASE(1);
773 CASE(2);
774 CASE(3);
775 CASE(4);
776 CASE(5);
777 CASE(6);
778 CASE(7);
779 CASE(8);
780 CASE(9);
781 CASE(10);
782 CASE(11);
783 CASE(12);
784 CASE(13);
785 CASE(14);
786 CASE(15);
787 CASE(16);
788 CASE(17);
789 CASE(18);
790 CASE(19);
791 CASE(20);
792 CASE(21);
793 CASE(22);
794 CASE(23);
795 CASE(24);
796 CASE(25);
797 CASE(26);
798 CASE(27);
799 CASE(28);
800 CASE(29);
801 CASE(30);
802 CASE(31);
803#undef CASE
804 }
805}
806
807void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
808 MachineBasicBlock &MBB) const {
809 // Insertion point.
810 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
811
812 // Fake a debug loc.
813 DebugLoc DL;
814 if (MBBI != MBB.end())
815 DL = MBBI->getDebugLoc();
816
817 const MachineFunction &MF = *MBB.getParent();
818 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
819 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
820
821 BitVector GPRsToZero(TRI.getNumRegs());
822 BitVector FPRsToZero(TRI.getNumRegs());
823 bool HasSVE = STI.hasSVE();
824 for (MCRegister Reg : RegsToZero.set_bits()) {
825 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
826 // For GPRs, we only care to clear out the 64-bit register.
827 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
828 GPRsToZero.set(XReg);
829 } else if (AArch64::FPR128RegClass.contains(Reg) ||
830 AArch64::FPR64RegClass.contains(Reg) ||
831 AArch64::FPR32RegClass.contains(Reg) ||
832 AArch64::FPR16RegClass.contains(Reg) ||
833 AArch64::FPR8RegClass.contains(Reg)) {
834 // For FPRs,
835 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
836 FPRsToZero.set(XReg);
837 }
838 }
839
840 const AArch64InstrInfo &TII = *STI.getInstrInfo();
841
842 // Zero out GPRs.
843 for (MCRegister Reg : GPRsToZero.set_bits())
844 TII.buildClearRegister(Reg, MBB, MBBI, DL);
845
846 // Zero out FP/vector registers.
847 for (MCRegister Reg : FPRsToZero.set_bits())
848 TII.buildClearRegister(Reg, MBB, MBBI, DL);
849
850 if (HasSVE) {
851 for (MCRegister PReg :
852 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
853 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
854 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
855 AArch64::P15}) {
856 if (RegsToZero[PReg])
857 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
858 }
859 }
860}
861
862// Find a scratch register that we can use at the start of the prologue to
863// re-align the stack pointer. We avoid using callee-save registers since they
864// may appear to be free when this is called from canUseAsPrologue (during
865// shrink wrapping), but then no longer be free when this is called from
866// emitPrologue.
867//
868// FIXME: This is a bit conservative, since in the above case we could use one
869// of the callee-save registers as a scratch temp to re-align the stack pointer,
870// but we would then have to make sure that we were in fact saving at least one
871// callee-save register in the prologue, which is additional complexity that
872// doesn't seem worth the benefit.
873static unsigned findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB) {
874 MachineFunction *MF = MBB->getParent();
875
876 // If MBB is an entry block, use X9 as the scratch register
877 if (&MF->front() == MBB)
878 return AArch64::X9;
879
880 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
881 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
882 LivePhysRegs LiveRegs(TRI);
883 LiveRegs.addLiveIns(*MBB);
884
885 // Mark callee saved registers as used so we will not choose them.
886 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
887 for (unsigned i = 0; CSRegs[i]; ++i)
888 LiveRegs.addReg(CSRegs[i]);
889
890 // Prefer X9 since it was historically used for the prologue scratch reg.
891 const MachineRegisterInfo &MRI = MF->getRegInfo();
892 if (LiveRegs.available(MRI, AArch64::X9))
893 return AArch64::X9;
894
895 for (unsigned Reg : AArch64::GPR64RegClass) {
896 if (LiveRegs.available(MRI, Reg))
897 return Reg;
898 }
899 return AArch64::NoRegister;
900}
901
902bool AArch64FrameLowering::canUseAsPrologue(
903 const MachineBasicBlock &MBB) const {
904 const MachineFunction *MF = MBB.getParent();
905 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
906 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
907 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
908
909 // Don't need a scratch register if we're not going to re-align the stack.
910 if (!RegInfo->hasStackRealignment(*MF))
911 return true;
912 // Otherwise, we can use any block as long as it has a scratch register
913 // available.
914 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
915}
916
917static bool windowsRequiresStackProbe(MachineFunction &MF,
918 uint64_t StackSizeInBytes) {
919 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
920 if (!Subtarget.isTargetWindows())
921 return false;
922 const Function &F = MF.getFunction();
923 // TODO: When implementing stack protectors, take that into account
924 // for the probe threshold.
925 unsigned StackProbeSize =
926 F.getFnAttributeAsParsedInteger("stack-probe-size", 4096);
927 return (StackSizeInBytes >= StackProbeSize) &&
928 !F.hasFnAttribute("no-stack-arg-probe");
929}
930
931static bool needsWinCFI(const MachineFunction &MF) {
932 const Function &F = MF.getFunction();
933 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
934 F.needsUnwindTableEntry();
935}
936
937bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
938 MachineFunction &MF, uint64_t StackBumpBytes) const {
939 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
940 const MachineFrameInfo &MFI = MF.getFrameInfo();
941 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
942 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
943 if (homogeneousPrologEpilog(MF))
944 return false;
945
946 if (AFI->getLocalStackSize() == 0)
947 return false;
948
949 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
950 // (to force a stp with predecrement) to match the packed unwind format,
951 // provided that there actually are any callee saved registers to merge the
952 // decrement with.
953 // This is potentially marginally slower, but allows using the packed
954 // unwind format for functions that both have a local area and callee saved
955 // registers. Using the packed unwind format notably reduces the size of
956 // the unwind info.
957 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
958 MF.getFunction().hasOptSize())
959 return false;
960
961 // 512 is the maximum immediate for stp/ldp that will be used for
962 // callee-save save/restores
963 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
964 return false;
965
966 if (MFI.hasVarSizedObjects())
967 return false;
968
969 if (RegInfo->hasStackRealignment(MF))
970 return false;
971
972 // This isn't strictly necessary, but it simplifies things a bit since the
973 // current RedZone handling code assumes the SP is adjusted by the
974 // callee-save save/restore code.
975 if (canUseRedZone(MF))
976 return false;
977
978 // When there is an SVE area on the stack, always allocate the
979 // callee-saves and spills/locals separately.
980 if (getSVEStackSize(MF))
981 return false;
982
983 return true;
984}
985
986bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
987 MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
988 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
989 return false;
990
991 if (MBB.empty())
992 return true;
993
994 // Disable combined SP bump if the last instruction is an MTE tag store. It
995 // is almost always better to merge SP adjustment into those instructions.
996 MachineBasicBlock::iterator LastI = MBB.getFirstTerminator();
997 MachineBasicBlock::iterator Begin = MBB.begin();
998 while (LastI != Begin) {
999 --LastI;
1000 if (LastI->isTransient())
1001 continue;
1002 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1003 break;
1004 }
1005 switch (LastI->getOpcode()) {
1006 case AArch64::STGloop:
1007 case AArch64::STZGloop:
1008 case AArch64::STGi:
1009 case AArch64::STZGi:
1010 case AArch64::ST2Gi:
1011 case AArch64::STZ2Gi:
1012 return false;
1013 default:
1014 return true;
1015 }
1016 llvm_unreachable("unreachable");
1017}
1018
1019// Given a load or a store instruction, generate an appropriate unwinding SEH
1020// code on Windows.
1021static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
1022 const TargetInstrInfo &TII,
1023 MachineInstr::MIFlag Flag) {
1024 unsigned Opc = MBBI->getOpcode();
1025 MachineBasicBlock *MBB = MBBI->getParent();
1026 MachineFunction &MF = *MBB->getParent();
1027 DebugLoc DL = MBBI->getDebugLoc();
1028 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1029 int Imm = MBBI->getOperand(ImmIdx).getImm();
1030 MachineInstrBuilder MIB;
1031 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1032 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1033
1034 switch (Opc) {
1035 default:
1036 llvm_unreachable("No SEH Opcode for this instruction");
1037 case AArch64::LDPDpost:
1038 Imm = -Imm;
1039 [[fallthrough]];
1040 case AArch64::STPDpre: {
1041 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1042 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1043 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1044 .addImm(Reg0)
1045 .addImm(Reg1)
1046 .addImm(Imm * 8)
1047 .setMIFlag(Flag);
1048 break;
1049 }
1050 case AArch64::LDPXpost:
1051 Imm = -Imm;
1052 [[fallthrough]];
1053 case AArch64::STPXpre: {
1054 Register Reg0 = MBBI->getOperand(1).getReg();
1055 Register Reg1 = MBBI->getOperand(2).getReg();
1056 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1057 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1058 .addImm(Imm * 8)
1059 .setMIFlag(Flag);
1060 else
1061 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1062 .addImm(RegInfo->getSEHRegNum(Reg0))
1063 .addImm(RegInfo->getSEHRegNum(Reg1))
1064 .addImm(Imm * 8)
1065 .setMIFlag(Flag);
1066 break;
1067 }
1068 case AArch64::LDRDpost:
1069 Imm = -Imm;
1070 [[fallthrough]];
1071 case AArch64::STRDpre: {
1072 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1073 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1074 .addImm(Reg)
1075 .addImm(Imm)
1076 .setMIFlag(Flag);
1077 break;
1078 }
1079 case AArch64::LDRXpost:
1080 Imm = -Imm;
1081 [[fallthrough]];
1082 case AArch64::STRXpre: {
1083 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1084 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1085 .addImm(Reg)
1086 .addImm(Imm)
1087 .setMIFlag(Flag);
1088 break;
1089 }
1090 case AArch64::STPDi:
1091 case AArch64::LDPDi: {
1092 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1093 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1094 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1095 .addImm(Reg0)
1096 .addImm(Reg1)
1097 .addImm(Imm * 8)
1098 .setMIFlag(Flag);
1099 break;
1100 }
1101 case AArch64::STPXi:
1102 case AArch64::LDPXi: {
1103 Register Reg0 = MBBI->getOperand(0).getReg();
1104 Register Reg1 = MBBI->getOperand(1).getReg();
1105 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1106 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1107 .addImm(Imm * 8)
1108 .setMIFlag(Flag);
1109 else
1110 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1111 .addImm(RegInfo->getSEHRegNum(Reg0))
1112 .addImm(RegInfo->getSEHRegNum(Reg1))
1113 .addImm(Imm * 8)
1114 .setMIFlag(Flag);
1115 break;
1116 }
1117 case AArch64::STRXui:
1118 case AArch64::LDRXui: {
1119 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1120 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1121 .addImm(Reg)
1122 .addImm(Imm * 8)
1123 .setMIFlag(Flag);
1124 break;
1125 }
1126 case AArch64::STRDui:
1127 case AArch64::LDRDui: {
1128 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1129 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1130 .addImm(Reg)
1131 .addImm(Imm * 8)
1132 .setMIFlag(Flag);
1133 break;
1134 }
1135 }
1136 auto I = MBB->insertAfter(MBBI, MIB);
1137 return I;
1138}
1139
1140// Fix up the SEH opcode associated with the save/restore instruction.
1141static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI,
1142 unsigned LocalStackSize) {
1143 MachineOperand *ImmOpnd = nullptr;
1144 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1145 switch (MBBI->getOpcode()) {
1146 default:
1147 llvm_unreachable("Fix the offset in the SEH instruction");
1148 case AArch64::SEH_SaveFPLR:
1149 case AArch64::SEH_SaveRegP:
1150 case AArch64::SEH_SaveReg:
1151 case AArch64::SEH_SaveFRegP:
1152 case AArch64::SEH_SaveFReg:
1153 ImmOpnd = &MBBI->getOperand(ImmIdx);
1154 break;
1155 }
1156 if (ImmOpnd)
1157 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1158}
1159
1160// Convert callee-save register save/restore instruction to do stack pointer
1161// decrement/increment to allocate/deallocate the callee-save stack area by
1162// converting store/load to use pre/post increment version.
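// For example (illustrative), a 16-byte callee-save area saved as
//   sub sp, sp, #16  +  stp x29, x30, [sp]
// becomes the single pre-decrementing store
//   stp x29, x30, [sp, #-16]!
// and the matching restore becomes ldp x29, x30, [sp], #16.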
1163static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
1164 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
1165 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1166 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1167 MachineInstr::MIFlag FrameFlag = MachineInstr::FrameSetup,
1168 int CFAOffset = 0) {
1169 unsigned NewOpc;
1170 switch (MBBI->getOpcode()) {
1171 default:
1172 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1173 case AArch64::STPXi:
1174 NewOpc = AArch64::STPXpre;
1175 break;
1176 case AArch64::STPDi:
1177 NewOpc = AArch64::STPDpre;
1178 break;
1179 case AArch64::STPQi:
1180 NewOpc = AArch64::STPQpre;
1181 break;
1182 case AArch64::STRXui:
1183 NewOpc = AArch64::STRXpre;
1184 break;
1185 case AArch64::STRDui:
1186 NewOpc = AArch64::STRDpre;
1187 break;
1188 case AArch64::STRQui:
1189 NewOpc = AArch64::STRQpre;
1190 break;
1191 case AArch64::LDPXi:
1192 NewOpc = AArch64::LDPXpost;
1193 break;
1194 case AArch64::LDPDi:
1195 NewOpc = AArch64::LDPDpost;
1196 break;
1197 case AArch64::LDPQi:
1198 NewOpc = AArch64::LDPQpost;
1199 break;
1200 case AArch64::LDRXui:
1201 NewOpc = AArch64::LDRXpost;
1202 break;
1203 case AArch64::LDRDui:
1204 NewOpc = AArch64::LDRDpost;
1205 break;
1206 case AArch64::LDRQui:
1207 NewOpc = AArch64::LDRQpost;
1208 break;
1209 }
1210 // Get rid of the SEH code associated with the old instruction.
1211 if (NeedsWinCFI) {
1212 auto SEH = std::next(MBBI);
1213 if (AArch64InstrInfo::isSEHInstruction(*SEH))
1214 SEH->eraseFromParent();
1215 }
1216
1217 TypeSize Scale = TypeSize::getFixed(1);
1218 unsigned Width;
1219 int64_t MinOffset, MaxOffset;
1220 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1221 NewOpc, Scale, Width, MinOffset, MaxOffset);
1222 (void)Success;
1223 assert(Success && "unknown load/store opcode");
1224
1225 // If the first store isn't right where we want SP then we can't fold the
1226 // update in so create a normal arithmetic instruction instead.
1227 MachineFunction &MF = *MBB.getParent();
1228 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1229 CSStackSizeInc < MinOffset || CSStackSizeInc > MaxOffset) {
1230 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1231 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1232 false, false, nullptr, EmitCFI,
1233 StackOffset::getFixed(CFAOffset));
1234
1235 return std::prev(MBBI);
1236 }
1237
1238 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1239 MIB.addReg(AArch64::SP, RegState::Define);
1240
1241 // Copy all operands other than the immediate offset.
1242 unsigned OpndIdx = 0;
1243 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1244 ++OpndIdx)
1245 MIB.add(MBBI->getOperand(OpndIdx));
1246
1247 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1248 "Unexpected immediate offset in first/last callee-save save/restore "
1249 "instruction!");
1250 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1251 "Unexpected base register in callee-save save/restore instruction!");
1252 assert(CSStackSizeInc % Scale == 0);
1253 MIB.addImm(CSStackSizeInc / (int)Scale);
1254
1255 MIB.setMIFlags(MBBI->getFlags());
1256 MIB.setMemRefs(MBBI->memoperands());
1257
1258 // Generate a new SEH code that corresponds to the new instruction.
1259 if (NeedsWinCFI) {
1260 *HasWinCFI = true;
1261 InsertSEH(*MIB, *TII, FrameFlag);
1262 }
1263
1264 if (EmitCFI) {
1265 unsigned CFIIndex = MF.addFrameInst(
1266 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1267 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1268 .addCFIIndex(CFIIndex)
1269 .setMIFlags(FrameFlag);
1270 }
1271
1272 return std::prev(MBB.erase(MBBI));
1273}
1274
1275// Fixup callee-save register save/restore instructions to take into account
1276// combined SP bump by adding the local stack size to the stack offsets.
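// For example (illustrative): after combining a 96-byte local area into the
// callee-save SP decrement,
//   stp x29, x30, [sp, #0]   becomes   stp x29, x30, [sp, #96]
// so the pair still lands at the top of the enlarged allocation.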
1277static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
1278 uint64_t LocalStackSize,
1279 bool NeedsWinCFI,
1280 bool *HasWinCFI) {
1281 if (AArch64InstrInfo::isSEHInstruction(MI))
1282 return;
1283
1284 unsigned Opc = MI.getOpcode();
1285 unsigned Scale;
1286 switch (Opc) {
1287 case AArch64::STPXi:
1288 case AArch64::STRXui:
1289 case AArch64::STPDi:
1290 case AArch64::STRDui:
1291 case AArch64::LDPXi:
1292 case AArch64::LDRXui:
1293 case AArch64::LDPDi:
1294 case AArch64::LDRDui:
1295 Scale = 8;
1296 break;
1297 case AArch64::STPQi:
1298 case AArch64::STRQui:
1299 case AArch64::LDPQi:
1300 case AArch64::LDRQui:
1301 Scale = 16;
1302 break;
1303 default:
1304 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1305 }
1306
1307 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1308 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1309 "Unexpected base register in callee-save save/restore instruction!");
1310 // Last operand is immediate offset that needs fixing.
1311 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1312 // All generated opcodes have scaled offsets.
1313 assert(LocalStackSize % Scale == 0);
1314 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1315
1316 if (NeedsWinCFI) {
1317 *HasWinCFI = true;
1318 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1319 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1321 "Expecting a SEH instruction");
1322 fixupSEHOpcode(MBBI, LocalStackSize);
1323 }
1324}
1325
1326static bool isTargetWindows(const MachineFunction &MF) {
1327 return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
1328}
1329
1330// Convenience function to determine whether I is an SVE callee save.
1331static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
1332 switch (I->getOpcode()) {
1333 default:
1334 return false;
1335 case AArch64::STR_ZXI:
1336 case AArch64::STR_PXI:
1337 case AArch64::LDR_ZXI:
1338 case AArch64::LDR_PXI:
1339 return I->getFlag(MachineInstr::FrameSetup) ||
1340 I->getFlag(MachineInstr::FrameDestroy);
1341 }
1342}
1343
1344static void emitShadowCallStackPrologue(const TargetInstrInfo &TII,
1345 MachineFunction &MF,
1346 MachineBasicBlock &MBB,
1347 MachineBasicBlock::iterator MBBI,
1348 const DebugLoc &DL, bool NeedsWinCFI,
1349 bool NeedsUnwindInfo) {
1350 // Shadow call stack prolog: str x30, [x18], #8
1351 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1352 .addReg(AArch64::X18, RegState::Define)
1353 .addReg(AArch64::LR)
1354 .addReg(AArch64::X18)
1355 .addImm(8)
1356 .setMIFlag(MachineInstr::FrameSetup);
1357
1358 // This instruction also makes x18 live-in to the entry block.
1359 MBB.addLiveIn(AArch64::X18);
1360
1361 if (NeedsWinCFI)
1362 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1363 .setMIFlag(MachineInstr::FrameSetup);
1364
1365 if (NeedsUnwindInfo) {
1366 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1367 // x18 when unwinding past this frame.
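 // (In DWARF terms this is DW_CFA_val_expression for register 18 with the
 // 2-byte expression { DW_OP_breg18, -8 }, i.e. "previous x18 =
 // current x18 - 8".)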
1368 static const char CFIInst[] = {
1369 dwarf::DW_CFA_val_expression,
1370 18, // register
1371 2, // length
1372 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1373 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1374 };
1375 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1376 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1377 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1378 .addCFIIndex(CFIIndex)
1379 .setMIFlag(MachineInstr::FrameSetup);
1380 }
1381}
1382
1383static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII,
1384 MachineFunction &MF,
1385 MachineBasicBlock &MBB,
1386 MachineBasicBlock::iterator MBBI,
1387 const DebugLoc &DL) {
1388 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1389 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1390 .addReg(AArch64::X18, RegState::Define)
1391 .addReg(AArch64::LR, RegState::Define)
1392 .addReg(AArch64::X18)
1393 .addImm(-8)
1394 .setMIFlag(MachineInstr::FrameDestroy);
1395
1396 if (MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF)) {
1397 unsigned CFIIndex =
1398 MF.addFrameInst(MCCFIInstruction::createRestore(nullptr, 18));
1399 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1400 .addCFIIndex(CFIIndex)
1401 .setMIFlags(MachineInstr::FrameDestroy);
1402 }
1403}
1404
1405// Define the current CFA rule to use the provided FP.
1406static void emitDefineCFAWithFP(MachineFunction &MF, MachineBasicBlock &MBB,
1407 MachineBasicBlock::iterator MBBI,
1408 const DebugLoc &DL, unsigned FixedObject) {
1409 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
1410 const AArch64RegisterInfo *TRI = STI.getRegisterInfo();
1411 const TargetInstrInfo *TII = STI.getInstrInfo();
1412 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1413
1414 const int OffsetToFirstCalleeSaveFromFP =
1415 AFI->getCalleeSaveBaseToFrameRecordOffset() -
1416 AFI->getCalleeSavedStackSize();
1417 Register FramePtr = TRI->getFrameRegister(MF);
1418 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1419 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1420 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1421 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1422 .addCFIIndex(CFIIndex)
1423 .setMIFlags(MachineInstr::FrameSetup);
1424}
1425
1426#ifndef NDEBUG
1427/// Collect live registers from the end of \p MI's parent up to (including) \p
1428/// MI in \p LiveRegs.
1429static void getLivePhysRegsUpTo(const MachineInstr &MI, const TargetRegisterInfo &TRI,
1430 LivePhysRegs &LiveRegs) {
1431
1432 MachineBasicBlock &MBB = *MI.getParent();
1433 LiveRegs.addLiveOuts(MBB);
1434 for (const MachineInstr &MI :
1435 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1436 LiveRegs.stepBackward(MI);
1437}
1438#endif
1439
1440void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
1441 MachineBasicBlock &MBB) const {
1442 MachineBasicBlock::iterator MBBI = MBB.begin();
1443 const MachineFrameInfo &MFI = MF.getFrameInfo();
1444 const Function &F = MF.getFunction();
1445 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1446 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1447 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1448
1449 MachineModuleInfo &MMI = MF.getMMI();
1450 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1451 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1452 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1453 bool HasFP = hasFP(MF);
1454 bool NeedsWinCFI = needsWinCFI(MF);
1455 bool HasWinCFI = false;
1456 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1457
1459#ifndef NDEBUG
1460 const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
1461 // Collect live registers from the end of MBB up to the start of the existing
1462 // frame setup instructions.
1463 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1464 while (NonFrameStart != End &&
1465 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1466 ++NonFrameStart;
1467
1468 LivePhysRegs LiveRegs(*TRI);
1469 if (NonFrameStart != MBB.end()) {
1470 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1471 // Ignore registers used for stack management for now.
1472 LiveRegs.removeReg(AArch64::SP);
1473 LiveRegs.removeReg(AArch64::X19);
1474 LiveRegs.removeReg(AArch64::FP);
1475 LiveRegs.removeReg(AArch64::LR);
1476 }
1477
1478 auto VerifyClobberOnExit = make_scope_exit([&]() {
1479 if (NonFrameStart == MBB.end())
1480 return;
1481 // Check if any of the newly inserted instructions clobber any of the live registers.
1482 for (MachineInstr &MI :
1483 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1484 for (auto &Op : MI.operands())
1485 if (Op.isReg() && Op.isDef())
1486 assert(!LiveRegs.contains(Op.getReg()) &&
1487 "live register clobbered by inserted prologue instructions");
1488 }
1489 });
1490#endif
1491
1492 bool IsFunclet = MBB.isEHFuncletEntry();
1493
1494 // At this point, we're going to decide whether or not the function uses a
1495 // redzone. In most cases, the function doesn't have a redzone so let's
1496 // assume that's false and set it to true in the case that there's a redzone.
1497 AFI->setHasRedZone(false);
1498
1499 // Debug location must be unknown since the first debug location is used
1500 // to determine the end of the prologue.
1501 DebugLoc DL;
1502
1503 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1504 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1505 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1506 MFnI.needsDwarfUnwindInfo(MF));
1507
1508 if (MFnI.shouldSignReturnAddress(MF)) {
1509 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1510 .setMIFlag(MachineInstr::FrameSetup);
1511 if (NeedsWinCFI)
1512 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1513 }
1514
1515 if (EmitCFI && MFnI.isMTETagged()) {
1516 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1517 .setMIFlag(MachineInstr::FrameSetup);
1518 }
1519
1520 // We signal the presence of a Swift extended frame to external tools by
1521 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1522 // ORR is sufficient; it is assumed a Swift kernel would initialize the TBI
1523 // bits so that this remains true.
1524 if (HasFP && AFI->hasSwiftAsyncContext()) {
1525 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
1526 case SwiftAsyncFramePointerMode::DeploymentBased:
1527 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1528 // The special symbol below is absolute and has a *value* that can be
1529 // combined with the frame pointer to signal an extended frame.
1530 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1531 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1532 AArch64II::MO_GOT);
1533 if (NeedsWinCFI) {
1534 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1535 .setMIFlag(MachineInstr::FrameSetup);
1536 HasWinCFI = true;
1537 }
1538 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1539 .addUse(AArch64::FP)
1540 .addUse(AArch64::X16)
1541 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1542 if (NeedsWinCFI) {
1543 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1544 .setMIFlag(MachineInstr::FrameSetup);
1545 HasWinCFI = true;
1546 }
1547 break;
1548 }
1549 [[fallthrough]];
1550
1551 case SwiftAsyncFramePointerMode::Always:
1552 // ORR x29, x29, #0x1000_0000_0000_0000
1553 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1554 .addUse(AArch64::FP)
1555 .addImm(0x1100)
1556 .setMIFlag(MachineInstr::FrameSetup);
1557 if (NeedsWinCFI) {
1558 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1559 .setMIFlag(MachineInstr::FrameSetup);
1560 HasWinCFI = true;
1561 }
1562 break;
1563
1564 case SwiftAsyncFramePointerMode::Never:
1565 break;
1566 }
1567 }
1568
1569 // All calls are tail calls in GHC calling conv, and functions have no
1570 // prologue/epilogue.
1571 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1572 return;
1573
1574 // Set tagged base pointer to the requested stack slot.
1575 // Ideally it should match SP value after prologue.
1576 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1577 if (TBPI)
1578 AFI->setTaggedBasePointerOffset(-MFI.getObjectOffset(*TBPI));
1579 else
1580 AFI->setTaggedBasePointerOffset(MFI.getStackSize());
1581
1582 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1583
1584 // getStackSize() includes all the locals in its size calculation. We don't
1585 // include these locals when computing the stack size of a funclet, as they
1586 // are allocated in the parent's stack frame and accessed via the frame
1587 // pointer from the funclet. We only save the callee saved registers in the
1588 // funclet, which are really the callee saved registers of the parent
1589 // function, including the funclet.
1590 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1591 : MFI.getStackSize();
1592 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1593 assert(!HasFP && "unexpected function without stack frame but with FP");
1594 assert(!SVEStackSize &&
1595 "unexpected function without stack frame but with SVE objects");
1596 // All of the stack allocation is for locals.
1597 AFI->setLocalStackSize(NumBytes);
1598 if (!NumBytes)
1599 return;
1600 // REDZONE: If the stack size is less than 128 bytes, we don't need
1601 // to actually allocate.
1602 if (canUseRedZone(MF)) {
1603 AFI->setHasRedZone(true);
1604 ++NumRedZoneFunctions;
1605 } else {
1606 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1607 StackOffset::getFixed(-NumBytes), TII,
1608 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1609 if (EmitCFI) {
1610 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1611 MCSymbol *FrameLabel = MMI.getContext().createTempSymbol();
1612 // Encode the stack size of the leaf function.
1613 unsigned CFIIndex = MF.addFrameInst(
1614 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1615 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1616 .addCFIIndex(CFIIndex)
1617 .setMIFlags(MachineInstr::FrameSetup);
1618 }
1619 }
1620
1621 if (NeedsWinCFI) {
1622 HasWinCFI = true;
1623 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1624 .setMIFlag(MachineInstr::FrameSetup);
1625 }
1626
1627 return;
1628 }
1629
1630 bool IsWin64 =
1631 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
1632 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1633
1634 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1635 // All of the remaining stack allocations are for locals.
1636 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1637 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1638 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1639 if (CombineSPBump) {
1640 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1641 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1642 StackOffset::getFixed(-NumBytes), TII,
1643 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1644 EmitAsyncCFI);
1645 NumBytes = 0;
1646 } else if (HomPrologEpilog) {
1647 // Stack has been already adjusted.
1648 NumBytes -= PrologueSaveSize;
1649 } else if (PrologueSaveSize != 0) {
1650 MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
1651 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1652 EmitAsyncCFI);
1653 NumBytes -= PrologueSaveSize;
1654 }
1655 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1656
1657 // Move past the saves of the callee-saved registers, fixing up the offsets
1658 // and pre-inc if we decided to combine the callee-save and local stack
1659 // pointer bump above.
1660 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1661 !IsSVECalleeSave(MBBI)) {
1662 if (CombineSPBump)
1663 fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize(),
1664 NeedsWinCFI, &HasWinCFI);
1665 ++MBBI;
1666 }
1667
1668 // For funclets the FP belongs to the containing function.
1669 if (!IsFunclet && HasFP) {
1670 // Only set up FP if we actually need to.
1671 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1672
1673 if (CombineSPBump)
1674 FPOffset += AFI->getLocalStackSize();
1675
1676 if (AFI->hasSwiftAsyncContext()) {
1677 // Before we update the live FP we have to ensure there's a valid (or
1678 // null) asynchronous context in its slot just before FP in the frame
1679 // record, so store it now.
1680 const auto &Attrs = MF.getFunction().getAttributes();
1681 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1682 if (HaveInitialContext)
1683 MBB.addLiveIn(AArch64::X22);
1684 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1685 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1686 .addUse(Reg)
1687 .addUse(AArch64::SP)
1688 .addImm(FPOffset - 8)
1689 .setMIFlag(MachineInstr::FrameSetup);
1690 if (NeedsWinCFI) {
1691 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
1692 // to multiple instructions, should be mutually-exclusive.
1693 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
1694 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1695 .setMIFlag(MachineInstr::FrameSetup);
1696 HasWinCFI = true;
1697 }
1698 }
1699
1700 if (HomPrologEpilog) {
1701 auto Prolog = MBBI;
1702 --Prolog;
1703 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1704 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1705 } else {
1706 // Issue sub fp, sp, FPOffset or
1707 // mov fp,sp when FPOffset is zero.
1708 // Note: All stores of callee-saved registers are marked as "FrameSetup".
1709 // This code marks the instruction(s) that set the FP also.
1710 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
1711 StackOffset::getFixed(FPOffset), TII,
1712 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1713 if (NeedsWinCFI && HasWinCFI) {
1714 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1715 .setMIFlag(MachineInstr::FrameSetup);
1716 // After setting up the FP, the rest of the prolog doesn't need to be
1717 // included in the SEH unwind info.
1718 NeedsWinCFI = false;
1719 }
1720 }
1721 if (EmitAsyncCFI)
1722 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
1723 }
1724
1725 // Now emit the moves for whatever callee saved regs we have (including FP,
1726 // LR if those are saved). Frame instructions for SVE registers are emitted
1727 // later, after the instructions which actually save the SVE regs.
1728 if (EmitAsyncCFI)
1729 emitCalleeSavedGPRLocations(MBB, MBBI);
1730
1731 // Alignment is required for the parent frame, not the funclet
1732 const bool NeedsRealignment =
1733 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
1734 int64_t RealignmentPadding =
1735 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
1736 ? MFI.getMaxAlign().value() - 16
1737 : 0;
1738
1739 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
1740 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
1741 if (NeedsWinCFI) {
1742 HasWinCFI = true;
1743 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
1744 // exceed this amount. We need to move at most 2^24 - 1 into x15.
1745 // This is at most two instructions, MOVZ followed by MOVK.
1746 // TODO: Fix to use multiple stack alloc unwind codes for stacks
1747 // exceeding 256MB in size.
1748 if (NumBytes >= (1 << 28))
1749 report_fatal_error("Stack size cannot exceed 256MB for stack "
1750 "unwinding purposes");
1751
1752 uint32_t LowNumWords = NumWords & 0xFFFF;
1753 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
1754 .addImm(LowNumWords)
1755 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0))
1756 .setMIFlag(MachineInstr::FrameSetup);
1757 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1758 .setMIFlag(MachineInstr::FrameSetup);
1759 if ((NumWords & 0xFFFF0000) != 0) {
1760 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
1761 .addReg(AArch64::X15)
1762 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
1763 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 16))
1764 .setMIFlag(MachineInstr::FrameSetup);
1765 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1766 .setMIFlag(MachineInstr::FrameSetup);
1767 }
1768 } else {
1769 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
1770 .addImm(NumWords)
1771 .setMIFlag(MachineInstr::FrameSetup);
1772 }
1773
1774 const char* ChkStk = Subtarget.getChkStkName();
1775 switch (MF.getTarget().getCodeModel()) {
1776 case CodeModel::Tiny:
1777 case CodeModel::Small:
1778 case CodeModel::Medium:
1779 case CodeModel::Kernel:
1780 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
1781 .addExternalSymbol(ChkStk)
1782 .addReg(AArch64::X15, RegState::Implicit)
1783 .addReg(AArch64::X16, RegState::Implicit | RegState::Define | RegState::Dead)
1784 .addReg(AArch64::X17, RegState::Implicit | RegState::Define | RegState::Dead)
1785 .addReg(AArch64::NZCV, RegState::Implicit | RegState::Define | RegState::Dead)
1786 .setMIFlag(MachineInstr::FrameSetup);
1787 if (NeedsWinCFI) {
1788 HasWinCFI = true;
1789 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1790 .setMIFlag(MachineInstr::FrameSetup);
1791 }
1792 break;
1793 case CodeModel::Large:
1794 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
1795 .addReg(AArch64::X16, RegState::Define)
1796 .addExternalSymbol(ChkStk)
1797 .addExternalSymbol(ChkStk)
1798 .setMIFlag(MachineInstr::FrameSetup);
1799 if (NeedsWinCFI) {
1800 HasWinCFI = true;
1801 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1802 .setMIFlag(MachineInstr::FrameSetup);
1803 }
1804
1805 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
1806 .addReg(AArch64::X16, RegState::Kill)
1807 .addReg(AArch64::X15, RegState::Implicit)
1808 .addReg(AArch64::X16, RegState::Implicit | RegState::Define | RegState::Dead)
1809 .addReg(AArch64::X17, RegState::Implicit | RegState::Define | RegState::Dead)
1810 .addReg(AArch64::NZCV, RegState::Implicit | RegState::Define | RegState::Dead)
1811 .setMIFlag(MachineInstr::FrameSetup);
1812 if (NeedsWinCFI) {
1813 HasWinCFI = true;
1814 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1815 .setMIFlag(MachineInstr::FrameSetup);
1816 }
1817 break;
1818 }
1819
1820 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
1821 .addReg(AArch64::SP, RegState::Kill)
1822 .addReg(AArch64::X15, RegState::Kill)
1823 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 4))
1824 .setMIFlag(MachineInstr::FrameSetup);
1825 if (NeedsWinCFI) {
1826 HasWinCFI = true;
1827 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
1828 .addImm(NumBytes)
1829 .setMIFlag(MachineInstr::FrameSetup);
1830 }
1831 NumBytes = 0;
1832
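// Worked example (illustrative): for NumBytes + RealignmentPadding == 1 MiB,
// NumWords == 0x10000, so with a small code model this emits roughly
//   movz x15, #0x0                // low half of NumWords
//   movk x15, #0x1, lsl #16       // high half; x15 = 0x10000 16-byte units
//   bl   __chkstk                 // probe; symbol name per getChkStkName()
//   sub  sp, sp, x15, uxtx #4     // sp -= x15 * 16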
1833 if (RealignmentPadding > 0) {
1834 if (RealignmentPadding >= 4096) {
1835 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
1836 .addReg(AArch64::X16, RegState::Define)
1837 .addImm(RealignmentPadding)
1838 .setMIFlag(MachineInstr::FrameSetup);
1839 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
1840 .addReg(AArch64::SP)
1841 .addReg(AArch64::X16, RegState::Kill)
1842 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
1843 .setMIFlag(MachineInstr::FrameSetup);
1844 } else {
1845 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
1846 .addReg(AArch64::SP)
1847 .addImm(RealignmentPadding)
1848 .addImm(0)
1849 .setMIFlag(MachineInstr::FrameSetup);
1850 }
1851
1852 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
1853 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
1854 .addReg(AArch64::X15, RegState::Kill)
1855 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
1856 AFI->setStackRealigned(true);
1857
1858 // No need for SEH instructions here; if we're realigning the stack,
1859 // we've set a frame pointer and already finished the SEH prologue.
1860 assert(!NeedsWinCFI);
1861 }
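// E.g. (illustrative) for MFI.getMaxAlign() == Align(64),
// RealignmentPadding == 48 and AndMask == ~63, so the sequence is roughly
//   add x15, sp, #48
//   and sp, x15, #0xffffffffffffffc0
// which rounds SP down to the requested 64-byte boundary.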
1862 }
1863
1864 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
1865 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
1866
1867 // Process the SVE callee-saves to determine what space needs to be
1868 // allocated.
1869 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
1870 // Find callee save instructions in frame.
1871 CalleeSavesBegin = MBBI;
1872 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
1873 while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
1874 ++MBBI;
1875 CalleeSavesEnd = MBBI;
1876
1877 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
1878 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
1879 }
1880
1881 // Allocate space for the callee saves (if any).
1882 StackOffset CFAOffset =
1883 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
1884 allocateStackSpace(MBB, CalleeSavesBegin, false, SVECalleeSavesSize, false,
1885 nullptr, EmitAsyncCFI && !HasFP, CFAOffset);
1886 CFAOffset += SVECalleeSavesSize;
1887
1888 if (EmitAsyncCFI)
1889 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
1890
1891 // Allocate space for the rest of the frame including SVE locals. Align the
1892 // stack as necessary.
1893 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
1894 "Cannot use redzone with stack realignment");
1895 if (!canUseRedZone(MF)) {
1896 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
1897 // the correct value here, as NumBytes also includes padding bytes,
1898 // which shouldn't be counted here.
1899 allocateStackSpace(MBB, CalleeSavesEnd, NeedsRealignment,
1900 SVELocalsSize + StackOffset::getFixed(NumBytes),
1901 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
1902 CFAOffset);
1903 }
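// Sketch of the resulting allocation, assuming two SVE callee saves and
// some fixed-size locals (exact instructions depend on the offsets):
//   addvl sp, sp, #-2      // scalable callee-save area
//   ...SVE register saves...
//   addvl sp, sp, #-N      // SVE locals, if any
//   sub   sp, sp, #M       // fixed-size locals (NumBytes)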
1904
1905 // If we need a base pointer, set it up here. It's whatever the value of the
1906 // stack pointer is at this point. Any variable size objects will be allocated
1907 // after this, so we can still use the base pointer to reference locals.
1908 //
1909 // FIXME: Clarify FrameSetup flags here.
1910 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
1911 // needed.
1912 // For funclets the BP belongs to the containing function.
1913 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
1914 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
1915 false);
1916 if (NeedsWinCFI) {
1917 HasWinCFI = true;
1918 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1919 .setMIFlag(MachineInstr::FrameSetup);
1920 }
1921 }
1922
1923 // The very last FrameSetup instruction indicates the end of prologue. Emit a
1924 // SEH opcode indicating the prologue end.
1925 if (NeedsWinCFI && HasWinCFI) {
1926 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1927 .setMIFlag(MachineInstr::FrameSetup);
1928 }
1929
1930 // SEH funclets are passed the frame pointer in X1. If the parent
1931 // function uses the base register, then the base register is used
1932 // directly, and is not retrieved from X1.
1933 if (IsFunclet && F.hasPersonalityFn()) {
1934 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
1935 if (isAsynchronousEHPersonality(Per)) {
1936 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
1937 .addReg(AArch64::X1)
1938 .setMIFlag(MachineInstr::FrameSetup);
1939 MBB.addLiveIn(AArch64::X1);
1940 }
1941 }
1942
1943 if (EmitCFI && !EmitAsyncCFI) {
1944 if (HasFP) {
1945 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
1946 } else {
1947 StackOffset TotalSize =
1948 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
1949 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
1950 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
1951 /*LastAdjustmentWasScalable=*/false));
1952 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1953 .addCFIIndex(CFIIndex)
1954 .setMIFlags(MachineInstr::FrameSetup);
1955 }
1956 emitCalleeSavedGPRLocations(MBB, MBBI);
1957 emitCalleeSavedSVELocations(MBB, MBBI);
1958 }
1959}
1960
1961 static bool isFuncletReturnInstr(const MachineInstr &MI) {
1962 switch (MI.getOpcode()) {
1963 default:
1964 return false;
1965 case AArch64::CATCHRET:
1966 case AArch64::CLEANUPRET:
1967 return true;
1968 }
1969}
1970
1971 void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
1972 MachineBasicBlock &MBB) const {
1973 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
1974 MachineFrameInfo &MFI = MF.getFrameInfo();
1975 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1976 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1977 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1978 DebugLoc DL;
1979 bool NeedsWinCFI = needsWinCFI(MF);
1980 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1981 bool HasWinCFI = false;
1982 bool IsFunclet = false;
1983
1984 if (MBB.end() != MBBI) {
1985 DL = MBBI->getDebugLoc();
1986 IsFunclet = isFuncletReturnInstr(*MBBI);
1987 }
1988
1989 MachineBasicBlock::iterator EpilogStartI = MBB.end();
1990
1991 auto FinishingTouches = make_scope_exit([&]() {
1992 if (AFI->shouldSignReturnAddress(MF)) {
1993 BuildMI(MBB, MBB.getFirstTerminator(), DL,
1994 TII->get(AArch64::PAUTH_EPILOGUE))
1995 .setMIFlag(MachineInstr::FrameDestroy);
1996 if (NeedsWinCFI)
1997 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1998 }
1999 if (AFI->needsShadowCallStackPrologueEpilogue(MF))
2000 emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL);
2001 if (EmitCFI)
2002 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2003 if (HasWinCFI) {
2004 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2005 TII->get(AArch64::SEH_EpilogEnd))
2006 .setMIFlag(MachineInstr::FrameDestroy);
2007 if (!MF.hasWinCFI())
2008 MF.setHasWinCFI(true);
2009 }
2010 if (NeedsWinCFI) {
2011 assert(EpilogStartI != MBB.end());
2012 if (!HasWinCFI)
2013 MBB.erase(EpilogStartI);
2014 }
2015 });
2016
2017 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2018 : MFI.getStackSize();
2019
2020 // All calls are tail calls in GHC calling conv, and functions have no
2021 // prologue/epilogue.
2022 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2023 return;
2024
2025 // How much of the stack used by incoming arguments this function is expected
2026 // to restore in this particular epilogue.
2027 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2028 bool IsWin64 =
2029 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2030 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2031
2032 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2033 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2034 // We cannot rely on the local stack size set in emitPrologue if the function
2035 // has funclets, as funclets have different local stack size requirements, and
2036 // the current value set in emitPrologue may be that of the containing
2037 // function.
2038 if (MF.hasEHFunclets())
2039 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2040 if (homogeneousPrologEpilog(MF, &MBB)) {
2041 assert(!NeedsWinCFI);
2042 auto LastPopI = MBB.getFirstTerminator();
2043 if (LastPopI != MBB.begin()) {
2044 auto HomogeneousEpilog = std::prev(LastPopI);
2045 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2046 LastPopI = HomogeneousEpilog;
2047 }
2048
2049 // Adjust local stack
2050 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2051 StackOffset::getFixed(AFI->getLocalStackSize()), TII,
2052 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2053
2054 // SP has already been adjusted while restoring the callee-save regs.
2055 // The case that also adjusts SP for arguments was bailed out on earlier.
2056 assert(AfterCSRPopSize == 0);
2057 return;
2058 }
2059 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2060 // Assume we can't combine the last pop with the sp restore.
2061
2062 bool CombineAfterCSRBump = false;
2063 if (!CombineSPBump && PrologueSaveSize != 0) {
2064 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
2065 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2066 AArch64InstrInfo::isSEHInstruction(*Pop))
2067 Pop = std::prev(Pop);
2068 // Converting the last ldp to a post-index ldp is valid only if the last
2069 // ldp's offset is 0.
2070 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2071 // If the offset is 0 and the AfterCSR pop is not actually trying to
2072 // allocate more stack for arguments (in space that an untimely interrupt
2073 // may clobber), convert it to a post-index ldp.
2074 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2075 convertCalleeSaveRestoreToSPPrePostIncDec(
2076 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2077 MachineInstr::FrameDestroy, PrologueSaveSize);
2078 } else {
2079 // If not, make sure to emit an add after the last ldp.
2080 // We're doing this by transferring the size to be restored from the
2081 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2082 // pops.
2083 AfterCSRPopSize += PrologueSaveSize;
2084 CombineAfterCSRBump = true;
2085 }
2086 }
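// E.g. (illustrative, PrologueSaveSize == 32): if the final callee-save
// reload is "ldp x29, x30, [sp]" with offset 0, it becomes the post-indexed
//   ldp x29, x30, [sp], #32
// otherwise the 32 bytes are instead folded into an "add sp, sp, #32"
// emitted after the pops (CombineAfterCSRBump).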
2087
2088 // Move past the restores of the callee-saved registers.
2089 // If we plan on combining the sp bump of the local stack size and the callee
2090 // save stack size, we might need to adjust the CSR save and restore offsets.
2091 MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2092 MachineBasicBlock::iterator Begin = MBB.begin();
2093 while (LastPopI != Begin) {
2094 --LastPopI;
2095 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2096 IsSVECalleeSave(LastPopI)) {
2097 ++LastPopI;
2098 break;
2099 } else if (CombineSPBump)
2100 fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2101 NeedsWinCFI, &HasWinCFI);
2102 }
2103
2104 if (NeedsWinCFI) {
2105 // Note that there are cases where we insert SEH opcodes in the
2106 // epilogue when we had no SEH opcodes in the prologue. For
2107 // example, when there is no stack frame but there are stack
2108 // arguments. Insert the SEH_EpilogStart and remove it later if
2109 // we didn't emit any SEH opcodes to avoid generating WinCFI for
2110 // functions that don't need it.
2111 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2112 .setMIFlag(MachineInstr::FrameDestroy);
2113 EpilogStartI = LastPopI;
2114 --EpilogStartI;
2115 }
2116
2117 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2118 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2119 case SwiftAsyncFramePointerMode::DeploymentBased:
2120 // Avoid the reload as it is GOT relative, and instead fall back to the
2121 // hardcoded value below. This allows a mismatch between the OS and
2122 // application without immediately terminating on the difference.
2123 [[fallthrough]];
2124 case SwiftAsyncFramePointerMode::Always:
2125 // We need to reset FP to its untagged state on return. Bit 60 is
2126 // currently used to show the presence of an extended frame.
2127
2128 // BIC x29, x29, #0x1000_0000_0000_0000
2129 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2130 AArch64::FP)
2131 .addUse(AArch64::FP)
2132 .addImm(0x10fe)
2133 .setMIFlag(MachineInstr::FrameDestroy);
2134 if (NeedsWinCFI) {
2135 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2136 .setMIFlag(MachineInstr::FrameDestroy);
2137 HasWinCFI = true;
2138 }
2139 break;
2140
2141 case SwiftAsyncFramePointerMode::Never:
2142 break;
2143 }
2144 }
2145
2146 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2147
2148 // If there is a single SP update, insert it before the ret and we're done.
2149 if (CombineSPBump) {
2150 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2151
2152 // When we are about to restore the CSRs, the CFA register is SP again.
2153 if (EmitCFI && hasFP(MF)) {
2154 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2155 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2156 unsigned CFIIndex =
2157 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2158 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2159 .addCFIIndex(CFIIndex)
2160 .setMIFlags(MachineInstr::FrameDestroy);
2161 }
2162
2163 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2164 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2165 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2166 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2167 return;
2168 }
2169
2170 NumBytes -= PrologueSaveSize;
2171 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2172
2173 // Process the SVE callee-saves to determine what space needs to be
2174 // deallocated.
2175 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2176 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2177 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2178 RestoreBegin = std::prev(RestoreEnd);
2179 while (RestoreBegin != MBB.begin() &&
2180 IsSVECalleeSave(std::prev(RestoreBegin)))
2181 --RestoreBegin;
2182
2183 assert(IsSVECalleeSave(RestoreBegin) &&
2184 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2185
2186 StackOffset CalleeSavedSizeAsOffset =
2187 StackOffset::getScalable(CalleeSavedSize);
2188 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2189 DeallocateAfter = CalleeSavedSizeAsOffset;
2190 }
2191
2192 // Deallocate the SVE area.
2193 if (SVEStackSize) {
2194 // If we have stack realignment or variable sized objects on the stack,
2195 // restore the stack pointer from the frame pointer prior to SVE CSR
2196 // restoration.
2197 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2198 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2199 // Set SP to start of SVE callee-save area from which they can
2200 // be reloaded. The code below will deallocate the stack space
2201 // by moving FP -> SP.
2202 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2203 StackOffset::getScalable(-CalleeSavedSize), TII,
2204 MachineInstr::FrameDestroy);
2205 }
2206 } else {
2207 if (AFI->getSVECalleeSavedStackSize()) {
2208 // Deallocate the non-SVE locals first before we can deallocate (and
2209 // restore callee saves) from the SVE area.
2210 emitFrameOffset(
2211 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2212 StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
2213 false, false, nullptr, EmitCFI && !hasFP(MF),
2214 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2215 NumBytes = 0;
2216 }
2217
2218 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2219 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2220 false, nullptr, EmitCFI && !hasFP(MF),
2221 SVEStackSize +
2222 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2223
2224 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2225 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2226 false, nullptr, EmitCFI && !hasFP(MF),
2227 DeallocateAfter +
2228 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2229 }
2230 if (EmitCFI)
2231 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2232 }
2233
2234 if (!hasFP(MF)) {
2235 bool RedZone = canUseRedZone(MF);
2236 // If this was a redzone leaf function, we don't need to restore the
2237 // stack pointer (but we may need to pop stack args for fastcc).
2238 if (RedZone && AfterCSRPopSize == 0)
2239 return;
2240
2241 // Pop the local variables off the stack. If there are no callee-saved
2242 // registers, it means we are actually positioned at the terminator and can
2243 // combine stack increment for the locals and the stack increment for
2244 // callee-popped arguments into (possibly) a single instruction and be done.
2245 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2246 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2247 if (NoCalleeSaveRestore)
2248 StackRestoreBytes += AfterCSRPopSize;
2249
2250 emitFrameOffset(
2251 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2252 StackOffset::getFixed(StackRestoreBytes), TII,
2253 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2254 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2255
2256 // If we were able to combine the local stack pop with the argument pop,
2257 // then we're done.
2258 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2259 return;
2260 }
2261
2262 NumBytes = 0;
2263 }
2264
2265 // Restore the original stack pointer.
2266 // FIXME: Rather than doing the math here, we should instead just use
2267 // non-post-indexed loads for the restores if we aren't actually going to
2268 // be able to save any instructions.
2269 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2270 emitFrameOffset(
2271 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2272 StackOffset::getFixed(-AFI->getCalleeSaveBaseToFrameRecordOffset()),
2273 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2274 } else if (NumBytes)
2275 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2276 StackOffset::getFixed(NumBytes), TII,
2277 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2278
2279 // When we are about to restore the CSRs, the CFA register is SP again.
2280 if (EmitCFI && hasFP(MF)) {
2281 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2282 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2283 unsigned CFIIndex = MF.addFrameInst(
2284 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2285 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2286 .addCFIIndex(CFIIndex)
2287 .setMIFlags(MachineInstr::FrameDestroy);
2288 }
2289
2290 // This must be placed after the callee-save restore code because that code
2291 // assumes the SP is at the same location as it was after the callee-save
2292 // spill code in the prologue.
2293 if (AfterCSRPopSize) {
2294 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2295 "interrupt may have clobbered");
2296
2297 emitFrameOffset(
2298 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2299 StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
2300 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2301 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2302 }
2303}
2304
2305 bool AArch64FrameLowering::enableCFIFixup(MachineFunction &MF) const {
2306 return TargetFrameLowering::enableCFIFixup(MF) &&
2307 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2308}
2309
2310/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2311/// debug info. It's the same as what we use for resolving the code-gen
2312/// references for now. FIXME: This can go wrong when references are
2313/// SP-relative and simple call frames aren't used.
2314 StackOffset
2315 AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
2316 Register &FrameReg) const {
2317 return resolveFrameIndexReference(
2318 MF, FI, FrameReg,
2319 /*PreferFP=*/
2320 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress),
2321 /*ForSimm=*/false);
2322}
2323
2324 StackOffset
2325 AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
2326 int FI) const {
2327 return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
2328}
2329
2330 static StackOffset getFPOffset(const MachineFunction &MF,
2331 int64_t ObjectOffset) {
2332 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2333 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2334 bool IsWin64 =
2335 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2336 unsigned FixedObject =
2337 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2338 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2339 int64_t FPAdjust =
2340 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2341 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2342}
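// Worked example (illustrative): for the first stack argument
// (ObjectOffset == 0), FixedObject == 0, a 16-byte callee-save area holding
// only (fp, lr) and a frame-record offset of 0, FPAdjust == 16 and the
// argument is addressed as [fp, #16].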
2343
2344 static StackOffset getStackOffset(const MachineFunction &MF,
2345 int64_t ObjectOffset) {
2346 const auto &MFI = MF.getFrameInfo();
2347 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2348}
2349
2350 // TODO: This function currently does not work for scalable vectors.
2351 int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
2352 int FI) const {
2353 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2354 MF.getSubtarget().getRegisterInfo());
2355 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2356 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2357 ? getFPOffset(MF, ObjectOffset).getFixed()
2358 : getStackOffset(MF, ObjectOffset).getFixed();
2359}
2360
2361 StackOffset AArch64FrameLowering::resolveFrameIndexReference(
2362 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2363 bool ForSimm) const {
2364 const auto &MFI = MF.getFrameInfo();
2365 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2366 bool isFixed = MFI.isFixedObjectIndex(FI);
2367 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2368 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2369 PreferFP, ForSimm);
2370}
2371
2372 StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
2373 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2374 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2375 const auto &MFI = MF.getFrameInfo();
2376 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2377 MF.getSubtarget().getRegisterInfo());
2378 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2379 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2380
2381 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2382 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2383 bool isCSR =
2384 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2385
2386 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2387
2388 // Use frame pointer to reference fixed objects. Use it for locals if
2389 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2390 // reliable as a base). Make sure useFPForScavengingIndex() does the
2391 // right thing for the emergency spill slot.
2392 bool UseFP = false;
2393 if (AFI->hasStackFrame() && !isSVE) {
2394 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2395 // there are scalable (SVE) objects in between the FP and the fixed-sized
2396 // objects.
2397 PreferFP &= !SVEStackSize;
2398
2399 // Note: Keeping the following as multiple 'if' statements rather than
2400 // merging to a single expression for readability.
2401 //
2402 // Argument access should always use the FP.
2403 if (isFixed) {
2404 UseFP = hasFP(MF);
2405 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2406 // References to the CSR area must use FP if we're re-aligning the stack
2407 // since the dynamically-sized alignment padding is between the SP/BP and
2408 // the CSR area.
2409 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2410 UseFP = true;
2411 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2412 // If the FPOffset is negative and we're producing a signed immediate, we
2413 // have to keep in mind that the available offset range for negative
2414 // offsets is smaller than for positive ones. If an offset is available
2415 // via the FP and the SP, use whichever is closest.
2416 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2417 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2418
2419 if (MFI.hasVarSizedObjects()) {
2420 // If we have variable sized objects, we can use either FP or BP, as the
2421 // SP offset is unknown. We can use the base pointer if we have one and
2422 // FP is not preferred. If not, we're stuck with using FP.
2423 bool CanUseBP = RegInfo->hasBasePointer(MF);
2424 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2425 UseFP = PreferFP;
2426 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2427 UseFP = true;
2428 // else we can use BP and FP, but the offset from FP won't fit.
2429 // That will make us scavenge registers which we can probably avoid by
2430 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2431 } else if (FPOffset >= 0) {
2432 // Use SP or FP, whichever gives us the best chance of the offset
2433 // being in range for direct access. If the FPOffset is positive,
2434 // that'll always be best, as the SP will be even further away.
2435 UseFP = true;
2436 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2437 // Funclets access the locals contained in the parent's stack frame
2438 // via the frame pointer, so we have to use the FP in the parent
2439 // function.
2440 (void) Subtarget;
2441 assert(
2442 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv()) &&
2443 "Funclets should only be present on Win64");
2444 UseFP = true;
2445 } else {
2446 // We have the choice between FP and (SP or BP).
2447 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2448 UseFP = true;
2449 }
2450 }
2451 }
2452
2453 assert(
2454 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2455 "In the presence of dynamic stack pointer realignment, "
2456 "non-argument/CSR objects cannot be accessed through the frame pointer");
2457
2458 if (isSVE) {
2459 StackOffset FPOffset =
2460 StackOffset::get(-AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
2461 StackOffset SPOffset =
2462 SVEStackSize +
2463 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2464 ObjectOffset);
2465 // Always use the FP for SVE spills if available and beneficial.
2466 if (hasFP(MF) && (SPOffset.getFixed() ||
2467 FPOffset.getScalable() < SPOffset.getScalable() ||
2468 RegInfo->hasStackRealignment(MF))) {
2469 FrameReg = RegInfo->getFrameRegister(MF);
2470 return FPOffset;
2471 }
2472
2473 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2474 : (unsigned)AArch64::SP;
2475 return SPOffset;
2476 }
2477
2478 StackOffset ScalableOffset = {};
2479 if (UseFP && !(isFixed || isCSR))
2480 ScalableOffset = -SVEStackSize;
2481 if (!UseFP && (isFixed || isCSR))
2482 ScalableOffset = SVEStackSize;
2483
2484 if (UseFP) {
2485 FrameReg = RegInfo->getFrameRegister(MF);
2486 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2487 }
2488
2489 // Use the base pointer if we have one.
2490 if (RegInfo->hasBasePointer(MF))
2491 FrameReg = RegInfo->getBaseRegister();
2492 else {
2493 assert(!MFI.hasVarSizedObjects() &&
2494 "Can't use SP when we have var sized objects.");
2495 FrameReg = AArch64::SP;
2496 // If we're using the red zone for this function, the SP won't actually
2497 // be adjusted, so the offsets will be negative. They're also all
2498 // within range of the signed 9-bit immediate instructions.
2499 if (canUseRedZone(MF))
2500 Offset -= AFI->getLocalStackSize();
2501 }
2502
2503 return StackOffset::getFixed(Offset) + ScalableOffset;
2504}
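// In short (sketch): an FP-based access to a local must step down over the
// SVE area (ScalableOffset == -SVEStackSize), while an SP/BP-based access to
// a fixed or CSR object must step up over it; e.g. with one vector of SVE
// stack, a local at FP offset -16 is reached roughly via
//   addvl x8, x29, #-1
//   ldur  x0, [x8, #-16]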
2505
2506static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2507 // Do not set a kill flag on values that are also marked as live-in. This
2508 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2509 // callee saved registers.
2510 // Omitting the kill flags is conservatively correct even if the live-in
2511 // is not used after all.
2512 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2513 return getKillRegState(!IsLiveIn);
2514}
2515
2516 static bool produceCompactUnwindFrame(MachineFunction &MF) {
2517 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2518 AttributeList Attrs = MF.getFunction().getAttributes();
2519 return Subtarget.isTargetMachO() &&
2520 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2521 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2522 MF.getFunction().getCallingConv() != CallingConv::SwiftTail;
2523}
2524
2525static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2526 bool NeedsWinCFI, bool IsFirst,
2527 const TargetRegisterInfo *TRI) {
2528 // If we are generating register pairs for a Windows function that requires
2529 // EH support, then pair consecutive registers only. There are no unwind
2530 // opcodes for saves/restores of non-consecutive register pairs.
2531 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2532 // save_lrpair.
2533 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2534
2535 if (Reg2 == AArch64::FP)
2536 return true;
2537 if (!NeedsWinCFI)
2538 return false;
2539 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2540 return false;
2541 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2542 // opcode. If this is the first register pair, it would end up with a
2543 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2544 // if LR is paired with something else than the first register.
2545 // The save_lrpair opcode requires the first register to be an odd one.
2546 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2547 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2548 return false;
2549 return true;
2550}
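// Examples (illustrative): under WinCFI, (x19, x20) stays paired since the
// encodings are consecutive (save_regp); (x19, x21) is split. (x21, lr) can
// use save_lrpair because x21 is an even distance from x19, but not as the
// first pair, since there is no pre-decrementing save_lrpair_x form.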
2551
2552/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2553/// WindowsCFI requires that only consecutive registers can be paired.
2554/// LR and FP need to be allocated together when the frame needs to save
2555/// the frame-record. This means any other register pairing with LR is invalid.
2556static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2557 bool UsesWinAAPCS, bool NeedsWinCFI,
2558 bool NeedsFrameRecord, bool IsFirst,
2559 const TargetRegisterInfo *TRI) {
2560 if (UsesWinAAPCS)
2561 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2562 TRI);
2563
2564 // If we need to store the frame record, don't pair any register
2565 // with LR other than FP.
2566 if (NeedsFrameRecord)
2567 return Reg2 == AArch64::LR;
2568
2569 return false;
2570}
2571
2572namespace {
2573
2574struct RegPairInfo {
2575 unsigned Reg1 = AArch64::NoRegister;
2576 unsigned Reg2 = AArch64::NoRegister;
2577 int FrameIdx;
2578 int Offset;
2579 enum RegType { GPR, FPR64, FPR128, PPR, ZPR } Type;
2580
2581 RegPairInfo() = default;
2582
2583 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2584
2585 unsigned getScale() const {
2586 switch (Type) {
2587 case PPR:
2588 return 2;
2589 case GPR:
2590 case FPR64:
2591 return 8;
2592 case ZPR:
2593 case FPR128:
2594 return 16;
2595 }
2596 llvm_unreachable("Unsupported type");
2597 }
2598
2599 bool isScalable() const { return Type == PPR || Type == ZPR; }
2600};
2601
2602} // end anonymous namespace
2603
2604 static void computeCalleeSaveRegisterPairs(
2605 MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
2606 const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
2607 bool NeedsFrameRecord) {
2608
2609 if (CSI.empty())
2610 return;
2611
2612 bool IsWindows = isTargetWindows(MF);
2613 bool NeedsWinCFI = needsWinCFI(MF);
2614 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2615 MachineFrameInfo &MFI = MF.getFrameInfo();
2616 CallingConv::ID CC = MF.getFunction().getCallingConv();
2617 unsigned Count = CSI.size();
2618 (void)CC;
2619 // MachO's compact unwind format relies on all registers being stored in
2620 // pairs.
2621 assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2622 CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
2623 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2624 "Odd number of callee-saved regs to spill!");
2625 int ByteOffset = AFI->getCalleeSavedStackSize();
2626 int StackFillDir = -1;
2627 int RegInc = 1;
2628 unsigned FirstReg = 0;
2629 if (NeedsWinCFI) {
2630 // For WinCFI, fill the stack from the bottom up.
2631 ByteOffset = 0;
2632 StackFillDir = 1;
2633 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2634 // backwards, to pair up registers starting from lower numbered registers.
2635 RegInc = -1;
2636 FirstReg = Count - 1;
2637 }
2638 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2639 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2640
2641 // When iterating backwards, the loop condition relies on unsigned wraparound.
2642 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2643 RegPairInfo RPI;
2644 RPI.Reg1 = CSI[i].getReg();
2645
2646 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
2647 RPI.Type = RegPairInfo::GPR;
2648 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
2649 RPI.Type = RegPairInfo::FPR64;
2650 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
2651 RPI.Type = RegPairInfo::FPR128;
2652 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
2653 RPI.Type = RegPairInfo::ZPR;
2654 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
2655 RPI.Type = RegPairInfo::PPR;
2656 else
2657 llvm_unreachable("Unsupported register class.");
2658
2659 // Add the next reg to the pair if it is in the same register class.
2660 if (unsigned(i + RegInc) < Count) {
2661 Register NextReg = CSI[i + RegInc].getReg();
2662 bool IsFirst = i == FirstReg;
2663 switch (RPI.Type) {
2664 case RegPairInfo::GPR:
2665 if (AArch64::GPR64RegClass.contains(NextReg) &&
2666 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2667 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2668 TRI))
2669 RPI.Reg2 = NextReg;
2670 break;
2671 case RegPairInfo::FPR64:
2672 if (AArch64::FPR64RegClass.contains(NextReg) &&
2673 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2674 IsFirst, TRI))
2675 RPI.Reg2 = NextReg;
2676 break;
2677 case RegPairInfo::FPR128:
2678 if (AArch64::FPR128RegClass.contains(NextReg))
2679 RPI.Reg2 = NextReg;
2680 break;
2681 case RegPairInfo::PPR:
2682 case RegPairInfo::ZPR:
2683 break;
2684 }
2685 }
2686
2687 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2688 // list to come in sorted by frame index so that we can issue the store
2689 // pair instructions directly. Assert if we see anything otherwise.
2690 //
2691 // The order of the registers in the list is controlled by
2692 // getCalleeSavedRegs(), so they will always be in-order, as well.
2693 assert((!RPI.isPaired() ||
2694 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
2695 "Out of order callee saved regs!");
2696
2697 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
2698 RPI.Reg1 == AArch64::LR) &&
2699 "FrameRecord must be allocated together with LR");
2700
2701 // Windows AAPCS has FP and LR reversed.
2702 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
2703 RPI.Reg2 == AArch64::LR) &&
2704 "FrameRecord must be allocated together with LR");
2705
2706 // MachO's compact unwind format relies on all registers being stored in
2707 // adjacent register pairs.
2708 assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2709 CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
2710 CC == CallingConv::Win64 ||
2711 (RPI.isPaired() &&
2712 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
2713 RPI.Reg1 + 1 == RPI.Reg2))) &&
2714 "Callee-save registers not saved as adjacent register pair!");
2715
2716 RPI.FrameIdx = CSI[i].getFrameIdx();
2717 if (NeedsWinCFI &&
2718 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
2719 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
2720
2721 int Scale = RPI.getScale();
2722
2723 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2724 assert(OffsetPre % Scale == 0);
2725
2726 if (RPI.isScalable())
2727 ScalableByteOffset += StackFillDir * Scale;
2728 else
2729 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2730
2731 // Swift's async context is directly before FP, so allocate an extra
2732 // 8 bytes for it.
2733 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2734 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2735 (IsWindows && RPI.Reg2 == AArch64::LR)))
2736 ByteOffset += StackFillDir * 8;
2737
2738 assert(!(RPI.isScalable() && RPI.isPaired()) &&
2739 "Paired spill/fill instructions don't exist for SVE vectors");
2740
2741 // Round up size of non-pair to pair size if we need to pad the
2742 // callee-save area to ensure 16-byte alignment.
2743 if (NeedGapToAlignStack && !NeedsWinCFI &&
2744 !RPI.isScalable() && RPI.Type != RegPairInfo::FPR128 &&
2745 !RPI.isPaired() && ByteOffset % 16 != 0) {
2746 ByteOffset += 8 * StackFillDir;
2747 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
2748 // A stack frame with a gap looks like this, bottom up:
2749 // d9, d8. x21, gap, x20, x19.
2750 // Set extra alignment on the x21 object to create the gap above it.
2751 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
2752 NeedGapToAlignStack = false;
2753 }
2754
2755 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2756 assert(OffsetPost % Scale == 0);
2757 // If filling top down (default), we want the offset after incrementing it.
2758 // If filling bottom up (WinCFI) we need the original offset.
2759 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
2760
2761 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
2762 // Swift context can directly precede FP.
2763 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2764 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2765 (IsWindows && RPI.Reg2 == AArch64::LR)))
2766 Offset += 8;
2767 RPI.Offset = Offset / Scale;
2768
2769 assert(((!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
2770 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
2771 "Offset out of bounds for LDP/STP immediate");
2772
2773 // Save the offset to frame record so that the FP register can point to the
2774 // innermost frame record (spilled FP and LR registers).
2775 if (NeedsFrameRecord && ((!IsWindows && RPI.Reg1 == AArch64::LR &&
2776 RPI.Reg2 == AArch64::FP) ||
2777 (IsWindows && RPI.Reg1 == AArch64::FP &&
2778 RPI.Reg2 == AArch64::LR)))
2779 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
2780
2781 RegPairs.push_back(RPI);
2782 if (RPI.isPaired())
2783 i += RegInc;
2784 }
2785 if (NeedsWinCFI) {
2786 // If we need an alignment gap in the stack, align the topmost stack
2787 // object. A stack frame with a gap looks like this, bottom up:
2788 // x19, d8. d9, gap.
2789 // Set extra alignment on the topmost stack object (the first element in
2790 // CSI, which goes top down), to create the gap above it.
2791 if (AFI->hasCalleeSaveStackFreeSpace())
2792 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
2793 // We iterated bottom up over the registers; flip RegPairs back to top
2794 // down order.
2795 std::reverse(RegPairs.begin(), RegPairs.end());
2796 }
2797}
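// Putting it together (illustrative): with a 48-byte callee-save area the
// downward walk above hands out byte offsets 32, 16 and 0 (RPI.Offset 4, 2
// and 0); spillCalleeSavedRegisters() below then walks RegPairs in reverse,
// producing the addImm(+0)/(+2)/(+4) store sequence shown in its comment.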
2798
2799 bool AArch64FrameLowering::spillCalleeSavedRegisters(
2800 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
2801 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2802 MachineFunction &MF = *MBB.getParent();
2803 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2804 bool NeedsWinCFI = needsWinCFI(MF);
2805 DebugLoc DL;
2806 SmallVector<RegPairInfo, 8> RegPairs;
2807
2808 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
2809
2810 const MachineRegisterInfo &MRI = MF.getRegInfo();
2811 if (homogeneousPrologEpilog(MF)) {
2812 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
2813 .setMIFlag(MachineInstr::FrameSetup);
2814
2815 for (auto &RPI : RegPairs) {
2816 MIB.addReg(RPI.Reg1);
2817 MIB.addReg(RPI.Reg2);
2818
2819 // Update register live in.
2820 if (!MRI.isReserved(RPI.Reg1))
2821 MBB.addLiveIn(RPI.Reg1);
2822 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
2823 MBB.addLiveIn(RPI.Reg2);
2824 }
2825 return true;
2826 }
2827 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
2828 unsigned Reg1 = RPI.Reg1;
2829 unsigned Reg2 = RPI.Reg2;
2830 unsigned StrOpc;
2831
2832 // Issue sequence of spills for cs regs. The first spill may be converted
2833 // to a pre-decrement store later by emitPrologue if the callee-save stack
2834 // area allocation can't be combined with the local stack area allocation.
2835 // For example:
2836 // stp x22, x21, [sp, #0] // addImm(+0)
2837 // stp x20, x19, [sp, #16] // addImm(+2)
2838 // stp fp, lr, [sp, #32] // addImm(+4)
2839 // Rationale: This sequence saves uop updates compared to a sequence of
2840 // pre-increment spills like stp xi,xj,[sp,#-16]!
2841 // Note: Similar rationale and sequence for restores in epilog.
2842 unsigned Size;
2843 Align Alignment;
2844 switch (RPI.Type) {
2845 case RegPairInfo::GPR:
2846 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2847 Size = 8;
2848 Alignment = Align(8);
2849 break;
2850 case RegPairInfo::FPR64:
2851 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2852 Size = 8;
2853 Alignment = Align(8);
2854 break;
2855 case RegPairInfo::FPR128:
2856 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2857 Size = 16;
2858 Alignment = Align(16);
2859 break;
2860 case RegPairInfo::ZPR:
2861 StrOpc = AArch64::STR_ZXI;
2862 Size = 16;
2863 Alignment = Align(16);
2864 break;
2865 case RegPairInfo::PPR:
2866 StrOpc = AArch64::STR_PXI;
2867 Size = 2;
2868 Alignment = Align(2);
2869 break;
2870 }
2871 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2872 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2873 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2874 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2875 dbgs() << ")\n");
2876
2877 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2878 "Windows unwdinding requires a consecutive (FP,LR) pair");
2879 // Windows unwind codes require consecutive registers if registers are
2880 // paired. Make the switch here, so that the code below will save (x,x+1)
2881 // and not (x+1,x).
2882 unsigned FrameIdxReg1 = RPI.FrameIdx;
2883 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2884 if (NeedsWinCFI && RPI.isPaired()) {
2885 std::swap(Reg1, Reg2);
2886 std::swap(FrameIdxReg1, FrameIdxReg2);
2887 }
2888 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2889 if (!MRI.isReserved(Reg1))
2890 MBB.addLiveIn(Reg1);
2891 if (RPI.isPaired()) {
2892 if (!MRI.isReserved(Reg2))
2893 MBB.addLiveIn(Reg2);
2894 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2895 MIB.addMemOperand(MF.getMachineMemOperand(
2896 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2897 MachineMemOperand::MOStore, Size, Alignment));
2898 }
2899 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2900 .addReg(AArch64::SP)
2901 .addImm(RPI.Offset) // [sp, #offset*scale],
2902 // where factor*scale is implicit
2903 .setMIFlag(MachineInstr::FrameSetup);
2904 MIB.addMemOperand(MF.getMachineMemOperand(
2905 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2906 MachineMemOperand::MOStore, Size, Alignment));
2907 if (NeedsWinCFI)
2908 InsertSEH(MIB, TII, MachineInstr::FrameSetup);
2909
2910 // Update the StackIDs of the SVE stack slots.
2911 MachineFrameInfo &MFI = MF.getFrameInfo();
2912 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR)
2913 MFI.setStackID(RPI.FrameIdx, TargetStackID::ScalableVector);
2914
2915 }
2916 return true;
2917}
2918
2919 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2920 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2921 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2922 MachineFunction &MF = *MBB.getParent();
2923 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2924 DebugLoc DL;
2925 SmallVector<RegPairInfo, 8> RegPairs;
2926 bool NeedsWinCFI = needsWinCFI(MF);
2927
2928 if (MBBI != MBB.end())
2929 DL = MBBI->getDebugLoc();
2930
2931 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
2932
2933 auto EmitMI = [&](const RegPairInfo &RPI) -> MachineBasicBlock::iterator {
2934 unsigned Reg1 = RPI.Reg1;
2935 unsigned Reg2 = RPI.Reg2;
2936
2937 // Issue sequence of restores for cs regs. The last restore may be converted
2938 // to a post-increment load later by emitEpilogue if the callee-save stack
2939 // area allocation can't be combined with the local stack area allocation.
2940 // For example:
2941 // ldp fp, lr, [sp, #32] // addImm(+4)
2942 // ldp x20, x19, [sp, #16] // addImm(+2)
2943 // ldp x22, x21, [sp, #0] // addImm(+0)
2944 // Note: see comment in spillCalleeSavedRegisters()
2945 unsigned LdrOpc;
2946 unsigned Size;
2947 Align Alignment;
2948 switch (RPI.Type) {
2949 case RegPairInfo::GPR:
2950 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2951 Size = 8;
2952 Alignment = Align(8);
2953 break;
2954 case RegPairInfo::FPR64:
2955 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2956 Size = 8;
2957 Alignment = Align(8);
2958 break;
2959 case RegPairInfo::FPR128:
2960 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2961 Size = 16;
2962 Alignment = Align(16);
2963 break;
2964 case RegPairInfo::ZPR:
2965 LdrOpc = AArch64::LDR_ZXI;
2966 Size = 16;
2967 Alignment = Align(16);
2968 break;
2969 case RegPairInfo::PPR:
2970 LdrOpc = AArch64::LDR_PXI;
2971 Size = 2;
2972 Alignment = Align(2);
2973 break;
2974 }
2975 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2976 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2977 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2978 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2979 dbgs() << ")\n");
2980
2981 // Windows unwind codes require consecutive registers if registers are
2982 // paired. Make the switch here, so that the code below will save (x,x+1)
2983 // and not (x+1,x).
2984 unsigned FrameIdxReg1 = RPI.FrameIdx;
2985 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2986 if (NeedsWinCFI && RPI.isPaired()) {
2987 std::swap(Reg1, Reg2);
2988 std::swap(FrameIdxReg1, FrameIdxReg2);
2989 }
2990 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2991 if (RPI.isPaired()) {
2992 MIB.addReg(Reg2, getDefRegState(true));
2993 MIB.addMemOperand(MF.getMachineMemOperand(
2994 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2995 MachineMemOperand::MOLoad, Size, Alignment));
2996 }
2997 MIB.addReg(Reg1, getDefRegState(true))
2998 .addReg(AArch64::SP)
2999 .addImm(RPI.Offset) // [sp, #offset*scale]
3000 // where factor*scale is implicit
3001 .setMIFlag(MachineInstr::FrameDestroy);
3002 MIB.addMemOperand(MF.getMachineMemOperand(
3003 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3004 MachineMemOperand::MOLoad, Size, Alignment));
3005 if (NeedsWinCFI)
3006 InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
3007
3008 return MIB->getIterator();
3009 };
3010
3011 // SVE objects are always restored in reverse order.
3012 for (const RegPairInfo &RPI : reverse(RegPairs))
3013 if (RPI.isScalable())
3014 EmitMI(RPI);
3015
3016 if (homogeneousPrologEpilog(MF, &MBB)) {
3017 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3018 .setMIFlag(MachineInstr::FrameDestroy);
3019 for (auto &RPI : RegPairs) {
3020 MIB.addReg(RPI.Reg1, RegState::Define);
3021 MIB.addReg(RPI.Reg2, RegState::Define);
3022 }
3023 return true;
3024 }
3025
3026 if (ReverseCSRRestoreSeq) {
3027 MachineBasicBlock::iterator First = MBB.end();
3028 for (const RegPairInfo &RPI : reverse(RegPairs)) {
3029 if (RPI.isScalable())
3030 continue;
3031 MachineBasicBlock::iterator It = EmitMI(RPI);
3032 if (First == MBB.end())
3033 First = It;
3034 }
3035 if (First != MBB.end())
3036 MBB.splice(MBBI, &MBB, First);
3037 } else {
3038 for (const RegPairInfo &RPI : RegPairs) {
3039 if (RPI.isScalable())
3040 continue;
3041 (void)EmitMI(RPI);
3042 }
3043 }
3044
3045 return true;
3046}
3047
3048 void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
3049 BitVector &SavedRegs,
3050 RegScavenger *RS) const {
3051 // All calls are tail calls in GHC calling conv, and functions have no
3052 // prologue/epilogue.
3053 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
3054 return;
3055
3056 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
3057 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3058 MF.getSubtarget().getRegisterInfo());
3059 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3060 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3061 unsigned UnspilledCSGPR = AArch64::NoRegister;
3062 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3063
3064 MachineFrameInfo &MFI = MF.getFrameInfo();
3065 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3066
3067 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3068 ? RegInfo->getBaseRegister()
3069 : (unsigned)AArch64::NoRegister;
3070
3071 unsigned ExtraCSSpill = 0;
3072 bool HasUnpairedGPR64 = false;
3073 // Figure out which callee-saved registers to save/restore.
3074 for (unsigned i = 0; CSRegs[i]; ++i) {
3075 const unsigned Reg = CSRegs[i];
3076
3077 // Add the base pointer register to SavedRegs if it is callee-save.
3078 if (Reg == BasePointerReg)
3079 SavedRegs.set(Reg);
3080
3081 bool RegUsed = SavedRegs.test(Reg);
3082 unsigned PairedReg = AArch64::NoRegister;
3083 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3084 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3085 AArch64::FPR128RegClass.contains(Reg)) {
3086 // Compensate for odd numbers of GP CSRs.
3087 // For now, all the known cases of odd number of CSRs are of GPRs.
3088 if (HasUnpairedGPR64)
3089 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3090 else
3091 PairedReg = CSRegs[i ^ 1];
3092 }
3093
3094 // If the function requires all the GP registers to save (SavedRegs),
3095 // and there are an odd number of GP CSRs at the same time (CSRegs),
3096 // PairedReg could be in a different register class from Reg, which would
3097 // lead to a FPR (usually D8) accidentally being marked saved.
3098 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3099 PairedReg = AArch64::NoRegister;
3100 HasUnpairedGPR64 = true;
3101 }
3102 assert(PairedReg == AArch64::NoRegister ||
3103 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3104 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3105 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3106
3107 if (!RegUsed) {
3108 if (AArch64::GPR64RegClass.contains(Reg) &&
3109 !RegInfo->isReservedReg(MF, Reg)) {
3110 UnspilledCSGPR = Reg;
3111 UnspilledCSGPRPaired = PairedReg;
3112 }
3113 continue;
3114 }
3115
3116 // MachO's compact unwind format relies on all registers being stored in
3117 // pairs.
3118 // FIXME: the usual format is actually better if unwinding isn't needed.
3119 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3120 !SavedRegs.test(PairedReg)) {
3121 SavedRegs.set(PairedReg);
3122 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3123 !RegInfo->isReservedReg(MF, PairedReg))
3124 ExtraCSSpill = PairedReg;
3125 }
3126 }
3127
3128 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3129 !Subtarget.isTargetWindows()) {
3130 // For Windows calling convention on a non-windows OS, where X18 is treated
3131 // as reserved, back up X18 when entering non-windows code (marked with the
3132 // Windows calling convention) and restore when returning regardless of
3133 // whether the individual function uses it - it might call other functions
3134 // that clobber it.
3135 SavedRegs.set(AArch64::X18);
3136 }
3137
3138 // Calculates the callee saved stack size.
3139 unsigned CSStackSize = 0;
3140 unsigned SVECSStackSize = 0;
3141 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3142 const MachineRegisterInfo &MRI = MF.getRegInfo();
3143 for (unsigned Reg : SavedRegs.set_bits()) {
3144 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3145 if (AArch64::PPRRegClass.contains(Reg) ||
3146 AArch64::ZPRRegClass.contains(Reg))
3147 SVECSStackSize += RegSize;
3148 else
3149 CSStackSize += RegSize;
3150 }
3151
3152 // Save number of saved regs, so we can easily update CSStackSize later.
3153 unsigned NumSavedRegs = SavedRegs.count();
3154
3155 // The frame record needs to be created by saving the appropriate registers
3156 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3157 if (hasFP(MF) ||
3158 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3159 SavedRegs.set(AArch64::FP);
3160 SavedRegs.set(AArch64::LR);
3161 }
3162
3163 LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3164 for (unsigned Reg
3165 : SavedRegs.set_bits()) dbgs()
3166 << ' ' << printReg(Reg, RegInfo);
3167 dbgs() << "\n";);
3168
3169 // If any callee-saved registers are used, the frame cannot be eliminated.
3170 int64_t SVEStackSize =
3171 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3172 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3173
3174 // The CSR spill slots have not been allocated yet, so estimateStackSize
3175 // won't include them.
3176 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3177
3178 // We may address some of the stack above the canonical frame address, either
3179 // for our own arguments or during a call. Include that in calculating whether
3180 // we have complicated addressing concerns.
3181 int64_t CalleeStackUsed = 0;
3182 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3183 int64_t FixedOff = MFI.getObjectOffset(I);
3184 if (FixedOff > CalleeStackUsed) CalleeStackUsed = FixedOff;
3185 }
3186
3187 // Conservatively always assume BigStack when there are SVE spills.
3188 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3189 CalleeStackUsed) > EstimatedStackSizeLimit;
3190 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3191 AFI->setHasStackFrame(true);
3192
3193 // Estimate if we might need to scavenge a register at some point in order
3194 // to materialize a stack offset. If so, either spill one additional
3195 // callee-saved register or reserve a special spill slot to facilitate
3196 // register scavenging. If we already spilled an extra callee-saved register
3197 // above to keep the number of spills even, we don't need to do anything else
3198 // here.
3199 if (BigStack) {
3200 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3201 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3202 << " to get a scratch register.\n");
3203 SavedRegs.set(UnspilledCSGPR);
3204 ExtraCSSpill = UnspilledCSGPR;
3205
3206 // MachO's compact unwind format relies on all registers being stored in
3207 // pairs, so if we need to spill one extra for BigStack, then we need to
3208 // store the pair.
3209 if (producePairRegisters(MF)) {
3210 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3211 // Failed to make a pair for compact unwind format, revert spilling.
3212 if (produceCompactUnwindFrame(MF)) {
3213 SavedRegs.reset(UnspilledCSGPR);
3214 ExtraCSSpill = AArch64::NoRegister;
3215 }
3216 } else
3217 SavedRegs.set(UnspilledCSGPRPaired);
3218 }
3219 }
3220
3221 // If we didn't find an extra callee-saved register to spill, create
3222 // an emergency spill slot.
3223 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3224 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3225 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3226 unsigned Size = TRI->getSpillSize(RC);
3227 Align Alignment = TRI->getSpillAlign(RC);
3228 int FI = MFI.CreateStackObject(Size, Alignment, false);
3229 RS->addScavengingFrameIndex(FI);
3230 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3231 << " as the emergency spill slot.\n");
3232 }
3233 }
3234
3235 // Adding the size of additional 64bit GPR saves.
3236 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3237
3238 // A Swift asynchronous context extends the frame record with a pointer
3239 // directly before FP.
3240 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3241 CSStackSize += 8;
3242
3243 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3244 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3245 << EstimatedStackSize + AlignedCSStackSize
3246 << " bytes.\n");
3247
3248 assert((!MFI.isCalleeSavedInfoValid() ||
3249 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3250 "Should not invalidate callee saved info");
3251
3252 // Round up to register pair alignment to avoid additional SP adjustment
3253 // instructions.
3254 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3255 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3256 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3257}
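// E.g. (illustrative): if the saved registers total CSStackSize == 24 bytes,
// AlignedCSStackSize == 32, so setCalleeSaveStackHasFreeSpace(true) records
// the 8 spare bytes, which enableStackSlotScavenging() may later reuse for
// an emergency spill slot.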
3258
3259 bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3260 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3261 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3262 unsigned &MaxCSFrameIndex) const {
3263 bool NeedsWinCFI = needsWinCFI(MF);
3264 // To match the canonical windows frame layout, reverse the list of
3265 // callee saved registers to get them laid out by PrologEpilogInserter
3266 // in the right order. (PrologEpilogInserter allocates stack objects top
3267 // down. Windows canonical prologs store higher numbered registers at
3268 // the top, thus have the CSI array start from the highest registers.)
3269 if (NeedsWinCFI)
3270 std::reverse(CSI.begin(), CSI.end());
3271
3272 if (CSI.empty())
3273 return true; // Early exit if no callee saved registers are modified!
3274
3275 // Now that we know which registers need to be saved and restored, allocate
3276 // stack slots for them.
3277 MachineFrameInfo &MFI = MF.getFrameInfo();
3278 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3279
3280 bool UsesWinAAPCS = isTargetWindows(MF);
3281 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3282 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3283 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3284 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3285 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3286 }
3287
3288 for (auto &CS : CSI) {
3289 Register Reg = CS.getReg();
3290 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3291
3292 unsigned Size = RegInfo->getSpillSize(*RC);
3293 Align Alignment(RegInfo->getSpillAlign(*RC));
3294 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3295 CS.setFrameIdx(FrameIdx);
3296
3297 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3298 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3299
3300 // Grab 8 bytes below FP for the extended asynchronous frame info.
3301 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3302 Reg == AArch64::FP) {
3303 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3304 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3305 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3306 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3307 }
3308 }
3309 return true;
3310}
3311
3312 bool AArch64FrameLowering::enableStackSlotScavenging(
3313 const MachineFunction &MF) const {
3314 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3315 // If the function has streaming-mode changes, don't scavenge a
3316 // spillslot in the callee-save area, as that might require an
3317 // 'addvl' in the streaming-mode-changing call-sequence when the
3318 // function doesn't use a FP.
3319 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
3320 return false;
3321 return AFI->hasCalleeSaveStackFreeSpace();
3322}
3323
3324/// Returns true if there are any SVE callee saves.
3325static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3326 int &Min, int &Max) {
3327 Min = std::numeric_limits<int>::max();
3328 Max = std::numeric_limits<int>::min();
3329
3330 if (!MFI.isCalleeSavedInfoValid())
3331 return false;
3332
3333 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3334 for (auto &CS : CSI) {
3335 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3336 AArch64::PPRRegClass.contains(CS.getReg())) {
3337 assert((Max == std::numeric_limits<int>::min() ||
3338 Max + 1 == CS.getFrameIdx()) &&
3339 "SVE CalleeSaves are not consecutive");
3340
3341 Min = std::min(Min, CS.getFrameIdx());
3342 Max = std::max(Max, CS.getFrameIdx());
3343 }
3344 }
3345 return Min != std::numeric_limits<int>::max();
3346}
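// Worked example (hypothetical frame indices): if the callee-saved info lists
// Z8, Z9 and P4 at frame indices 2, 3 and 4, the scan yields Min = 2 and
// Max = 4; the assert above guarantees the SVE entries occupy consecutive
// indices. With no ZPR/PPR saves at all, Min stays at INT_MAX and the
// function returns false.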
3347
3348// Process all the SVE stack objects and determine offsets for each
3349// object. If AssignOffsets is true, the offsets get assigned.
3350// Fills in the first and last callee-saved frame indices into
3351// Min/MaxCSFrameIndex, respectively.
3352// Returns the size of the stack.
3353static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3354 int &MinCSFrameIndex,
3355 int &MaxCSFrameIndex,
3356 bool AssignOffsets) {
3357#ifndef NDEBUG
3358 // First process all fixed stack objects.
3359 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3360 assert(MFI.getStackID(I) != TargetStackID::ScalableVector &&
3361 "SVE vectors should never be passed on the stack by value, only by "
3362 "reference.");
3363#endif
3364
3365 auto Assign = [&MFI](int FI, int64_t Offset) {
3366 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
3367 MFI.setObjectOffset(FI, Offset);
3368 };
3369
3370 int64_t Offset = 0;
3371
3372 // Then process all callee saved slots.
3373 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
3374 // Assign offsets to the callee save slots.
3375 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
3376 Offset += MFI.getObjectSize(I);
3377 Offset = alignTo(Offset, MFI.getObjectAlign(I));
3378 if (AssignOffsets)
3379 Assign(I, -Offset);
3380 }
3381 }
3382
3383 // Ensure that the callee-save area is aligned to 16 bytes.
3384 Offset = alignTo(Offset, Align(16U));
3385
3386 // Create a buffer of SVE objects to allocate and sort it.
3387 SmallVector<int, 8> ObjectsToAllocate;
3388 // If we have a stack protector, and we've previously decided that we have SVE
3389 // objects on the stack and thus need it to go in the SVE stack area, then it
3390 // needs to go first.
3391 int StackProtectorFI = -1;
3392 if (MFI.hasStackProtectorIndex()) {
3393 StackProtectorFI = MFI.getStackProtectorIndex();
3394 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
3395 ObjectsToAllocate.push_back(StackProtectorFI);
3396 }
3397 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
3398 unsigned StackID = MFI.getStackID(I);
3399 if (StackID != TargetStackID::ScalableVector)
3400 continue;
3401 if (I == StackProtectorFI)
3402 continue;
3403 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
3404 continue;
3405 if (MFI.isDeadObjectIndex(I))
3406 continue;
3407
3408 ObjectsToAllocate.push_back(I);
3409 }
3410
3411 // Allocate all SVE locals and spills
3412 for (unsigned FI : ObjectsToAllocate) {
3413 Align Alignment = MFI.getObjectAlign(FI);
3414 // FIXME: Given that the length of SVE vectors is not necessarily a power of
3415 // two, we'd need to align every object dynamically at runtime if the
3416 // alignment is larger than 16. This is not yet supported.
3417 if (Alignment > Align(16))
3418 report_fatal_error(
3419 "Alignment of scalable vectors > 16 bytes is not yet supported");
3420
3421 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
3422 if (AssignOffsets)
3423 Assign(FI, -Offset);
3424 }
3425
3426 return Offset;
3427}
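// Worked example (hypothetical sizes, all in "scalable bytes" that the
// hardware scales by the vector length): two 16-byte SVE callee-save slots
// get offsets -16 and -32; afterwards Offset = 32, already 16-byte aligned.
// A 32-byte SVE local with 16-byte alignment is then placed at
// alignTo(32 + 32, 16) = 64, i.e. assigned offset -64 from the top of the
// SVE area, and 64 is returned as the (unscaled) size of the SVE stack.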
3428
3429int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
3430 MachineFrameInfo &MFI) const {
3431 int MinCSFrameIndex, MaxCSFrameIndex;
3432 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
3433}
3434
3435int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
3436 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
3437 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
3438 true);
3439}
3440
3441void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3442 MachineFunction &MF, RegScavenger *RS) const {
3443 MachineFrameInfo &MFI = MF.getFrameInfo();
3444
3445 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
3446 "Upwards growing stack unsupported");
3447
3448 int MinCSFrameIndex, MaxCSFrameIndex;
3449 int64_t SVEStackSize =
3450 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3451
3452 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3453 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3454 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3455
3456 // If this function isn't doing Win64-style C++ EH, we don't need to do
3457 // anything.
3458 if (!MF.hasEHFunclets())
3459 return;
3460 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3461 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3462
3463 MachineBasicBlock &MBB = MF.front();
3464 auto MBBI = MBB.begin();
3465 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3466 ++MBBI;
3467
3468 // Create an UnwindHelp object.
3469 // The UnwindHelp object is allocated at the start of the fixed object area
3470 int64_t FixedObject =
3471 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
3472 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
3473 /*SPOffset*/ -FixedObject,
3474 /*IsImmutable=*/false);
3475 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3476
3477 // We need to store -2 into the UnwindHelp object at the start of the
3478 // function.
3479 DebugLoc DL;
3480 RS->enterBasicBlockEnd(MBB);
3481 RS->backward(MBBI);
3482 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3483 assert(DstReg && "There must be a free register after frame setup");
3484 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3485 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3486 .addReg(DstReg, getKillRegState(true))
3487 .addFrameIndex(UnwindHelpFI)
3488 .addImm(0);
3489}
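// Illustrative sketch: for some scavenged register (x8 here is hypothetical),
// the two BuildMI calls above emit
//   mov  x8, #-2
//   stur x8, [<UnwindHelp slot>]
// initializing UnwindHelp to -2, the value the Win64 C++ EH runtime expects
// before any funclet has executed.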
3490
3491namespace {
3492struct TagStoreInstr {
3493 MachineInstr *MI;
3494 int64_t Offset, Size;
3495 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3496 : MI(MI), Offset(Offset), Size(Size) {}
3497};
3498
3499class TagStoreEdit {
3500 MachineFunction *MF;
3501 MachineBasicBlock *MBB;
3502 MachineRegisterInfo *MRI;
3503 // Tag store instructions that are being replaced.
3504 SmallVector<TagStoreInstr, 8> TagStores;
3505 // Combined memref arguments of the above instructions.
3506 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3507
3508 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3509 // FrameRegOffset + Size) with the address tag of SP.
3510 Register FrameReg;
3511 StackOffset FrameRegOffset;
3512 int64_t Size;
3513 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3514 // end.
3515 std::optional<int64_t> FrameRegUpdate;
3516 // MIFlags for any FrameReg updating instructions.
3517 unsigned FrameRegUpdateFlags;
3518
3519 // Use zeroing instruction variants.
3520 bool ZeroData;
3521 DebugLoc DL;
3522
3523 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3524 void emitLoop(MachineBasicBlock::iterator InsertI);
3525
3526public:
3527 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3528 : MBB(MBB), ZeroData(ZeroData) {
3529 MF = MBB->getParent();
3530 MRI = &MF->getRegInfo();
3531 }
3532 // Add an instruction to be replaced. Instructions must be added in
3533 // ascending order of Offset and have to be adjacent.
3534 void addInstruction(TagStoreInstr I) {
3535 assert((TagStores.empty() ||
3536 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3537 "Non-adjacent tag store instructions.");
3538 TagStores.push_back(I);
3539 }
3540 void clear() { TagStores.clear(); }
3541 // Emit equivalent code at the given location, and erase the current set of
3542 // instructions. May skip if the replacement is not profitable. May invalidate
3543 // the input iterator and replace it with a valid one.
3544 void emitCode(MachineBasicBlock::iterator &InsertI,
3545 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3546};
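// Usage sketch: tryMergeAdjacentSTG() below drives this class roughly as
//   TagStoreEdit TSE(MBB, /*ZeroData=*/false);
//   for (auto &I : Instrs)       // adjacent, offset-ordered TagStoreInstrs
//     TSE.addInstruction(I);
//   TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate=*/false);
// after which the original tag store instructions are erased.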
3547
3548void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3549 const AArch64InstrInfo *TII =
3550 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3551
3552 const int64_t kMinOffset = -256 * 16;
3553 const int64_t kMaxOffset = 255 * 16;
3554
3555 Register BaseReg = FrameReg;
3556 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3557 if (BaseRegOffsetBytes < kMinOffset ||
3558 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3559 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3560 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3561 // is required for the offset of ST2G.
3562 BaseRegOffsetBytes % 16 != 0) {
3563 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3564 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3565 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3566 BaseReg = ScratchReg;
3567 BaseRegOffsetBytes = 0;
3568 }
3569
3570 MachineInstr *LastI = nullptr;
3571 while (Size) {
3572 int64_t InstrSize = (Size > 16) ? 32 : 16;
3573 unsigned Opcode =
3574 InstrSize == 16
3575 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3576 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3577 assert(BaseRegOffsetBytes % 16 == 0);
3578 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3579 .addReg(AArch64::SP)
3580 .addReg(BaseReg)
3581 .addImm(BaseRegOffsetBytes / 16)
3582 .setMemRefs(CombinedMemRefs);
3583 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3584 // final SP adjustment in the epilogue.
3585 if (BaseRegOffsetBytes == 0)
3586 LastI = I;
3587 BaseRegOffsetBytes += InstrSize;
3588 Size -= InstrSize;
3589 }
3590
3591 if (LastI)
3592 MBB->splice(InsertI, MBB, LastI);
3593}
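// Worked example: for Size = 48 with BaseRegOffsetBytes = 0, the loop emits
// an ST2G at #0 (recorded as LastI) and an STG at #32; the splice then moves
// the #0 store to the end:
//   stg  sp, [base, #32]
//   st2g sp, [base, #0]
// so a following SP adjustment in the epilogue has a chance to be folded
// into the final store.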
3594
3595void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3596 const AArch64InstrInfo *TII =
3597 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3598
3599 Register BaseReg = FrameRegUpdate
3600 ? FrameReg
3601 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3602 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3603
3604 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3605
3606 int64_t LoopSize = Size;
3607 // If the loop size is not a multiple of 32, split off one 16-byte store at
3608 // the end to fold BaseReg update into.
3609 if (FrameRegUpdate && *FrameRegUpdate)
3610 LoopSize -= LoopSize % 32;
3611 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3612 TII->get(ZeroData ? AArch64::STZGloop_wback
3613 : AArch64::STGloop_wback))
3614 .addDef(SizeReg)
3615 .addDef(BaseReg)
3616 .addImm(LoopSize)
3617 .addReg(BaseReg)
3618 .setMemRefs(CombinedMemRefs);
3619 if (FrameRegUpdate)
3620 LoopI->setFlags(FrameRegUpdateFlags);
3621
3622 int64_t ExtraBaseRegUpdate =
3623 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3624 if (LoopSize < Size) {
3625 assert(FrameRegUpdate);
3626 assert(Size - LoopSize == 16);
3627 // Tag 16 more bytes at BaseReg and update BaseReg.
3628 BuildMI(*MBB, InsertI, DL,
3629 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3630 .addDef(BaseReg)
3631 .addReg(BaseReg)
3632 .addReg(BaseReg)
3633 .addImm(1 + ExtraBaseRegUpdate / 16)
3634 .setMemRefs(CombinedMemRefs)
3635 .setMIFlags(FrameRegUpdateFlags);
3636 } else if (ExtraBaseRegUpdate) {
3637 // Update BaseReg.
3638 BuildMI(
3639 *MBB, InsertI, DL,
3640 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3641 .addDef(BaseReg)
3642 .addReg(BaseReg)
3643 .addImm(std::abs(ExtraBaseRegUpdate))
3644 .addImm(0)
3645 .setMIFlags(FrameRegUpdateFlags);
3646 }
3647}
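// Worked example (hypothetical values, FrameRegOffset fixed part = 0):
// Size = 176 with a merged update *FrameRegUpdate = 192 gives
// LoopSize = 176 - 176 % 32 = 160. The STGloop tags 160 bytes, then
// ExtraBaseRegUpdate = 192 - 0 - 176 = 16, so the trailing STGPostIndex gets
// immediate 1 + 16/16 = 2: it tags the remaining 16 bytes and post-increments
// BaseReg by 2 * 16 = 32, completing the 192-byte update.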
3648
3649// Check if *II is a register update that can be merged into the STGloop that
3650// ends at (Reg + Size). *TotalOffset is set to the required adjustment to Reg
3651// after the end of the loop.
3652bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3653 int64_t Size, int64_t *TotalOffset) {
3654 MachineInstr &MI = *II;
3655 if ((MI.getOpcode() == AArch64::ADDXri ||
3656 MI.getOpcode() == AArch64::SUBXri) &&
3657 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3658 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3659 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3660 if (MI.getOpcode() == AArch64::SUBXri)
3661 Offset = -Offset;
3662 int64_t AbsPostOffset = std::abs(Offset - Size);
3663 const int64_t kMaxOffset =
3664 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
3665 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
3666 *TotalOffset = Offset;
3667 return true;
3668 }
3669 }
3670 return false;
3671}
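// Worked example: if the loop ends at (Reg + 160) and *II is
// "add sp, sp, #176", then Offset = 176 and AbsPostOffset = |176 - 160| = 16,
// which is 16-byte aligned and fits in the unshifted 12-bit immediate, so the
// update is mergeable and *TotalOffset is set to 176.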
3672
3673void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3674 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3675 MemRefs.clear();
3676 for (auto &TS : TSE) {
3677 MachineInstr *MI = TS.MI;
3678 // An instruction without memory operands may access anything. Be
3679 // conservative and return an empty list.
3680 if (MI->memoperands_empty()) {
3681 MemRefs.clear();
3682 return;
3683 }
3684 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3685 }
3686}
3687
3688void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3689 const AArch64FrameLowering *TFI,
3690 bool TryMergeSPUpdate) {
3691 if (TagStores.empty())
3692 return;
3693 TagStoreInstr &FirstTagStore = TagStores[0];
3694 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3695 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3696 DL = TagStores[0].MI->getDebugLoc();
3697
3698 Register Reg;
3699 FrameRegOffset = TFI->resolveFrameOffsetReference(
3700 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
3701 /*PreferFP=*/false, /*ForSimm=*/true);
3702 FrameReg = Reg;
3703 FrameRegUpdate = std::nullopt;
3704
3705 mergeMemRefs(TagStores, CombinedMemRefs);
3706
3707 LLVM_DEBUG(dbgs() << "Replacing adjacent STG instructions:\n";
3708 for (const auto &Instr
3709 : TagStores) { dbgs() << " " << *Instr.MI; });
3710
3711 // Size threshold where a loop becomes shorter than a linear sequence of
3712 // tagging instructions.
3713 const int kSetTagLoopThreshold = 176;
3714 if (Size < kSetTagLoopThreshold) {
3715 if (TagStores.size() < 2)
3716 return;
3717 emitUnrolled(InsertI);
3718 } else {
3719 MachineInstr *UpdateInstr = nullptr;
3720 int64_t TotalOffset = 0;
3721 if (TryMergeSPUpdate) {
3722 // See if we can merge base register update into the STGloop.
3723 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3724 // but STGloop is way too unusual for that, and also it only
3725 // realistically happens in function epilogue. Also, STGloop is expanded
3726 // before that pass.
3727 if (InsertI != MBB->end() &&
3728 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3729 &TotalOffset)) {
3730 UpdateInstr = &*InsertI++;
3731 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3732 << *UpdateInstr);
3733 }
3734 }
3735
3736 if (!UpdateInstr && TagStores.size() < 2)
3737 return;
3738
3739 if (UpdateInstr) {
3740 FrameRegUpdate = TotalOffset;
3741 FrameRegUpdateFlags = UpdateInstr->getFlags();
3742 }
3743 emitLoop(InsertI);
3744 if (UpdateInstr)
3745 UpdateInstr->eraseFromParent();
3746 }
3747
3748 for (auto &TS : TagStores)
3749 TS.MI->eraseFromParent();
3750}
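// Note on the 176-byte threshold above: below it the unrolled form needs at
// most ceil(176 / 32) = 6 tagging instructions, which is presumed cheaper
// than the loop's setup, branch and writeback overhead; at or above it the
// loop form wins, and the SP update can additionally be folded into it.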
3751
3752bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3753 int64_t &Size, bool &ZeroData) {
3754 MachineFunction &MF = *MI.getParent()->getParent();
3755 const MachineFrameInfo &MFI = MF.getFrameInfo();
3756
3757 unsigned Opcode = MI.getOpcode();
3758 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3759 Opcode == AArch64::STZ2Gi);
3760
3761 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3762 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3763 return false;
3764 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3765 return false;
3766 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3767 Size = MI.getOperand(2).getImm();
3768 return true;
3769 }
3770
3771 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3772 Size = 16;
3773 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3774 Size = 32;
3775 else
3776 return false;
3777
3778 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3779 return false;
3780
3781 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3782 16 * MI.getOperand(2).getImm();
3783 return true;
3784}
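// Worked example (hypothetical frame layout): for "STGi sp, fi#2, #1" where
// fi#2 sits at object offset -32, this computes Size = 16 and
// Offset = -32 + 16 * 1 = -16, i.e. the store tags the 16-byte granule
// starting at frame offset -16.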
3785
3786// Detect a run of memory tagging instructions for adjacent stack frame slots,
3787// and replace them with a shorter instruction sequence:
3788// * replace STG + STG with ST2G
3789// * replace STGloop + STGloop with STGloop
3790// This code needs to run when stack slot offsets are already known, but before
3791// FrameIndex operands in STG instructions are eliminated.
3792MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3793 const AArch64FrameLowering *TFI,
3794 RegScavenger *RS) {
3795 bool FirstZeroData;
3796 int64_t Size, Offset;
3797 MachineInstr &MI = *II;
3798 MachineBasicBlock *MBB = MI.getParent();
3799 MachineBasicBlock::iterator NextI = ++II;
3800 if (&MI == &MBB->instr_back())
3801 return II;
3802 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3803 return II;
3804
3805 SmallVector<TagStoreInstr, 8> Instrs;
3806 Instrs.emplace_back(&MI, Offset, Size);
3807
3808 constexpr int kScanLimit = 10;
3809 int Count = 0;
3810 for (MachineBasicBlock::iterator E = MBB->end();
3811 NextI != E && Count < kScanLimit; ++NextI) {
3812 MachineInstr &MI = *NextI;
3813 bool ZeroData;
3814 int64_t Size, Offset;
3815 // Collect instructions that update memory tags with a FrameIndex operand
3816 // and (when applicable) constant size, and whose output registers are dead
3817 // (the latter is almost always the case in practice). Since these
3818 // instructions effectively have no inputs or outputs, we are free to skip
3819 // any non-aliasing instructions in between without tracking used registers.
3820 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3821 if (ZeroData != FirstZeroData)
3822 break;
3823 Instrs.emplace_back(&MI, Offset, Size);
3824 continue;
3825 }
3826
3827 // Only count non-transient, non-tagging instructions toward the scan
3828 // limit.
3829 if (!MI.isTransient())
3830 ++Count;
3831
3832 // Just in case, stop before the epilogue code starts.
3833 if (MI.getFlag(MachineInstr::FrameSetup) ||
3834 MI.getFlag(MachineInstr::FrameDestroy))
3835 break;
3836
3837 // Reject anything that may alias the collected instructions.
3838 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
3839 break;
3840 }
3841
3842 // New code will be inserted after the last tagging instruction we've found.
3843 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3844
3845 // All the gathered stack tag instructions are merged and placed after the
3846 // last tag store in the list. Before inserting, check whether the NZCV
3847 // flag is live at the insertion point: the expansion of any STG loop that
3848 // is present would otherwise clobber it.
3849
3850 // FIXME: Bailing out of the merge this way is conservative: the liveness
3851 // check is performed even when the merged list contains no STG loops and
3852 // thus nothing that could clobber NZCV.
3853 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3854 LiveRegs.addLiveOuts(*MBB);
3855 for (auto I = MBB->rbegin();; ++I) {
3856 MachineInstr &MI = *I;
3857 if (MI == InsertI)
3858 break;
3859 LiveRegs.stepBackward(*I);
3860 }
3861 InsertI++;
3862 if (LiveRegs.contains(AArch64::NZCV))
3863 return InsertI;
3864
3865 llvm::stable_sort(Instrs,
3866 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3867 return Left.Offset < Right.Offset;
3868 });
3869
3870 // Make sure that we don't have any overlapping stores.
3871 int64_t CurOffset = Instrs[0].Offset;
3872 for (auto &Instr : Instrs) {
3873 if (CurOffset > Instr.Offset)
3874 return NextI;
3875 CurOffset = Instr.Offset + Instr.Size;
3876 }
3877
3878 // Find contiguous runs of tagged memory and emit shorter instruction
3879 // sequences for them when possible.
3880 TagStoreEdit TSE(MBB, FirstZeroData);
3881 std::optional<int64_t> EndOffset;
3882 for (auto &Instr : Instrs) {
3883 if (EndOffset && *EndOffset != Instr.Offset) {
3884 // Found a gap.
3885 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3886 TSE.clear();
3887 }
3888
3889 TSE.addInstruction(Instr);
3890 EndOffset = Instr.Offset + Instr.Size;
3891 }
3892
3893 const MachineFunction *MF = MBB->getParent();
3894 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3895 TSE.emitCode(
3896 InsertI, TFI, /*TryMergeSPUpdate = */
3897 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3898
3899 return InsertI;
3900}
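// Before/after sketch: two adjacent 16-byte tag stores such as
//   stg sp, [sp, #16]
//   stg sp, [sp, #32]
// are collected, checked for overlap, and re-emitted by TagStoreEdit as a
// single
//   st2g sp, [sp, #16]
// provided NZCV is not live at the insertion point.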
3901} // namespace
3902
3903void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3904 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3905 if (StackTaggingMergeSetTag)
3906 for (auto &BB : MF)
3907 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();)
3908 II = tryMergeAdjacentSTG(II, this, RS);
3909}
3910
3911/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3912/// before the update. This is easily retrieved as it is exactly the offset
3913/// that is set in processFunctionBeforeFrameFinalized.
3914StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3915 const MachineFunction &MF, int FI, Register &FrameReg,
3916 bool IgnoreSPUpdates) const {
3917 const MachineFrameInfo &MFI = MF.getFrameInfo();
3918 if (IgnoreSPUpdates) {
3919 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3920 << MFI.getObjectOffset(FI) << "\n");
3921 FrameReg = AArch64::SP;
3922 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3923 }
3924
3925 // Go to common code if we cannot provide sp + offset.
3926 if (MFI.hasVarSizedObjects() ||
3927 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
3928 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
3929 return getFrameIndexReference(MF, FI, FrameReg);
3930
3931 FrameReg = AArch64::SP;
3932 return getStackOffset(MF, MFI.getObjectOffset(FI));
3933}
3934
3935/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3936/// the parent's frame pointer.
3937unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3938 const MachineFunction &MF) const {
3939 return 0;
3940}
3941
3942/// Funclets only need to account for space for the callee saved registers,
3943/// as the locals are accounted for in the parent's stack frame.
3944unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3945 const MachineFunction &MF) const {
3946 // This is the size of the pushed CSRs.
3947 unsigned CSSize =
3948 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3949 // This is the amount of stack a funclet needs to allocate.
3950 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3951 getStackAlign());
3952}
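// Worked example (hypothetical sizes): with an 80-byte pushed-CSR area and a
// 44-byte maximum call frame, a funclet allocates alignTo(80 + 44, 16) = 128
// bytes.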
3953
3954namespace {
3955struct FrameObject {
3956 bool IsValid = false;
3957 // Index of the object in MFI.
3958 int ObjectIndex = 0;
3959 // Group ID this object belongs to.
3960 int GroupIndex = -1;
3961 // This object should be placed first (closest to SP).
3962 bool ObjectFirst = false;
3963 // This object's group (which always contains the object with
3964 // ObjectFirst==true) should be placed first.
3965 bool GroupFirst = false;
3966};
3967
3968class GroupBuilder {
3969 SmallVector<int, 8> CurrentMembers;
3970 int NextGroupIndex = 0;
3971 std::vector<FrameObject> &Objects;
3972
3973public:
3974 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3975 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3976 void EndCurrentGroup() {
3977 if (CurrentMembers.size() > 1) {
3978 // Create a new group with the current member list. This might remove them
3979 // from their pre-existing groups. That's OK, dealing with overlapping
3980 // groups is too hard and unlikely to make a difference.
3981 LLVM_DEBUG(dbgs() << "group:");
3982 for (int Index : CurrentMembers) {
3983 Objects[Index].GroupIndex = NextGroupIndex;
3984 LLVM_DEBUG(dbgs() << " " << Index);
3985 }
3986 LLVM_DEBUG(dbgs() << "\n");
3987 NextGroupIndex++;
3988 }
3989 CurrentMembers.clear();
3990 }
3991};
3992
3993bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3994 // Objects at a lower index are closer to FP; objects at a higher index are
3995 // closer to SP.
3996 //
3997 // For consistency in our comparison, all invalid objects are placed
3998 // at the end. This also allows us to stop walking when we hit the
3999 // first invalid item after it's all sorted.
4000 //
4001 // The "first" object goes first (closest to SP), followed by the members of
4002 // the "first" group.
4003 //
4004 // The rest are sorted by the group index to keep the groups together.
4005 // Higher numbered groups are more likely to be around longer (i.e. untagged
4006 // in the function epilogue and not at some earlier point). Place them closer
4007 // to SP.
4008 //
4009 // If all else equal, sort by the object index to keep the objects in the
4010 // original order.
4011 return std::make_tuple(!A.IsValid, A.ObjectFirst, A.GroupFirst, A.GroupIndex,
4012 A.ObjectIndex) <
4013 std::make_tuple(!B.IsValid, B.ObjectFirst, B.GroupFirst, B.GroupIndex,
4014 B.ObjectIndex);
4015}
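// Worked example: for valid objects A = {ObjectFirst: true, GroupFirst: true,
// GroupIndex: 0, ObjectIndex: 3} and B = {ObjectFirst: false, GroupFirst:
// false, GroupIndex: 1, ObjectIndex: 1}, the keys compare as
// (false, false, false, 1, 1) < (false, true, true, 0, 3), so B sorts before
// A and A ends up at the higher index, i.e. closest to SP, as desired for the
// tagged-base-pointer slot.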
4016} // namespace
4017
4018void AArch64FrameLowering::orderFrameObjects(
4019 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4020 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4021 return;
4022
4023 const MachineFrameInfo &MFI = MF.getFrameInfo();
4024 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4025 for (auto &Obj : ObjectsToAllocate) {
4026 FrameObjects[Obj].IsValid = true;
4027 FrameObjects[Obj].ObjectIndex = Obj;
4028 }
4029
4030 // Identify stack slots that are tagged at the same time.
4031 GroupBuilder GB(FrameObjects);
4032 for (auto &MBB : MF) {
4033 for (auto &MI : MBB) {
4034 if (MI.isDebugInstr())
4035 continue;
4036 int OpIndex;
4037 switch (MI.getOpcode()) {
4038 case AArch64::STGloop:
4039 case AArch64::STZGloop:
4040 OpIndex = 3;
4041 break;
4042 case AArch64::STGi:
4043 case AArch64::STZGi:
4044 case AArch64::ST2Gi:
4045 case AArch64::STZ2Gi:
4046 OpIndex = 1;
4047 break;
4048 default:
4049 OpIndex = -1;
4050 }
4051
4052 int TaggedFI = -1;
4053 if (OpIndex >= 0) {
4054 const MachineOperand &MO = MI.getOperand(OpIndex);
4055 if (MO.isFI()) {
4056 int FI = MO.getIndex();
4057 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4058 FrameObjects[FI].IsValid)
4059 TaggedFI = FI;
4060 }
4061 }
4062
4063 // If this is a stack tagging instruction for a slot that is not part of a
4064 // group yet, either start a new group or add it to the current one.
4065 if (TaggedFI >= 0)
4066 GB.AddMember(TaggedFI);
4067 else
4068 GB.EndCurrentGroup();
4069 }
4070 // Groups should never span multiple basic blocks.
4071 GB.EndCurrentGroup();
4072 }
4073
4074 // If the function's tagged base pointer is pinned to a stack slot, we want to
4075 // put that slot first when possible. This will likely place it at SP + 0,
4076 // and save one instruction when generating the base pointer because IRG does
4077 // not allow an immediate offset.
4078 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
4079 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4080 if (TBPI) {
4081 FrameObjects[*TBPI].ObjectFirst = true;
4082 FrameObjects[*TBPI].GroupFirst = true;
4083 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4084 if (FirstGroupIndex >= 0)
4085 for (FrameObject &Object : FrameObjects)
4086 if (Object.GroupIndex == FirstGroupIndex)
4087 Object.GroupFirst = true;
4088 }
4089
4090 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4091
4092 int i = 0;
4093 for (auto &Obj : FrameObjects) {
4094 // All invalid items are sorted at the end, so it's safe to stop.
4095 if (!Obj.IsValid)
4096 break;
4097 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4098 }
4099
4100 LLVM_DEBUG(dbgs() << "Final frame order:\n"; for (auto &Obj
4101 : FrameObjects) {
4102 if (!Obj.IsValid)
4103 break;
4104 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4105 if (Obj.ObjectFirst)
4106 dbgs() << ", first";
4107 if (Obj.GroupFirst)
4108 dbgs() << ", group-first";
4109 dbgs() << "\n";
4110 });
4111}