1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function in which a particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue is run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | <hazard padding> |
56// |-----------------------------------|
57// | |
58// | callee-saved fp/simd/SVE regs |
59// | |
60// |-----------------------------------|
61// | |
62// | SVE stack objects |
63// | |
64// |-----------------------------------|
65// |.empty.space.to.make.part.below....|
66// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
67// |.the.standard.16-byte.alignment....| compile time; if present)
68// |-----------------------------------|
69// | local variables of fixed size |
70// | including spill slots |
71// | <FPR> |
72// | <hazard padding> |
73// | <GPR> |
74// |-----------------------------------| <- bp(not defined by ABI,
75// |.variable-sized.local.variables....| LLVM chooses X19)
76// |.(VLAs)............................| (size of this area is unknown at
77// |...................................| compile time)
78// |-----------------------------------| <- sp
79// | | Lower address
80//
81//
82// To access data in a frame, a constant offset from one of the pointers
83// (fp, bp, sp) to that data must be computable at compile time. The sizes
84// of the areas with a dotted background cannot be computed at compile time
85// if they are present, so all three of fp, bp and sp must be set up in order
86// to access all contents of the frame areas, assuming all of the frame
87// areas are non-empty.
88//
89// For most functions, some of the frame areas are empty. For those functions,
90// it may not be necessary to set up fp or bp:
91// * A base pointer is definitely needed when there are both VLAs and local
92// variables with more-than-default alignment requirements.
93// * A frame pointer is definitely needed when there are local variables with
94// more-than-default alignment requirements.
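//
// For example (an illustrative sketch, not code from this file), a function
// like
//
//   int f(int n) {
//     alignas(64) int buf[32]; // over-aligned local   -> needs fp
//     int vla[n];              // variable-sized array -> needs bp as well
//     return buf[0] + vla[n - 1];
//   }
//
// ends up with fp, bp and sp all set up, because neither the realignment
// padding nor the VLA area has a size known at compile time.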
95//
96// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
97// callee-saved area, since the unwind encoding does not allow for encoding
98// this dynamically and existing tools depend on this layout. For other
99// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
100// area to allow SVE stack objects (allocated directly below the callee-saves,
101// if available) to be accessed directly from the frame pointer.
102// The SVE spill/fill instructions have VL-scaled addressing modes such
103// as:
104// ldr z8, [fp, #-7 mul vl]
105// For SVE the vector length (VL) is not known at compile time, so
106// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
107// layout, we don't need to add an unscaled offset to the frame pointer before
108// accessing the SVE object in the frame.
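//
// For example, '#-7 mul vl' resolves at runtime to (-7 * VL) bytes below fp,
// where VL is the vector length in bytes (between 16 and 256). Because the
// SVE callee-saves and stack objects sit directly below the frame record, any
// slot that is a whole number of vector lengths below fp can be addressed by
// a single such instruction, whatever VL turns out to be.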
109//
110// In some cases a base pointer is generated even when it is not strictly
111// needed, namely when offsets from the frame pointer to local variables
112// become so large that they can't be encoded in the immediate fields of
113// loads or stores.
114//
115// Outgoing function arguments must be at the bottom of the stack frame when
116// calling another function. If we do not have variable-sized stack objects, we
117// can allocate a "reserved call frame" area at the bottom of the local
118// variable area, large enough for all outgoing calls. If we do have VLAs, then
119// the stack pointer must be decremented and incremented around each call to
120// make space for the arguments below the VLAs.
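//
// A sketch of the VLA case (illustrative only): around a call needing 32
// bytes of stack-passed arguments the compiler emits something like
//
//   sub  sp, sp, #32        // make room below the VLAs
//   str  x8, [sp]           // store an outgoing stack argument
//   bl   callee
//   add  sp, sp, #32        // release the argument area again
//
// whereas with a reserved call frame those 32 bytes are folded into the
// prologue's single SP decrement and reused by every call site.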
121//
122// FIXME: also explain the redzone concept.
123//
124// About stack hazards: Under some SME contexts, a coprocessor with its own
125// separate cache can be used for FP operations. This can create hazards if the CPU
126// and the SME unit try to access the same area of memory, including if the
127// access is to an area of the stack. To try to alleviate this we attempt to
128// introduce extra padding into the stack frame between FP and GPR accesses,
129// controlled by the StackHazardSize option. Without changing the layout of the
130// stack frame in the diagram above, a stack object of size StackHazardSize is
131// added between GPR and FPR CSRs. Another is added to the stack objects
132// section, and stack objects are sorted so that FPR > Hazard padding slot >
133// GPRs (where possible). Unfortunately some things are not handled well (VLA
134// area, arguments on the stack, objects with both GPR and FPR accesses), but if
135// those are controlled by the user then the entire stack frame becomes GPR at
136// the start/end with FPR in the middle, surrounded by Hazard padding.
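//
// The padding is controlled by the cl::opt flags defined later in this file;
// for example, running llc with
//
//   llc -mtriple=aarch64 -aarch64-stack-hazard-size=1024 foo.ll
//
// inserts 1024 bytes of padding between the GPR and FPR regions, and
// -aarch64-stack-hazard-in-non-streaming additionally applies the padding to
// non-streaming functions (intended for testing).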
137//
138// An example of the prologue:
139//
140// .globl __foo
141// .align 2
142// __foo:
143// Ltmp0:
144// .cfi_startproc
145// .cfi_personality 155, ___gxx_personality_v0
146// Leh_func_begin:
147// .cfi_lsda 16, Lexception33
148//
149// stp xa,bx, [sp, #-offset]!
150// ...
151// stp x28, x27, [sp, #offset-32]
152// stp fp, lr, [sp, #offset-16]
153// add fp, sp, #offset - 16
154// sub sp, sp, #1360
155//
156// The Stack:
157// +-------------------------------------------+
158// 10000 | ........ | ........ | ........ | ........ |
159// 10004 | ........ | ........ | ........ | ........ |
160// +-------------------------------------------+
161// 10008 | ........ | ........ | ........ | ........ |
162// 1000c | ........ | ........ | ........ | ........ |
163// +===========================================+
164// 10010 | X28 Register |
165// 10014 | X28 Register |
166// +-------------------------------------------+
167// 10018 | X27 Register |
168// 1001c | X27 Register |
169// +===========================================+
170// 10020 | Frame Pointer |
171// 10024 | Frame Pointer |
172// +-------------------------------------------+
173// 10028 | Link Register |
174// 1002c | Link Register |
175// +===========================================+
176// 10030 | ........ | ........ | ........ | ........ |
177// 10034 | ........ | ........ | ........ | ........ |
178// +-------------------------------------------+
179// 10038 | ........ | ........ | ........ | ........ |
180// 1003c | ........ | ........ | ........ | ........ |
181// +-------------------------------------------+
182//
183// [sp] = 10030 :: >>initial value<<
184// sp = 10020 :: stp fp, lr, [sp, #-16]!
185// fp = sp == 10020 :: mov fp, sp
186// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
187// sp == 10010 :: >>final value<<
188//
189// The frame pointer (w29) points to address 10020. If we use an offset of
190// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
191// for w27, and -32 for w28:
192//
193// Ltmp1:
194// .cfi_def_cfa w29, 16
195// Ltmp2:
196// .cfi_offset w30, -8
197// Ltmp3:
198// .cfi_offset w29, -16
199// Ltmp4:
200// .cfi_offset w27, -24
201// Ltmp5:
202// .cfi_offset w28, -32
203//
204//===----------------------------------------------------------------------===//
205
206#include "AArch64FrameLowering.h"
207#include "AArch64InstrInfo.h"
209#include "AArch64RegisterInfo.h"
210#include "AArch64Subtarget.h"
211#include "AArch64TargetMachine.h"
214#include "llvm/ADT/ScopeExit.h"
215#include "llvm/ADT/SmallVector.h"
216#include "llvm/ADT/Statistic.h"
233#include "llvm/IR/Attributes.h"
234#include "llvm/IR/CallingConv.h"
235#include "llvm/IR/DataLayout.h"
236#include "llvm/IR/DebugLoc.h"
237#include "llvm/IR/Function.h"
238#include "llvm/MC/MCAsmInfo.h"
239#include "llvm/MC/MCDwarf.h"
241#include "llvm/Support/Debug.h"
247#include <cassert>
248#include <cstdint>
249#include <iterator>
250#include <optional>
251#include <vector>
252
253using namespace llvm;
254
255#define DEBUG_TYPE "frame-info"
256
257static cl::opt<bool> EnableRedZone("aarch64-redzone",
258 cl::desc("enable use of redzone on AArch64"),
259 cl::init(false), cl::Hidden);
260
262 "stack-tagging-merge-settag",
263 cl::desc("merge settag instruction in function epilog"), cl::init(true),
264 cl::Hidden);
265
266static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
267 cl::desc("sort stack allocations"),
268 cl::init(true), cl::Hidden);
269
271 "homogeneous-prolog-epilog", cl::Hidden,
272 cl::desc("Emit homogeneous prologue and epilogue for the size "
273 "optimization (default = off)"));
274
275// Stack hazard padding size. 0 = disabled.
276static cl::opt<unsigned> StackHazardSize("aarch64-stack-hazard-size",
277 cl::init(0), cl::Hidden);
278// Whether to insert padding into non-streaming functions (for testing).
279static cl::opt<bool>
280 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
281 cl::init(false), cl::Hidden);
282
283STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
284
285/// Returns how much of the incoming argument stack area (in bytes) we should
286/// clean up in an epilogue. For the C calling convention this will be 0, for
287/// guaranteed tail call conventions it can be positive (a normal return or a
288/// tail call to a function that uses less stack space for arguments) or
289/// negative (for a tail call to a function that needs more stack space than us
290/// for arguments).
295 bool IsTailCallReturn = (MBB.end() != MBBI)
297 : false;
298
299 int64_t ArgumentPopSize = 0;
300 if (IsTailCallReturn) {
301 MachineOperand &StackAdjust = MBBI->getOperand(1);
302
303 // For a tail-call in a callee-pops-arguments environment, some or all of
304 // the stack may actually be in use for the call's arguments, this is
305 // calculated during LowerCall and consumed here...
306 ArgumentPopSize = StackAdjust.getImm();
307 } else {
308 // ... otherwise the amount to pop is *all* of the argument space,
309 // conveniently stored in the MachineFunctionInfo by
310 // LowerFormalArguments. This will, of course, be zero for the C calling
311 // convention.
312 ArgumentPopSize = AFI->getArgumentStackToRestore();
313 }
314
315 return ArgumentPopSize;
316}
317
319static bool needsWinCFI(const MachineFunction &MF);
322
323/// Returns true if homogeneous prolog or epilog code can be emitted
324/// for the size optimization. If possible, a frame helper call is injected.
325/// When an Exit block is given, this check is for the epilog.
326bool AArch64FrameLowering::homogeneousPrologEpilog(
327 MachineFunction &MF, MachineBasicBlock *Exit) const {
328 if (!MF.getFunction().hasMinSize())
329 return false;
331 return false;
332 if (EnableRedZone)
333 return false;
334
335 // TODO: Windows is not supported yet.
336 if (needsWinCFI(MF))
337 return false;
338 // TODO: SVE is not supported yet.
339 if (getSVEStackSize(MF))
340 return false;
341
342 // Bail on stack adjustment needed on return for simplicity.
343 const MachineFrameInfo &MFI = MF.getFrameInfo();
345 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
346 return false;
347 if (Exit && getArgumentStackToRestore(MF, *Exit))
348 return false;
349
350 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
351 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
352 return false;
353
354 // If there are an odd number of GPRs before LR and FP in the CSRs list,
355 // they will not be paired into one RegPairInfo, which is incompatible with
356 // the assumption made by the homogeneous prolog epilog pass.
357 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
358 unsigned NumGPRs = 0;
359 for (unsigned I = 0; CSRegs[I]; ++I) {
360 Register Reg = CSRegs[I];
361 if (Reg == AArch64::LR) {
362 assert(CSRegs[I + 1] == AArch64::FP);
363 if (NumGPRs % 2 != 0)
364 return false;
365 break;
366 }
367 if (AArch64::GPR64RegClass.contains(Reg))
368 ++NumGPRs;
369 }
370
371 return true;
372}
373
374/// Returns true if CSRs should be paired.
375bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
376 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
377}
378
379/// This is the biggest offset to the stack pointer we can encode in aarch64
380/// instructions (without using a separate calculation and a temp register).
381/// Note that the exceptions here are vector stores/loads, which cannot encode any
382/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
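/// (255 is the positive limit of the signed 9-bit immediate used by unscaled
/// loads and stores, the most restrictive of the common addressing modes, so
/// staying under it is a safe, conservative bound.)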
383static const unsigned DefaultSafeSPDisplacement = 255;
384
385/// Look at each instruction that references stack frames and return the stack
386/// size limit beyond which some of these instructions will require a scratch
387/// register during their expansion later.
389 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
390 // range. We'll end up allocating an unnecessary spill slot a lot, but
391 // realistically that's not a big deal at this stage of the game.
392 for (MachineBasicBlock &MBB : MF) {
393 for (MachineInstr &MI : MBB) {
394 if (MI.isDebugInstr() || MI.isPseudo() ||
395 MI.getOpcode() == AArch64::ADDXri ||
396 MI.getOpcode() == AArch64::ADDSXri)
397 continue;
398
399 for (const MachineOperand &MO : MI.operands()) {
400 if (!MO.isFI())
401 continue;
402
404 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
406 return 0;
407 }
408 }
409 }
411}
412
416}
417
418/// Returns the size of the fixed object area (allocated next to sp on entry)
419/// On Win64 this may include a var args area and an UnwindHelp object for EH.
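/// For example, a Win64 vararg function that homes four GPR varargs (32 bytes)
/// and contains EH funclets gets alignTo(32 + 8, 16) = 48 bytes here, plus any
/// tail-call reserved stack.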
420static unsigned getFixedObjectSize(const MachineFunction &MF,
421 const AArch64FunctionInfo *AFI, bool IsWin64,
422 bool IsFunclet) {
423 if (!IsWin64 || IsFunclet) {
424 return AFI->getTailCallReservedStack();
425 } else {
426 if (AFI->getTailCallReservedStack() != 0 &&
428 Attribute::SwiftAsync))
429 report_fatal_error("cannot generate ABI-changing tail call for Win64");
430 // Var args are stored here in the primary function.
431 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
432 // To support EH funclets we allocate an UnwindHelp object
433 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
434 return AFI->getTailCallReservedStack() +
435 alignTo(VarArgsArea + UnwindHelpObject, 16);
436 }
437}
438
439/// Returns the size of the entire SVE stackframe (calleesaves + spills).
442 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
443}
444
446 if (!EnableRedZone)
447 return false;
448
449 // Don't use the red zone if the function explicitly asks us not to.
450 // This is typically used for kernel code.
451 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
452 const unsigned RedZoneSize =
454 if (!RedZoneSize)
455 return false;
456
457 const MachineFrameInfo &MFI = MF.getFrameInfo();
459 uint64_t NumBytes = AFI->getLocalStackSize();
460
461 // If neither NEON nor SVE is available, a COPY from one Q-reg to
462 // another requires a spill -> reload sequence. We can do that
463 // using a pre-decrementing store/post-decrementing load, but
464 // if we do so, we can't use the Red Zone.
465 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
466 !Subtarget.isNeonAvailable() &&
467 !Subtarget.hasSVE();
468
469 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
470 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
471}
472
473/// hasFP - Return true if the specified function should have a dedicated frame
474/// pointer register.
476 const MachineFrameInfo &MFI = MF.getFrameInfo();
477 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
478
479 // Win64 EH requires a frame pointer if funclets are present, as the locals
480 // are accessed off the frame pointer in both the parent function and the
481 // funclets.
482 if (MF.hasEHFunclets())
483 return true;
484 // Retain behavior of always omitting the FP for leaf functions when possible.
486 return true;
487 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
488 MFI.hasStackMap() || MFI.hasPatchPoint() ||
489 RegInfo->hasStackRealignment(MF))
490 return true;
491 // With large call frames around we may need to use FP to access the
492 // scavenging emergency spill slot.
493 //
494 // Unfortunately some calls to hasFP() like machine verifier ->
495 // getReservedReg() -> hasFP in the middle of global isel are too early
496 // to know the max call frame size. Hopefully conservatively returning "true"
497 // in those cases is fine.
498 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
499 if (!MFI.isMaxCallFrameSizeComputed() ||
501 return true;
502
503 return false;
504}
505
506/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
507/// not required, we reserve argument space for call sites in the function
508/// immediately on entry to the current function. This eliminates the need for
509/// add/sub sp brackets around call sites. Returns true if the call frame is
510/// included as part of the stack frame.
512 const MachineFunction &MF) const {
513 // The stack probing code for the dynamically allocated outgoing arguments
514 // area assumes that the stack is probed at the top - either by the prologue
515 // code, which issues a probe if `hasVarSizedObjects` return true, or by the
516 // most recent variable-sized object allocation. Changing the condition here
517 // may need to be followed up by changes to the probe issuing logic.
518 return !MF.getFrameInfo().hasVarSizedObjects();
519}
520
524 const AArch64InstrInfo *TII =
525 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
526 const AArch64TargetLowering *TLI =
527 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
528 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
529 DebugLoc DL = I->getDebugLoc();
530 unsigned Opc = I->getOpcode();
531 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
532 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
533
534 if (!hasReservedCallFrame(MF)) {
535 int64_t Amount = I->getOperand(0).getImm();
536 Amount = alignTo(Amount, getStackAlign());
537 if (!IsDestroy)
538 Amount = -Amount;
539
540 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
541 // doesn't have to pop anything), then the first operand will be zero too so
542 // this adjustment is a no-op.
543 if (CalleePopAmount == 0) {
544 // FIXME: in-function stack adjustment for calls is limited to 24-bits
545 // because there's no guaranteed temporary register available.
546 //
547 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
548 // 1) For offset <= 12-bit, we use LSL #0
549 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
550 // LSL #0, and the other uses LSL #12.
551 //
552 // Most call frames will be allocated at the start of a function so
553 // this is OK, but it is a limitation that needs dealing with.
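      // For example, a (hypothetical) adjustment of 0x12345 bytes would be
      // split into two instructions:
      //   sub sp, sp, #0x12, lsl #12   // 0x12000
      //   sub sp, sp, #0x345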
554 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
555
556 if (TLI->hasInlineStackProbe(MF) &&
558 // When stack probing is enabled, the decrement of SP may need to be
559 // probed. We only need to do this if the call site needs 1024 bytes of
560 // space or more, because a region smaller than that is allowed to be
561 // unprobed at an ABI boundary. We rely on the fact that SP has been
562 // probed exactly at this point, either by the prologue or most recent
563 // dynamic allocation.
565 "non-reserved call frame without var sized objects?");
566 Register ScratchReg =
567 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
568 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
569 } else {
570 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
571 StackOffset::getFixed(Amount), TII);
572 }
573 }
574 } else if (CalleePopAmount != 0) {
575 // If the calling convention demands that the callee pops arguments from the
576 // stack, we want to add it back if we have a reserved call frame.
577 assert(CalleePopAmount < 0xffffff && "call frame too large");
578 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
579 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
580 }
581 return MBB.erase(I);
582}
583
584void AArch64FrameLowering::emitCalleeSavedGPRLocations(
587 MachineFrameInfo &MFI = MF.getFrameInfo();
589 SMEAttrs Attrs(MF.getFunction());
590 bool LocallyStreaming =
591 Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();
592
593 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
594 if (CSI.empty())
595 return;
596
597 const TargetSubtargetInfo &STI = MF.getSubtarget();
598 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
599 const TargetInstrInfo &TII = *STI.getInstrInfo();
601
602 for (const auto &Info : CSI) {
603 unsigned FrameIdx = Info.getFrameIdx();
604 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector)
605 continue;
606
607 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
608 int64_t DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
609 int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
610
611 // The location of VG will be emitted before each streaming-mode change in
612 // the function. Only locally-streaming functions require emitting the
613 // non-streaming VG location here.
614 if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
615 (!LocallyStreaming &&
616 DwarfReg == TRI.getDwarfRegNum(AArch64::VG, true)))
617 continue;
618
619 unsigned CFIIndex = MF.addFrameInst(
620 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
621 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
622 .addCFIIndex(CFIIndex)
624 }
625}
626
627void AArch64FrameLowering::emitCalleeSavedSVELocations(
630 MachineFrameInfo &MFI = MF.getFrameInfo();
631
632 // Add callee saved registers to move list.
633 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
634 if (CSI.empty())
635 return;
636
637 const TargetSubtargetInfo &STI = MF.getSubtarget();
638 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
639 const TargetInstrInfo &TII = *STI.getInstrInfo();
642
643 for (const auto &Info : CSI) {
644 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
645 continue;
646
647 // Not all unwinders may know about SVE registers, so assume the lowest
648 // common denominator.
649 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
650 unsigned Reg = Info.getReg();
651 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
652 continue;
653
655 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
657
658 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
659 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
660 .addCFIIndex(CFIIndex)
662 }
663}
664
668 unsigned DwarfReg) {
669 unsigned CFIIndex =
670 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
671 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
672}
673
675 MachineBasicBlock &MBB) const {
676
678 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
679 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
680 const auto &TRI =
681 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
682 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
683
684 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
685 DebugLoc DL;
686
687 // Reset the CFA to `SP + 0`.
689 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
690 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
691 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
692
693 // Flip the RA sign state.
694 if (MFI.shouldSignReturnAddress(MF)) {
696 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
697 }
698
699 // Shadow call stack uses X18, reset it.
700 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
701 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
702 TRI.getDwarfRegNum(AArch64::X18, true));
703
704 // Emit .cfi_same_value for callee-saved registers.
705 const std::vector<CalleeSavedInfo> &CSI =
707 for (const auto &Info : CSI) {
708 unsigned Reg = Info.getReg();
709 if (!TRI.regNeedsCFI(Reg, Reg))
710 continue;
711 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
712 TRI.getDwarfRegNum(Reg, true));
713 }
714}
715
718 bool SVE) {
720 MachineFrameInfo &MFI = MF.getFrameInfo();
721
722 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
723 if (CSI.empty())
724 return;
725
726 const TargetSubtargetInfo &STI = MF.getSubtarget();
727 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
728 const TargetInstrInfo &TII = *STI.getInstrInfo();
730
731 for (const auto &Info : CSI) {
732 if (SVE !=
733 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
734 continue;
735
736 unsigned Reg = Info.getReg();
737 if (SVE &&
738 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
739 continue;
740
741 if (!Info.isRestored())
742 continue;
743
744 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
745 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
746 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
747 .addCFIIndex(CFIIndex)
749 }
750}
751
752void AArch64FrameLowering::emitCalleeSavedGPRRestores(
755}
756
757void AArch64FrameLowering::emitCalleeSavedSVERestores(
760}
761
762// Return the maximum possible number of bytes for `Size` due to the
763// architectural limit on the size of an SVE register.
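// (Each "scalable byte" can expand to at most 16 real bytes because the SVE
// vector length is architecturally capped at 2048 bits, i.e. 16 times the
// 128-bit minimum that a scalable byte is measured against.)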
764static int64_t upperBound(StackOffset Size) {
765 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
766 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
767}
768
769void AArch64FrameLowering::allocateStackSpace(
771 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
772 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
773 bool FollowupAllocs) const {
774
775 if (!AllocSize)
776 return;
777
778 DebugLoc DL;
780 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
781 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
783 const MachineFrameInfo &MFI = MF.getFrameInfo();
784
785 const int64_t MaxAlign = MFI.getMaxAlign().value();
786 const uint64_t AndMask = ~(MaxAlign - 1);
787
788 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
789 Register TargetReg = RealignmentPadding
791 : AArch64::SP;
792 // SUB Xd/SP, SP, AllocSize
793 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
794 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
795 EmitCFI, InitialOffset);
796
797 if (RealignmentPadding) {
798 // AND SP, X9, 0b11111...0000
799 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
800 .addReg(TargetReg, RegState::Kill)
803 AFI.setStackRealigned(true);
804
805 // No need for SEH instructions here; if we're realigning the stack,
806 // we've set a frame pointer and already finished the SEH prologue.
807 assert(!NeedsWinCFI);
808 }
809 return;
810 }
811
812 //
813 // Stack probing allocation.
814 //
815
816 // Fixed length allocation. If we don't need to re-align the stack and don't
817 // have SVE objects, we can use a more efficient sequence for stack probing.
818 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
820 assert(ScratchReg != AArch64::NoRegister);
821 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
822 .addDef(ScratchReg)
823 .addImm(AllocSize.getFixed())
824 .addImm(InitialOffset.getFixed())
825 .addImm(InitialOffset.getScalable());
826 // The fixed allocation may leave unprobed bytes at the top of the
827 // stack. If we have a subsequent allocation (e.g. if we have variable-sized
828 // objects), we need to issue an extra probe, so these allocations start in
829 // a known state.
830 if (FollowupAllocs) {
831 // STR XZR, [SP]
832 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
833 .addReg(AArch64::XZR)
834 .addReg(AArch64::SP)
835 .addImm(0)
837 }
838
839 return;
840 }
841
842 // Variable length allocation.
843
844 // If the (unknown) allocation size cannot exceed the probe size, decrement
845 // the stack pointer right away.
846 int64_t ProbeSize = AFI.getStackProbeSize();
847 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
848 Register ScratchReg = RealignmentPadding
850 : AArch64::SP;
851 assert(ScratchReg != AArch64::NoRegister);
852 // SUB Xd, SP, AllocSize
853 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
854 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
855 EmitCFI, InitialOffset);
856 if (RealignmentPadding) {
857 // AND SP, Xn, 0b11111...0000
858 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
859 .addReg(ScratchReg, RegState::Kill)
862 AFI.setStackRealigned(true);
863 }
864 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
866 // STR XZR, [SP]
867 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
868 .addReg(AArch64::XZR)
869 .addReg(AArch64::SP)
870 .addImm(0)
872 }
873 return;
874 }
875
876 // Emit a variable-length allocation probing loop.
877 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
878 // each of them guaranteed to adjust the stack by less than the probe size.
880 assert(TargetReg != AArch64::NoRegister);
881 // SUB Xd, SP, AllocSize
882 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
883 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
884 EmitCFI, InitialOffset);
885 if (RealignmentPadding) {
886 // AND Xn, Xn, 0b11111...0000
887 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
888 .addReg(TargetReg, RegState::Kill)
891 }
892
893 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
894 .addReg(TargetReg);
895 if (EmitCFI) {
896 // Set the CFA register back to SP.
897 unsigned Reg =
898 Subtarget.getRegisterInfo()->getDwarfRegNum(AArch64::SP, true);
899 unsigned CFIIndex =
901 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
902 .addCFIIndex(CFIIndex)
904 }
905 if (RealignmentPadding)
906 AFI.setStackRealigned(true);
907}
908
909static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
910 switch (Reg.id()) {
911 default:
912 // The called routine is expected to preserve x19-x28;
913 // x29 and x30 are used as the frame pointer and link register respectively.
914 return 0;
915
916 // GPRs
917#define CASE(n) \
918 case AArch64::W##n: \
919 case AArch64::X##n: \
920 return AArch64::X##n
921 CASE(0);
922 CASE(1);
923 CASE(2);
924 CASE(3);
925 CASE(4);
926 CASE(5);
927 CASE(6);
928 CASE(7);
929 CASE(8);
930 CASE(9);
931 CASE(10);
932 CASE(11);
933 CASE(12);
934 CASE(13);
935 CASE(14);
936 CASE(15);
937 CASE(16);
938 CASE(17);
939 CASE(18);
940#undef CASE
941
942 // FPRs
943#define CASE(n) \
944 case AArch64::B##n: \
945 case AArch64::H##n: \
946 case AArch64::S##n: \
947 case AArch64::D##n: \
948 case AArch64::Q##n: \
949 return HasSVE ? AArch64::Z##n : AArch64::Q##n
950 CASE(0);
951 CASE(1);
952 CASE(2);
953 CASE(3);
954 CASE(4);
955 CASE(5);
956 CASE(6);
957 CASE(7);
958 CASE(8);
959 CASE(9);
960 CASE(10);
961 CASE(11);
962 CASE(12);
963 CASE(13);
964 CASE(14);
965 CASE(15);
966 CASE(16);
967 CASE(17);
968 CASE(18);
969 CASE(19);
970 CASE(20);
971 CASE(21);
972 CASE(22);
973 CASE(23);
974 CASE(24);
975 CASE(25);
976 CASE(26);
977 CASE(27);
978 CASE(28);
979 CASE(29);
980 CASE(30);
981 CASE(31);
982#undef CASE
983 }
984}
985
986void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
987 MachineBasicBlock &MBB) const {
988 // Insertion point.
990
991 // Fake a debug loc.
992 DebugLoc DL;
993 if (MBBI != MBB.end())
994 DL = MBBI->getDebugLoc();
995
996 const MachineFunction &MF = *MBB.getParent();
999
1000 BitVector GPRsToZero(TRI.getNumRegs());
1001 BitVector FPRsToZero(TRI.getNumRegs());
1002 bool HasSVE = STI.hasSVE();
1003 for (MCRegister Reg : RegsToZero.set_bits()) {
1004 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
1005 // For GPRs, we only care to clear out the 64-bit register.
1006 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1007 GPRsToZero.set(XReg);
1008 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
1009 // For FPRs, clear the widest form (Z when SVE is available, otherwise Q).
1010 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1011 FPRsToZero.set(XReg);
1012 }
1013 }
1014
1015 const AArch64InstrInfo &TII = *STI.getInstrInfo();
1016
1017 // Zero out GPRs.
1018 for (MCRegister Reg : GPRsToZero.set_bits())
1019 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1020
1021 // Zero out FP/vector registers.
1022 for (MCRegister Reg : FPRsToZero.set_bits())
1023 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1024
1025 if (HasSVE) {
1026 for (MCRegister PReg :
1027 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1028 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1029 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1030 AArch64::P15}) {
1031 if (RegsToZero[PReg])
1032 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1033 }
1034 }
1035}
1036
1038 const MachineBasicBlock &MBB) {
1039 const MachineFunction *MF = MBB.getParent();
1040 LiveRegs.addLiveIns(MBB);
1041 // Mark callee saved registers as used so we will not choose them.
1042 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1043 for (unsigned i = 0; CSRegs[i]; ++i)
1044 LiveRegs.addReg(CSRegs[i]);
1045}
1046
1047// Find a scratch register that we can use at the start of the prologue to
1048// re-align the stack pointer. We avoid using callee-save registers since they
1049// may appear to be free when this is called from canUseAsPrologue (during
1050// shrink wrapping), but then no longer be free when this is called from
1051// emitPrologue.
1052//
1053// FIXME: This is a bit conservative, since in the above case we could use one
1054// of the callee-save registers as a scratch temp to re-align the stack pointer,
1055// but we would then have to make sure that we were in fact saving at least one
1056// callee-save register in the prologue, which is additional complexity that
1057// doesn't seem worth the benefit.
1059 MachineFunction *MF = MBB->getParent();
1060
1061 // If MBB is an entry block, use X9 as the scratch register.
1062 // However, preserve_none functions may be using X9 to pass arguments,
1063 // so for those prefer to pick an available register below.
1064 if (&MF->front() == MBB &&
1066 return AArch64::X9;
1067
1068 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1069 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1070 LivePhysRegs LiveRegs(TRI);
1071 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1072
1073 // Prefer X9 since it was historically used for the prologue scratch reg.
1074 const MachineRegisterInfo &MRI = MF->getRegInfo();
1075 if (LiveRegs.available(MRI, AArch64::X9))
1076 return AArch64::X9;
1077
1078 for (unsigned Reg : AArch64::GPR64RegClass) {
1079 if (LiveRegs.available(MRI, Reg))
1080 return Reg;
1081 }
1082 return AArch64::NoRegister;
1083}
1084
1086 const MachineBasicBlock &MBB) const {
1087 const MachineFunction *MF = MBB.getParent();
1088 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1089 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1090 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1091 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1093
1094 if (AFI->hasSwiftAsyncContext()) {
1095 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1096 const MachineRegisterInfo &MRI = MF->getRegInfo();
1097 LivePhysRegs LiveRegs(TRI);
1098 getLiveRegsForEntryMBB(LiveRegs, MBB);
1099 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1100 // available.
1101 if (!LiveRegs.available(MRI, AArch64::X16) ||
1102 !LiveRegs.available(MRI, AArch64::X17))
1103 return false;
1104 }
1105
1106 // Certain stack probing sequences might clobber flags, so we can't use
1107 // the block as a prologue if the flags register is a live-in.
1109 MBB.isLiveIn(AArch64::NZCV))
1110 return false;
1111
1112 // Don't need a scratch register if we're not going to re-align the stack or
1113 // emit stack probes.
1114 if (!RegInfo->hasStackRealignment(*MF) && !TLI->hasInlineStackProbe(*MF))
1115 return true;
1116 // Otherwise, we can use any block as long as it has a scratch register
1117 // available.
1118 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
1119}
1120
1122 uint64_t StackSizeInBytes) {
1123 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1125 // TODO: When implementing stack protectors, take that into account
1126 // for the probe threshold.
1127 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1128 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1129}
1130
1131static bool needsWinCFI(const MachineFunction &MF) {
1132 const Function &F = MF.getFunction();
1133 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1134 F.needsUnwindTableEntry();
1135}
1136
1137bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1138 MachineFunction &MF, uint64_t StackBumpBytes) const {
1140 const MachineFrameInfo &MFI = MF.getFrameInfo();
1141 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1142 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1143 if (homogeneousPrologEpilog(MF))
1144 return false;
1145
1146 if (AFI->getLocalStackSize() == 0)
1147 return false;
1148
1149 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1150 // (to force a stp with predecrement) to match the packed unwind format,
1151 // provided that there actually are any callee saved registers to merge the
1152 // decrement with.
1153 // This is potentially marginally slower, but allows using the packed
1154 // unwind format for functions that both have a local area and callee saved
1155 // registers. Using the packed unwind format notably reduces the size of
1156 // the unwind info.
1157 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1158 MF.getFunction().hasOptSize())
1159 return false;
1160
1161 // 512 is the maximum immediate for stp/ldp that will be used for
1162 // callee-save save/restores
1163 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1164 return false;
1165
1166 if (MFI.hasVarSizedObjects())
1167 return false;
1168
1169 if (RegInfo->hasStackRealignment(MF))
1170 return false;
1171
1172 // This isn't strictly necessary, but it simplifies things a bit since the
1173 // current RedZone handling code assumes the SP is adjusted by the
1174 // callee-save save/restore code.
1175 if (canUseRedZone(MF))
1176 return false;
1177
1178 // When there is an SVE area on the stack, always allocate the
1179 // callee-saves and spills/locals separately.
1180 if (getSVEStackSize(MF))
1181 return false;
1182
1183 return true;
1184}
1185
1186bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1187 MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
1188 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1189 return false;
1190
1191 if (MBB.empty())
1192 return true;
1193
1194 // Disable combined SP bump if the last instruction is an MTE tag store. It
1195 // is almost always better to merge SP adjustment into those instructions.
1198 while (LastI != Begin) {
1199 --LastI;
1200 if (LastI->isTransient())
1201 continue;
1202 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1203 break;
1204 }
1205 switch (LastI->getOpcode()) {
1206 case AArch64::STGloop:
1207 case AArch64::STZGloop:
1208 case AArch64::STGi:
1209 case AArch64::STZGi:
1210 case AArch64::ST2Gi:
1211 case AArch64::STZ2Gi:
1212 return false;
1213 default:
1214 return true;
1215 }
1216 llvm_unreachable("unreachable");
1217}
1218
1219// Given a load or a store instruction, generate an appropriate unwinding SEH
1220// code on Windows.
1222 const TargetInstrInfo &TII,
1223 MachineInstr::MIFlag Flag) {
1224 unsigned Opc = MBBI->getOpcode();
1226 MachineFunction &MF = *MBB->getParent();
1227 DebugLoc DL = MBBI->getDebugLoc();
1228 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1229 int Imm = MBBI->getOperand(ImmIdx).getImm();
1231 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1232 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1233
1234 switch (Opc) {
1235 default:
1236 llvm_unreachable("No SEH Opcode for this instruction");
1237 case AArch64::LDPDpost:
1238 Imm = -Imm;
1239 [[fallthrough]];
1240 case AArch64::STPDpre: {
1241 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1242 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1243 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1244 .addImm(Reg0)
1245 .addImm(Reg1)
1246 .addImm(Imm * 8)
1247 .setMIFlag(Flag);
1248 break;
1249 }
1250 case AArch64::LDPXpost:
1251 Imm = -Imm;
1252 [[fallthrough]];
1253 case AArch64::STPXpre: {
1254 Register Reg0 = MBBI->getOperand(1).getReg();
1255 Register Reg1 = MBBI->getOperand(2).getReg();
1256 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1257 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1258 .addImm(Imm * 8)
1259 .setMIFlag(Flag);
1260 else
1261 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1262 .addImm(RegInfo->getSEHRegNum(Reg0))
1263 .addImm(RegInfo->getSEHRegNum(Reg1))
1264 .addImm(Imm * 8)
1265 .setMIFlag(Flag);
1266 break;
1267 }
1268 case AArch64::LDRDpost:
1269 Imm = -Imm;
1270 [[fallthrough]];
1271 case AArch64::STRDpre: {
1272 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1273 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1274 .addImm(Reg)
1275 .addImm(Imm)
1276 .setMIFlag(Flag);
1277 break;
1278 }
1279 case AArch64::LDRXpost:
1280 Imm = -Imm;
1281 [[fallthrough]];
1282 case AArch64::STRXpre: {
1283 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1284 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1285 .addImm(Reg)
1286 .addImm(Imm)
1287 .setMIFlag(Flag);
1288 break;
1289 }
1290 case AArch64::STPDi:
1291 case AArch64::LDPDi: {
1292 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1293 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1294 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1295 .addImm(Reg0)
1296 .addImm(Reg1)
1297 .addImm(Imm * 8)
1298 .setMIFlag(Flag);
1299 break;
1300 }
1301 case AArch64::STPXi:
1302 case AArch64::LDPXi: {
1303 Register Reg0 = MBBI->getOperand(0).getReg();
1304 Register Reg1 = MBBI->getOperand(1).getReg();
1305 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1306 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1307 .addImm(Imm * 8)
1308 .setMIFlag(Flag);
1309 else
1310 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1311 .addImm(RegInfo->getSEHRegNum(Reg0))
1312 .addImm(RegInfo->getSEHRegNum(Reg1))
1313 .addImm(Imm * 8)
1314 .setMIFlag(Flag);
1315 break;
1316 }
1317 case AArch64::STRXui:
1318 case AArch64::LDRXui: {
1319 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1320 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1321 .addImm(Reg)
1322 .addImm(Imm * 8)
1323 .setMIFlag(Flag);
1324 break;
1325 }
1326 case AArch64::STRDui:
1327 case AArch64::LDRDui: {
1328 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1329 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1330 .addImm(Reg)
1331 .addImm(Imm * 8)
1332 .setMIFlag(Flag);
1333 break;
1334 }
1335 case AArch64::STPQi:
1336 case AArch64::LDPQi: {
1337 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1338 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1339 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1340 .addImm(Reg0)
1341 .addImm(Reg1)
1342 .addImm(Imm * 16)
1343 .setMIFlag(Flag);
1344 break;
1345 }
1346 case AArch64::LDPQpost:
1347 Imm = -Imm;
1348 [[fallthrough]];
1349 case AArch64::STPQpre: {
1350 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1351 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1352 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1353 .addImm(Reg0)
1354 .addImm(Reg1)
1355 .addImm(Imm * 16)
1356 .setMIFlag(Flag);
1357 break;
1358 }
1359 }
1360 auto I = MBB->insertAfter(MBBI, MIB);
1361 return I;
1362}
1363
1364// Fix up the SEH opcode associated with the save/restore instruction.
1366 unsigned LocalStackSize) {
1367 MachineOperand *ImmOpnd = nullptr;
1368 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1369 switch (MBBI->getOpcode()) {
1370 default:
1371 llvm_unreachable("Fix the offset in the SEH instruction");
1372 case AArch64::SEH_SaveFPLR:
1373 case AArch64::SEH_SaveRegP:
1374 case AArch64::SEH_SaveReg:
1375 case AArch64::SEH_SaveFRegP:
1376 case AArch64::SEH_SaveFReg:
1377 case AArch64::SEH_SaveAnyRegQP:
1378 case AArch64::SEH_SaveAnyRegQPX:
1379 ImmOpnd = &MBBI->getOperand(ImmIdx);
1380 break;
1381 }
1382 if (ImmOpnd)
1383 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1384}
1385
1388 return AFI->hasStreamingModeChanges() &&
1389 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1390}
1391
1393 unsigned Opc = MBBI->getOpcode();
1394 if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
1395 Opc == AArch64::UBFMXri)
1396 return true;
1397
1398 if (requiresGetVGCall(*MBBI->getMF())) {
1399 if (Opc == AArch64::ORRXrr)
1400 return true;
1401
1402 if (Opc == AArch64::BL) {
1403 auto Op1 = MBBI->getOperand(0);
1404 return Op1.isSymbol() &&
1405 (StringRef(Op1.getSymbolName()) == "__arm_get_current_vg");
1406 }
1407 }
1408
1409 return false;
1410}
1411
1412// Convert a callee-save register save/restore instruction into one that also
1413// allocates/deallocates the callee-save stack area, by converting the
1414// store/load to its pre/post-increment version.
1417 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1418 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1420 int CFAOffset = 0) {
1421 unsigned NewOpc;
1422
1423 // If the function contains streaming mode changes, we expect instructions
1424 // to calculate the value of VG before spilling. For locally-streaming
1425 // functions, we need to do this for both the streaming and non-streaming
1426 // vector length. Move past these instructions if necessary.
1427 MachineFunction &MF = *MBB.getParent();
1429 if (AFI->hasStreamingModeChanges())
1430 while (isVGInstruction(MBBI))
1431 ++MBBI;
1432
1433 switch (MBBI->getOpcode()) {
1434 default:
1435 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1436 case AArch64::STPXi:
1437 NewOpc = AArch64::STPXpre;
1438 break;
1439 case AArch64::STPDi:
1440 NewOpc = AArch64::STPDpre;
1441 break;
1442 case AArch64::STPQi:
1443 NewOpc = AArch64::STPQpre;
1444 break;
1445 case AArch64::STRXui:
1446 NewOpc = AArch64::STRXpre;
1447 break;
1448 case AArch64::STRDui:
1449 NewOpc = AArch64::STRDpre;
1450 break;
1451 case AArch64::STRQui:
1452 NewOpc = AArch64::STRQpre;
1453 break;
1454 case AArch64::LDPXi:
1455 NewOpc = AArch64::LDPXpost;
1456 break;
1457 case AArch64::LDPDi:
1458 NewOpc = AArch64::LDPDpost;
1459 break;
1460 case AArch64::LDPQi:
1461 NewOpc = AArch64::LDPQpost;
1462 break;
1463 case AArch64::LDRXui:
1464 NewOpc = AArch64::LDRXpost;
1465 break;
1466 case AArch64::LDRDui:
1467 NewOpc = AArch64::LDRDpost;
1468 break;
1469 case AArch64::LDRQui:
1470 NewOpc = AArch64::LDRQpost;
1471 break;
1472 }
1473 // Get rid of the SEH code associated with the old instruction.
1474 if (NeedsWinCFI) {
1475 auto SEH = std::next(MBBI);
1477 SEH->eraseFromParent();
1478 }
1479
1480 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1481 int64_t MinOffset, MaxOffset;
1482 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1483 NewOpc, Scale, Width, MinOffset, MaxOffset);
1484 (void)Success;
1485 assert(Success && "unknown load/store opcode");
1486
1487 // If the first store isn't right where we want SP then we can't fold the
1488 // update in, so create a normal arithmetic instruction instead.
1489 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1490 CSStackSizeInc < MinOffset || CSStackSizeInc > MaxOffset) {
1491 // If we are destroying the frame, make sure we add the increment after the
1492 // last frame operation.
1493 if (FrameFlag == MachineInstr::FrameDestroy)
1494 ++MBBI;
1495 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1496 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1497 false, false, nullptr, EmitCFI,
1498 StackOffset::getFixed(CFAOffset));
1499
1500 return std::prev(MBBI);
1501 }
1502
1503 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1504 MIB.addReg(AArch64::SP, RegState::Define);
1505
1506 // Copy all operands other than the immediate offset.
1507 unsigned OpndIdx = 0;
1508 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1509 ++OpndIdx)
1510 MIB.add(MBBI->getOperand(OpndIdx));
1511
1512 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1513 "Unexpected immediate offset in first/last callee-save save/restore "
1514 "instruction!");
1515 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1516 "Unexpected base register in callee-save save/restore instruction!");
1517 assert(CSStackSizeInc % Scale == 0);
1518 MIB.addImm(CSStackSizeInc / (int)Scale);
1519
1520 MIB.setMIFlags(MBBI->getFlags());
1521 MIB.setMemRefs(MBBI->memoperands());
1522
1523 // Generate a new SEH code that corresponds to the new instruction.
1524 if (NeedsWinCFI) {
1525 *HasWinCFI = true;
1526 InsertSEH(*MIB, *TII, FrameFlag);
1527 }
1528
1529 if (EmitCFI) {
1530 unsigned CFIIndex = MF.addFrameInst(
1531 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1532 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1533 .addCFIIndex(CFIIndex)
1534 .setMIFlags(FrameFlag);
1535 }
1536
1537 return std::prev(MBB.erase(MBBI));
1538}
1539
1540// Fix up callee-save register save/restore instructions to take into account
1541// a combined SP bump by adding the local stack size to the stack offsets.
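// For example, with a 64-byte local area folded into the callee-save SP
// decrement, an 'stp x29, x30, [sp, #16]' (imm 2, in 8-byte units) becomes
// 'stp x29, x30, [sp, #80]' (imm 2 + 64/8 = 10).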
1543 uint64_t LocalStackSize,
1544 bool NeedsWinCFI,
1545 bool *HasWinCFI) {
1547 return;
1548
1549 unsigned Opc = MI.getOpcode();
1550 unsigned Scale;
1551 switch (Opc) {
1552 case AArch64::STPXi:
1553 case AArch64::STRXui:
1554 case AArch64::STPDi:
1555 case AArch64::STRDui:
1556 case AArch64::LDPXi:
1557 case AArch64::LDRXui:
1558 case AArch64::LDPDi:
1559 case AArch64::LDRDui:
1560 Scale = 8;
1561 break;
1562 case AArch64::STPQi:
1563 case AArch64::STRQui:
1564 case AArch64::LDPQi:
1565 case AArch64::LDRQui:
1566 Scale = 16;
1567 break;
1568 default:
1569 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1570 }
1571
1572 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1573 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1574 "Unexpected base register in callee-save save/restore instruction!");
1575 // Last operand is immediate offset that needs fixing.
1576 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1577 // All generated opcodes have scaled offsets.
1578 assert(LocalStackSize % Scale == 0);
1579 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1580
1581 if (NeedsWinCFI) {
1582 *HasWinCFI = true;
1583 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1584 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1586 "Expecting a SEH instruction");
1587 fixupSEHOpcode(MBBI, LocalStackSize);
1588 }
1589}
1590
1591static bool isTargetWindows(const MachineFunction &MF) {
1593}
1594
1595// Convenience function to determine whether I is an SVE callee save.
1597 switch (I->getOpcode()) {
1598 default:
1599 return false;
1600 case AArch64::PTRUE_C_B:
1601 case AArch64::LD1B_2Z_IMM:
1602 case AArch64::ST1B_2Z_IMM:
1603 case AArch64::STR_ZXI:
1604 case AArch64::STR_PXI:
1605 case AArch64::LDR_ZXI:
1606 case AArch64::LDR_PXI:
1607 return I->getFlag(MachineInstr::FrameSetup) ||
1608 I->getFlag(MachineInstr::FrameDestroy);
1609 }
1610}
1611
1613 MachineFunction &MF,
1616 const DebugLoc &DL, bool NeedsWinCFI,
1617 bool NeedsUnwindInfo) {
1618 // Shadow call stack prolog: str x30, [x18], #8
1619 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1620 .addReg(AArch64::X18, RegState::Define)
1621 .addReg(AArch64::LR)
1622 .addReg(AArch64::X18)
1623 .addImm(8)
1625
1626 // This instruction also makes x18 live-in to the entry block.
1627 MBB.addLiveIn(AArch64::X18);
1628
1629 if (NeedsWinCFI)
1630 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1632
1633 if (NeedsUnwindInfo) {
1634 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1635 // x18 when unwinding past this frame.
1636 static const char CFIInst[] = {
1637 dwarf::DW_CFA_val_expression,
1638 18, // register
1639 2, // length
1640 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1641 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1642 };
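    // Decoded, this escape is DW_CFA_val_expression(reg 18, {DW_OP_breg18,
    // sleb128(-8)}), i.e. the caller's x18 is the current x18 minus 8.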
1643 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1644 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1645 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1646 .addCFIIndex(CFIIndex)
1648 }
1649}
1650
1652 MachineFunction &MF,
1655 const DebugLoc &DL) {
1656 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1657 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1658 .addReg(AArch64::X18, RegState::Define)
1659 .addReg(AArch64::LR, RegState::Define)
1660 .addReg(AArch64::X18)
1661 .addImm(-8)
1663
1665 unsigned CFIIndex =
1667 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1668 .addCFIIndex(CFIIndex)
1670 }
1671}
1672
1673// Define the current CFA rule to use the provided FP.
1676 const DebugLoc &DL, unsigned FixedObject) {
1679 const TargetInstrInfo *TII = STI.getInstrInfo();
1681
1682 const int OffsetToFirstCalleeSaveFromFP =
1685 Register FramePtr = TRI->getFrameRegister(MF);
1686 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1687 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1688 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1689 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1690 .addCFIIndex(CFIIndex)
1692}
1693
1694#ifndef NDEBUG
1695/// Collect live registers from the end of \p MI's parent up to (including) \p
1696/// MI in \p LiveRegs.
1698 LivePhysRegs &LiveRegs) {
1699
1700 MachineBasicBlock &MBB = *MI.getParent();
1701 LiveRegs.addLiveOuts(MBB);
1702 for (const MachineInstr &MI :
1703 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1704 LiveRegs.stepBackward(MI);
1705}
1706#endif
1707
1709 MachineBasicBlock &MBB) const {
1711 const MachineFrameInfo &MFI = MF.getFrameInfo();
1712 const Function &F = MF.getFunction();
1713 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1714 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1715 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1716
1718 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1719 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1720 bool HasFP = hasFP(MF);
1721 bool NeedsWinCFI = needsWinCFI(MF);
1722 bool HasWinCFI = false;
1723 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1724
1726#ifndef NDEBUG
1728 // Collect live register from the end of MBB up to the start of the existing
1729 // frame setup instructions.
1730 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1731 while (NonFrameStart != End &&
1732 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1733 ++NonFrameStart;
1734
1735 LivePhysRegs LiveRegs(*TRI);
1736 if (NonFrameStart != MBB.end()) {
1737 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1738 // Ignore registers used for stack management for now.
1739 LiveRegs.removeReg(AArch64::SP);
1740 LiveRegs.removeReg(AArch64::X19);
1741 LiveRegs.removeReg(AArch64::FP);
1742 LiveRegs.removeReg(AArch64::LR);
1743
1744 // X0 will be clobbered by a call to __arm_get_current_vg in the prologue.
1745 // This is necessary to spill VG if required where SVE is unavailable, but
1746 // X0 is preserved around this call.
1747 if (requiresGetVGCall(MF))
1748 LiveRegs.removeReg(AArch64::X0);
1749 }
1750
1751 auto VerifyClobberOnExit = make_scope_exit([&]() {
1752 if (NonFrameStart == MBB.end())
1753 return;
1754 // Check if any of the newly inserted instructions clobber any of the live registers.
1755 for (MachineInstr &MI :
1756 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1757 for (auto &Op : MI.operands())
1758 if (Op.isReg() && Op.isDef())
1759 assert(!LiveRegs.contains(Op.getReg()) &&
1760 "live register clobbered by inserted prologue instructions");
1761 }
1762 });
1763#endif
1764
1765 bool IsFunclet = MBB.isEHFuncletEntry();
1766
1767 // At this point, we're going to decide whether or not the function uses a
1768 // redzone. In most cases, the function doesn't have a redzone so let's
1769 // assume that's false and set it to true in the case that there's a redzone.
1770 AFI->setHasRedZone(false);
1771
1772 // Debug location must be unknown since the first debug location is used
1773 // to determine the end of the prologue.
1774 DebugLoc DL;
1775
1776 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1777 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1778 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1779 MFnI.needsDwarfUnwindInfo(MF));
1780
1781 if (MFnI.shouldSignReturnAddress(MF)) {
1782 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1784 if (NeedsWinCFI)
1785 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1786 }
1787
1788 if (EmitCFI && MFnI.isMTETagged()) {
1789 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1791 }
1792
1793 // We signal the presence of a Swift extended frame to external tools by
1794 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1795 // ORR is sufficient; it is assumed a Swift kernel would initialize the TBI
1796 // bits so that this remains true.
1797 if (HasFP && AFI->hasSwiftAsyncContext()) {
1800 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1801 // The special symbol below is absolute and has a *value* that can be
1802 // combined with the frame pointer to signal an extended frame.
1803 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1804 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1806 if (NeedsWinCFI) {
1807 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1809 HasWinCFI = true;
1810 }
1811 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1812 .addUse(AArch64::FP)
1813 .addUse(AArch64::X16)
1814 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1815 if (NeedsWinCFI) {
1816 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1818 HasWinCFI = true;
1819 }
1820 break;
1821 }
1822 [[fallthrough]];
1823
1825 // ORR x29, x29, #0x1000_0000_0000_0000
1826 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1827 .addUse(AArch64::FP)
1828 .addImm(0x1100)
1830 if (NeedsWinCFI) {
1831 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1833 HasWinCFI = true;
1834 }
1835 break;
1836
1838 break;
1839 }
1840 }
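// Note on the encoding above (illustrative): the logical-immediate operand
// 0x1100 decodes as N=1, immr=4, imms=0, i.e. a single set bit rotated
// right by 4 positions, which is exactly #0x1000000000000000 (bit 60). So
// the static case emits
//   orr x29, x29, #0x1000000000000000
// marking the frame pointer as an extended (Swift async) frame.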
1841
1842 // All calls are tail calls in GHC calling conv, and functions have no
1843 // prologue/epilogue.
1845 return;
1846
1847 // Set tagged base pointer to the requested stack slot.
1848 // Ideally it should match SP value after prologue.
1849 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1850 if (TBPI)
1852 else
1854
1855 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1856
1857 // getStackSize() includes all the locals in its size calculation. We don't
1858 // include these locals when computing the stack size of a funclet, as they
1859 // are allocated in the parent's stack frame and accessed via the frame
1860 // pointer from the funclet. We only save the callee saved registers in the
1861 // funclet, which are really the callee saved registers of the parent
1862 // function, including the funclet.
1863 int64_t NumBytes =
1864 IsFunclet ? getWinEHFuncletFrameSize(MF) : MFI.getStackSize();
1865 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1866 assert(!HasFP && "unexpected function without stack frame but with FP");
1867 assert(!SVEStackSize &&
1868 "unexpected function without stack frame but with SVE objects");
1869 // All of the stack allocation is for locals.
1870 AFI->setLocalStackSize(NumBytes);
1871 if (!NumBytes)
1872 return;
1873 // REDZONE: If the stack size is less than 128 bytes, we don't need
1874 // to actually allocate.
1875 if (canUseRedZone(MF)) {
1876 AFI->setHasRedZone(true);
1877 ++NumRedZoneFunctions;
1878 } else {
1879 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1880 StackOffset::getFixed(-NumBytes), TII,
1881 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1882 if (EmitCFI) {
1883 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1884 MCSymbol *FrameLabel = MF.getContext().createTempSymbol();
1885 // Encode the stack size of the leaf function.
1886 unsigned CFIIndex = MF.addFrameInst(
1887 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1888 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1889 .addCFIIndex(CFIIndex)
1891 }
1892 }
1893
1894 if (NeedsWinCFI) {
1895 HasWinCFI = true;
1896 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1898 }
1899
1900 return;
1901 }
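// Illustration (hypothetical sizes): a leaf function with 96 bytes of locals
// on a target where canUseRedZone() succeeds emits no SP adjustment at all;
// the locals live in the 128-byte red zone below SP and are addressed at
// negative offsets such as [sp, #-96] .. [sp, #-8].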
1902
1903 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1904 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1905
1906 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1907 // All of the remaining stack allocations are for locals.
1908 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1909 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1910 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1911 if (CombineSPBump) {
1912 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1913 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1914 StackOffset::getFixed(-NumBytes), TII,
1915 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1916 EmitAsyncCFI);
1917 NumBytes = 0;
1918 } else if (HomPrologEpilog) {
1919 // Stack has been already adjusted.
1920 NumBytes -= PrologueSaveSize;
1921 } else if (PrologueSaveSize != 0) {
1923 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1924 EmitAsyncCFI);
1925 NumBytes -= PrologueSaveSize;
1926 }
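// Illustration (hypothetical registers/sizes): with PrologueSaveSize == 48
// and no combined SP bump, the first callee-save store
//   stp x22, x21, [sp, #0]
// is rewritten into the pre-indexed form
//   stp x22, x21, [sp, #-48]!
// so the callee-save area is allocated by the same instruction that begins
// filling it; the remaining stores keep their [sp, #16], [sp, #32] offsets.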
1927 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1928
1929 // Move past the saves of the callee-saved registers, fixing up the offsets
1930 // and pre-inc if we decided to combine the callee-save and local stack
1931 // pointer bump above.
1932 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1934 // Move past instructions generated to calculate VG
1935 if (AFI->hasStreamingModeChanges())
1936 while (isVGInstruction(MBBI))
1937 ++MBBI;
1938
1939 if (CombineSPBump)
1941 NeedsWinCFI, &HasWinCFI);
1942 ++MBBI;
1943 }
1944
1945 // For funclets the FP belongs to the containing function.
1946 if (!IsFunclet && HasFP) {
1947 // Only set up FP if we actually need to.
1948 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1949
1950 if (CombineSPBump)
1951 FPOffset += AFI->getLocalStackSize();
1952
1953 if (AFI->hasSwiftAsyncContext()) {
1954 // Before we update the live FP we have to ensure there's a valid (or
1955 // null) asynchronous context in its slot just before FP in the frame
1956 // record, so store it now.
1957 const auto &Attrs = MF.getFunction().getAttributes();
1958 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1959 if (HaveInitialContext)
1960 MBB.addLiveIn(AArch64::X22);
1961 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1962 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1963 .addUse(Reg)
1964 .addUse(AArch64::SP)
1965 .addImm(FPOffset - 8)
1967 if (NeedsWinCFI) {
1968 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
1969 // to multiple instructions, should be mutually-exclusive.
1970 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
1971 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1973 HasWinCFI = true;
1974 }
1975 }
1976
1977 if (HomPrologEpilog) {
1978 auto Prolog = MBBI;
1979 --Prolog;
1980 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1981 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1982 } else {
1983 // Issue add fp, sp, FPOffset or
1984 // mov fp, sp when FPOffset is zero.
1985 // Note: All stores of callee-saved registers are marked as "FrameSetup".
1986 // This code marks the instruction(s) that set the FP also.
1987 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
1988 StackOffset::getFixed(FPOffset), TII,
1989 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1990 if (NeedsWinCFI && HasWinCFI) {
1991 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1993 // After setting up the FP, the rest of the prolog doesn't need to be
1994 // included in the SEH unwind info.
1995 NeedsWinCFI = false;
1996 }
1997 }
1998 if (EmitAsyncCFI)
1999 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2000 }
2001
2002 // Now emit the moves for whatever callee saved regs we have (including FP,
2003 // LR if those are saved). Frame instructions for SVE registers are emitted
2004 // later, after the instructions which actually save the SVE regs.
2005 if (EmitAsyncCFI)
2006 emitCalleeSavedGPRLocations(MBB, MBBI);
2007
2008 // Alignment is required for the parent frame, not the funclet
2009 const bool NeedsRealignment =
2010 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
2011 const int64_t RealignmentPadding =
2012 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
2013 ? MFI.getMaxAlign().value() - 16
2014 : 0;
2015
2016 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
2017 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
2018 if (NeedsWinCFI) {
2019 HasWinCFI = true;
2020 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
2021 // exceed this amount. We need to move at most 2^24 - 1 into x15.
2022 // This is at most two instructions, MOVZ followed by MOVK.
2023 // TODO: Fix to use multiple stack alloc unwind codes for stacks
2024 // exceeding 256MB in size.
2025 if (NumBytes >= (1 << 28))
2026 report_fatal_error("Stack size cannot exceed 256MB for stack "
2027 "unwinding purposes");
2028
2029 uint32_t LowNumWords = NumWords & 0xFFFF;
2030 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
2031 .addImm(LowNumWords)
2034 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2036 if ((NumWords & 0xFFFF0000) != 0) {
2037 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
2038 .addReg(AArch64::X15)
2039 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
2042 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2044 }
2045 } else {
2046 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
2047 .addImm(NumWords)
2049 }
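// Illustration (hypothetical size): __chkstk takes the allocation size in
// 16-byte units, so NumWords = (NumBytes + RealignmentPadding) >> 4. For a
// 1 MiB frame (NumBytes = 0x100000) this gives NumWords = 0x10000, which the
// WinCFI path above materializes as
//   movz x15, #0
//   movk x15, #1, lsl #16
// The 256 MB limit keeps NumWords below 2^24, the largest allocation the
// alloc_l unwind code can describe.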
2050
2051 const char *ChkStk = Subtarget.getChkStkName();
2052 switch (MF.getTarget().getCodeModel()) {
2053 case CodeModel::Tiny:
2054 case CodeModel::Small:
2055 case CodeModel::Medium:
2056 case CodeModel::Kernel:
2057 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
2058 .addExternalSymbol(ChkStk)
2059 .addReg(AArch64::X15, RegState::Implicit)
2064 if (NeedsWinCFI) {
2065 HasWinCFI = true;
2066 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2068 }
2069 break;
2070 case CodeModel::Large:
2071 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
2072 .addReg(AArch64::X16, RegState::Define)
2073 .addExternalSymbol(ChkStk)
2074 .addExternalSymbol(ChkStk)
2076 if (NeedsWinCFI) {
2077 HasWinCFI = true;
2078 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2080 }
2081
2082 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
2083 .addReg(AArch64::X16, RegState::Kill)
2089 if (NeedsWinCFI) {
2090 HasWinCFI = true;
2091 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2093 }
2094 break;
2095 }
2096
2097 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2098 .addReg(AArch64::SP, RegState::Kill)
2099 .addReg(AArch64::X15, RegState::Kill)
2102 if (NeedsWinCFI) {
2103 HasWinCFI = true;
2104 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2105 .addImm(NumBytes)
2107 }
2108 NumBytes = 0;
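// Illustration (assuming the usual UXTX #4 extend operand on the SUBXrx64):
//   sub sp, sp, x15, uxtx #4
// drops SP by x15 * 16 bytes, the same size (in 16-byte units) that was
// passed to __chkstk in x15; __chkstk only probes the pages and leaves x15
// and SP unchanged.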
2109
2110 if (RealignmentPadding > 0) {
2111 if (RealignmentPadding >= 4096) {
2112 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2113 .addReg(AArch64::X16, RegState::Define)
2114 .addImm(RealignmentPadding)
2116 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2117 .addReg(AArch64::SP)
2118 .addReg(AArch64::X16, RegState::Kill)
2121 } else {
2122 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2123 .addReg(AArch64::SP)
2124 .addImm(RealignmentPadding)
2125 .addImm(0)
2127 }
2128
2129 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
2130 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2131 .addReg(AArch64::X15, RegState::Kill)
2133 AFI->setStackRealigned(true);
2134
2135 // No need for SEH instructions here; if we're realigning the stack,
2136 // we've set a frame pointer and already finished the SEH prologue.
2137 assert(!NeedsWinCFI);
2138 }
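// Illustration (hypothetical alignment): for MFI.getMaxAlign() == 64,
// RealignmentPadding is 48 and AndMask is ~63, so the sequence is
//   add x15, sp, #48
//   and sp, x15, #0xffffffffffffffc0
// which, since SP is already 16-byte aligned, realigns SP to a 64-byte
// boundary at or above its current value, staying within the padding
// allocated above.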
2139 }
2140
2141 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2142 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
2143
2144 // Process the SVE callee-saves to determine what space needs to be
2145 // allocated.
2146 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2147 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2148 << "\n");
2149 // Find callee save instructions in frame.
2150 CalleeSavesBegin = MBBI;
2151 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2153 ++MBBI;
2154 CalleeSavesEnd = MBBI;
2155
2156 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2157 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2158 }
2159
2160 // Allocate space for the callee saves (if any).
2161 StackOffset CFAOffset =
2162 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2163 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2164 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2165 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2166 MFI.hasVarSizedObjects() || LocalsSize);
2167 CFAOffset += SVECalleeSavesSize;
2168
2169 if (EmitAsyncCFI)
2170 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2171
2172 // Allocate space for the rest of the frame including SVE locals. Align the
2173 // stack as necessary.
2174 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2175 "Cannot use redzone with stack realignment");
2176 if (!canUseRedZone(MF)) {
2177 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2178 // the correct value here, as NumBytes also includes padding bytes,
2179 // which shouldn't be counted here.
2180 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2181 SVELocalsSize + StackOffset::getFixed(NumBytes),
2182 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2183 CFAOffset, MFI.hasVarSizedObjects());
2184 }
2185
2186 // If we need a base pointer, set it up here. It's whatever the value of the
2187 // stack pointer is at this point. Any variable size objects will be allocated
2188 // after this, so we can still use the base pointer to reference locals.
2189 //
2190 // FIXME: Clarify FrameSetup flags here.
2191 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2192 // needed.
2193 // For funclets the BP belongs to the containing function.
2194 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2195 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2196 false);
2197 if (NeedsWinCFI) {
2198 HasWinCFI = true;
2199 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2201 }
2202 }
2203
2204 // The very last FrameSetup instruction indicates the end of the prologue. Emit a
2205 // SEH opcode indicating the prologue end.
2206 if (NeedsWinCFI && HasWinCFI) {
2207 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2209 }
2210
2211 // SEH funclets are passed the frame pointer in X1. If the parent
2212 // function uses the base register, then the base register is used
2213 // directly, and is not retrieved from X1.
2214 if (IsFunclet && F.hasPersonalityFn()) {
2215 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2216 if (isAsynchronousEHPersonality(Per)) {
2217 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2218 .addReg(AArch64::X1)
2220 MBB.addLiveIn(AArch64::X1);
2221 }
2222 }
2223
2224 if (EmitCFI && !EmitAsyncCFI) {
2225 if (HasFP) {
2226 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2227 } else {
2228 StackOffset TotalSize =
2229 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2230 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
2231 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
2232 /*LastAdjustmentWasScalable=*/false));
2233 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2234 .addCFIIndex(CFIIndex)
2236 }
2237 emitCalleeSavedGPRLocations(MBB, MBBI);
2238 emitCalleeSavedSVELocations(MBB, MBBI);
2239 }
2240}
2241
2243 switch (MI.getOpcode()) {
2244 default:
2245 return false;
2246 case AArch64::CATCHRET:
2247 case AArch64::CLEANUPRET:
2248 return true;
2249 }
2250}
2251
2253 MachineBasicBlock &MBB) const {
2255 MachineFrameInfo &MFI = MF.getFrameInfo();
2257 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2258 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2259 DebugLoc DL;
2260 bool NeedsWinCFI = needsWinCFI(MF);
2261 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2262 bool HasWinCFI = false;
2263 bool IsFunclet = false;
2264
2265 if (MBB.end() != MBBI) {
2266 DL = MBBI->getDebugLoc();
2267 IsFunclet = isFuncletReturnInstr(*MBBI);
2268 }
2269
2270 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2271
2272 auto FinishingTouches = make_scope_exit([&]() {
2273 if (AFI->shouldSignReturnAddress(MF)) {
2274 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2275 TII->get(AArch64::PAUTH_EPILOGUE))
2276 .setMIFlag(MachineInstr::FrameDestroy);
2277 if (NeedsWinCFI)
2278 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
2279 }
2282 if (EmitCFI)
2283 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2284 if (HasWinCFI) {
2286 TII->get(AArch64::SEH_EpilogEnd))
2288 if (!MF.hasWinCFI())
2289 MF.setHasWinCFI(true);
2290 }
2291 if (NeedsWinCFI) {
2292 assert(EpilogStartI != MBB.end());
2293 if (!HasWinCFI)
2294 MBB.erase(EpilogStartI);
2295 }
2296 });
2297
2298 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2299 : MFI.getStackSize();
2300
2301 // All calls are tail calls in GHC calling conv, and functions have no
2302 // prologue/epilogue.
2304 return;
2305
2306 // How much of the stack used by incoming arguments this function is expected
2307 // to restore in this particular epilogue.
2308 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2309 bool IsWin64 = Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2310 MF.getFunction().isVarArg());
2311 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2312
2313 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2314 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2315 // We cannot rely on the local stack size set in emitPrologue if the function
2316 // has funclets, as funclets have different local stack size requirements, and
2317 // the current value set in emitPrologue may be that of the containing
2318 // function.
2319 if (MF.hasEHFunclets())
2320 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2321 if (homogeneousPrologEpilog(MF, &MBB)) {
2322 assert(!NeedsWinCFI);
2323 auto LastPopI = MBB.getFirstTerminator();
2324 if (LastPopI != MBB.begin()) {
2325 auto HomogeneousEpilog = std::prev(LastPopI);
2326 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2327 LastPopI = HomogeneousEpilog;
2328 }
2329
2330 // Adjust local stack
2331 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2333 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2334
2335 // SP has already been adjusted while restoring the callee-saved regs.
2336 // We've already bailed out of the case that also adjusts SP for arguments.
2337 assert(AfterCSRPopSize == 0);
2338 return;
2339 }
2340 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2341 // Assume we can't combine the last pop with the sp restore.
2342
2343 bool CombineAfterCSRBump = false;
2344 if (!CombineSPBump && PrologueSaveSize != 0) {
2346 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2348 Pop = std::prev(Pop);
2349 // Converting the last ldp to a post-index ldp is valid only if the last
2350 // ldp's offset is 0.
2351 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2352 // If the offset is 0 and the AfterCSR pop is not actually trying to
2353 // allocate more stack for arguments (in space that an untimely interrupt
2354 // may clobber), convert it to a post-index ldp.
2355 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2357 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2358 MachineInstr::FrameDestroy, PrologueSaveSize);
2359 } else {
2360 // If not, make sure to emit an add after the last ldp.
2361 // We're doing this by transferring the size to be restored from the
2362 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2363 // pops.
2364 AfterCSRPopSize += PrologueSaveSize;
2365 CombineAfterCSRBump = true;
2366 }
2367 }
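// Illustration (hypothetical size): with PrologueSaveSize == 16 and a final
// callee-save restore of "ldp x29, x30, [sp, #0]", the conversion above
// produces the post-indexed
//   ldp x29, x30, [sp], #16
// folding the final SP bump into the load; otherwise those 16 bytes are
// moved into AfterCSRPopSize and released by a separate add after the pops.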
2368
2369 // Move past the restores of the callee-saved registers.
2370 // If we plan on combining the sp bump of the local stack size and the callee
2371 // save stack size, we might need to adjust the CSR save and restore offsets.
2374 while (LastPopI != Begin) {
2375 --LastPopI;
2376 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2377 IsSVECalleeSave(LastPopI)) {
2378 ++LastPopI;
2379 break;
2380 } else if (CombineSPBump)
2382 NeedsWinCFI, &HasWinCFI);
2383 }
2384
2385 if (NeedsWinCFI) {
2386 // Note that there are cases where we insert SEH opcodes in the
2387 // epilogue when we had no SEH opcodes in the prologue. For
2388 // example, when there is no stack frame but there are stack
2389 // arguments. Insert the SEH_EpilogStart and remove it later if
2390 // we didn't emit any SEH opcodes, to avoid generating WinCFI for
2391 // functions that don't need it.
2392 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2394 EpilogStartI = LastPopI;
2395 --EpilogStartI;
2396 }
2397
2398 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2401 // Avoid the reload as it is GOT relative, and instead fall back to the
2402 // hardcoded value below. This allows a mismatch between the OS and
2403 // application without immediately terminating on the difference.
2404 [[fallthrough]];
2406 // We need to reset FP to its untagged state on return. Bit 60 is
2407 // currently used to show the presence of an extended frame.
2408
2409 // BIC x29, x29, #0x1000_0000_0000_0000
2410 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2411 AArch64::FP)
2412 .addUse(AArch64::FP)
2413 .addImm(0x10fe)
2415 if (NeedsWinCFI) {
2416 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2418 HasWinCFI = true;
2419 }
2420 break;
2421
2423 break;
2424 }
2425 }
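// Note on the encoding above (illustrative): the logical-immediate operand
// 0x10fe decodes to the 64-bit mask 0xefffffffffffffff (all bits set except
// bit 60), so the ANDXri clears only the extended-frame marker bit, i.e.
//   and x29, x29, #0xefffffffffffffff
// is the "bic x29, x29, #0x1000_0000_0000_0000" described in the comment.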
2426
2427 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2428
2429 // If there is a single SP update, insert it before the ret and we're done.
2430 if (CombineSPBump) {
2431 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2432
2433 // When we are about to restore the CSRs, the CFA register is SP again.
2434 if (EmitCFI && hasFP(MF)) {
2435 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2436 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2437 unsigned CFIIndex =
2438 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2439 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2440 .addCFIIndex(CFIIndex)
2442 }
2443
2444 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2445 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2446 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2447 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2448 return;
2449 }
2450
2451 NumBytes -= PrologueSaveSize;
2452 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2453
2454 // Process the SVE callee-saves to determine what space needs to be
2455 // deallocated.
2456 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2457 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2458 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2459 RestoreBegin = std::prev(RestoreEnd);
2460 while (RestoreBegin != MBB.begin() &&
2461 IsSVECalleeSave(std::prev(RestoreBegin)))
2462 --RestoreBegin;
2463
2464 assert(IsSVECalleeSave(RestoreBegin) &&
2465 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2466
2467 StackOffset CalleeSavedSizeAsOffset =
2468 StackOffset::getScalable(CalleeSavedSize);
2469 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2470 DeallocateAfter = CalleeSavedSizeAsOffset;
2471 }
2472
2473 // Deallocate the SVE area.
2474 if (SVEStackSize) {
2475 // If we have stack realignment or variable sized objects on the stack,
2476 // restore the stack pointer from the frame pointer prior to SVE CSR
2477 // restoration.
2478 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2479 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2480 // Set SP to start of SVE callee-save area from which they can
2481 // be reloaded. The code below will deallocate the stack space
2482 // by moving FP -> SP.
2483 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2484 StackOffset::getScalable(-CalleeSavedSize), TII,
2486 }
2487 } else {
2488 if (AFI->getSVECalleeSavedStackSize()) {
2489 // Deallocate the non-SVE locals first before we can deallocate (and
2490 // restore callee saves) from the SVE area.
2492 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2494 false, false, nullptr, EmitCFI && !hasFP(MF),
2495 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2496 NumBytes = 0;
2497 }
2498
2499 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2500 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2501 false, nullptr, EmitCFI && !hasFP(MF),
2502 SVEStackSize +
2503 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2504
2505 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2506 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2507 false, nullptr, EmitCFI && !hasFP(MF),
2508 DeallocateAfter +
2509 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2510 }
2511 if (EmitCFI)
2512 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2513 }
2514
2515 if (!hasFP(MF)) {
2516 bool RedZone = canUseRedZone(MF);
2517 // If this was a redzone leaf function, we don't need to restore the
2518 // stack pointer (but we may need to pop stack args for fastcc).
2519 if (RedZone && AfterCSRPopSize == 0)
2520 return;
2521
2522 // Pop the local variables off the stack. If there are no callee-saved
2523 // registers, it means we are actually positioned at the terminator and can
2524 // combine stack increment for the locals and the stack increment for
2525 // callee-popped arguments into (possibly) a single instruction and be done.
2526 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2527 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2528 if (NoCalleeSaveRestore)
2529 StackRestoreBytes += AfterCSRPopSize;
2530
2532 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2533 StackOffset::getFixed(StackRestoreBytes), TII,
2534 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2535 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2536
2537 // If we were able to combine the local stack pop with the argument pop,
2538 // then we're done.
2539 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2540 return;
2541 }
2542
2543 NumBytes = 0;
2544 }
2545
2546 // Restore the original stack pointer.
2547 // FIXME: Rather than doing the math here, we should instead just use
2548 // non-post-indexed loads for the restores if we aren't actually going to
2549 // be able to save any instructions.
2550 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2552 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2554 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2555 } else if (NumBytes)
2556 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2557 StackOffset::getFixed(NumBytes), TII,
2558 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2559
2560 // When we are about to restore the CSRs, the CFA register is SP again.
2561 if (EmitCFI && hasFP(MF)) {
2562 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2563 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2564 unsigned CFIIndex = MF.addFrameInst(
2565 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2566 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2567 .addCFIIndex(CFIIndex)
2569 }
2570
2571 // This must be placed after the callee-save restore code because that code
2572 // assumes the SP is at the same location as it was after the callee-save save
2573 // code in the prologue.
2574 if (AfterCSRPopSize) {
2575 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2576 "interrupt may have clobbered");
2577
2579 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2581 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2582 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2583 }
2584}
2585
2588 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2589}
2590
2591/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2592/// debug info. It's the same as what we use for resolving the code-gen
2593/// references for now. FIXME: This can go wrong when references are
2594/// SP-relative and simple call frames aren't used.
2597 Register &FrameReg) const {
2599 MF, FI, FrameReg,
2600 /*PreferFP=*/
2601 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
2602 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
2603 /*ForSimm=*/false);
2604}
2605
2608 int FI) const {
2610}
2611
2613 int64_t ObjectOffset) {
2614 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2615 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2616 const Function &F = MF.getFunction();
2617 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
2618 unsigned FixedObject =
2619 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2620 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2621 int64_t FPAdjust =
2622 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2623 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2624}
2625
2627 int64_t ObjectOffset) {
2628 const auto &MFI = MF.getFrameInfo();
2629 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2630}
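// Worked example (hypothetical sizes): for a frame with stack size 64, a
// 16-byte callee-save area, frame-record offset 0 and no Win64 fixed
// object, a local at ObjectOffset -24 resolves to
//   FP-relative:  -24 + 0 + (16 - 0) = -8     i.e. [x29, #-8]
//   SP-relative:  -24 + 64           = 40     i.e. [sp, #40]
// Both name the same byte, since FP = SP + 48 in this layout.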
2631
2632// TODO: This function currently does not work for scalable vectors.
2634 int FI) const {
2635 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2637 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2638 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2639 ? getFPOffset(MF, ObjectOffset).getFixed()
2640 : getStackOffset(MF, ObjectOffset).getFixed();
2641}
2642
2644 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2645 bool ForSimm) const {
2646 const auto &MFI = MF.getFrameInfo();
2647 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2648 bool isFixed = MFI.isFixedObjectIndex(FI);
2649 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2650 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2651 PreferFP, ForSimm);
2652}
2653
2655 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2656 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2657 const auto &MFI = MF.getFrameInfo();
2658 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2660 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2661 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2662
2663 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2664 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2665 bool isCSR =
2666 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2667
2668 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2669
2670 // Use frame pointer to reference fixed objects. Use it for locals if
2671 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2672 // reliable as a base). Make sure useFPForScavengingIndex() does the
2673 // right thing for the emergency spill slot.
2674 bool UseFP = false;
2675 if (AFI->hasStackFrame() && !isSVE) {
2676 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2677 // there are scalable (SVE) objects in between the FP and the fixed-sized
2678 // objects.
2679 PreferFP &= !SVEStackSize;
2680
2681 // Note: Keeping the following as multiple 'if' statements rather than
2682 // merging to a single expression for readability.
2683 //
2684 // Argument access should always use the FP.
2685 if (isFixed) {
2686 UseFP = hasFP(MF);
2687 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2688 // References to the CSR area must use FP if we're re-aligning the stack
2689 // since the dynamically-sized alignment padding is between the SP/BP and
2690 // the CSR area.
2691 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2692 UseFP = true;
2693 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2694 // If the FPOffset is negative and we're producing a signed immediate, we
2695 // have to keep in mind that the available offset range for negative
2696 // offsets is smaller than for positive ones. If an offset is available
2697 // via the FP and the SP, use whichever is closest.
2698 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2699 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2700
2701 if (MFI.hasVarSizedObjects()) {
2702 // If we have variable sized objects, we can use either FP or BP, as the
2703 // SP offset is unknown. We can use the base pointer if we have one and
2704 // FP is not preferred. If not, we're stuck with using FP.
2705 bool CanUseBP = RegInfo->hasBasePointer(MF);
2706 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2707 UseFP = PreferFP;
2708 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2709 UseFP = true;
2710 // else we can use BP and FP, but the offset from FP won't fit.
2711 // That will make us scavenge registers which we can probably avoid by
2712 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2713 } else if (FPOffset >= 0) {
2714 // Use SP or FP, whichever gives us the best chance of the offset
2715 // being in range for direct access. If the FPOffset is positive,
2716 // that'll always be best, as the SP will be even further away.
2717 UseFP = true;
2718 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2719 // Funclets access the locals contained in the parent's stack frame
2720 // via the frame pointer, so we have to use the FP in the parent
2721 // function.
2722 (void) Subtarget;
2723 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2724 MF.getFunction().isVarArg()) &&
2725 "Funclets should only be present on Win64");
2726 UseFP = true;
2727 } else {
2728 // We have the choice between FP and (SP or BP).
2729 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2730 UseFP = true;
2731 }
2732 }
2733 }
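// Illustration (hypothetical offsets, no SVE stack): an object reachable at
// either [x29, #-8] or [sp, #40] has FPOffset = -8 and Offset = 40; since
// 40 > 8, the "Offset > -FPOffset" test sets PreferFP and the FP is chosen
// as the closer base, keeping the immediate small.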
2734
2735 assert(
2736 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2737 "In the presence of dynamic stack pointer realignment, "
2738 "non-argument/CSR objects cannot be accessed through the frame pointer");
2739
2740 if (isSVE) {
2741 StackOffset FPOffset =
2743 StackOffset SPOffset =
2744 SVEStackSize +
2745 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2746 ObjectOffset);
2747 // Always use the FP for SVE spills if available and beneficial.
2748 if (hasFP(MF) && (SPOffset.getFixed() ||
2749 FPOffset.getScalable() < SPOffset.getScalable() ||
2750 RegInfo->hasStackRealignment(MF))) {
2751 FrameReg = RegInfo->getFrameRegister(MF);
2752 return FPOffset;
2753 }
2754
2755 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2756 : (unsigned)AArch64::SP;
2757 return SPOffset;
2758 }
2759
2760 StackOffset ScalableOffset = {};
2761 if (UseFP && !(isFixed || isCSR))
2762 ScalableOffset = -SVEStackSize;
2763 if (!UseFP && (isFixed || isCSR))
2764 ScalableOffset = SVEStackSize;
2765
2766 if (UseFP) {
2767 FrameReg = RegInfo->getFrameRegister(MF);
2768 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2769 }
2770
2771 // Use the base pointer if we have one.
2772 if (RegInfo->hasBasePointer(MF))
2773 FrameReg = RegInfo->getBaseRegister();
2774 else {
2775 assert(!MFI.hasVarSizedObjects() &&
2776 "Can't use SP when we have var sized objects.");
2777 FrameReg = AArch64::SP;
2778 // If we're using the red zone for this function, the SP won't actually
2779 // be adjusted, so the offsets will be negative. They're also all
2780 // within range of the signed 9-bit immediate instructions.
2781 if (canUseRedZone(MF))
2782 Offset -= AFI->getLocalStackSize();
2783 }
2784
2785 return StackOffset::getFixed(Offset) + ScalableOffset;
2786}
2787
2788static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2789 // Do not set a kill flag on values that are also marked as live-in. This
2790 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2791 // callee saved registers.
2792 // Omitting the kill flags is conservatively correct even if the live-in
2793 // is not used after all.
2794 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2795 return getKillRegState(!IsLiveIn);
2796}
2797
2799 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2801 return Subtarget.isTargetMachO() &&
2802 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2803 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2805}
2806
2807static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2808 bool NeedsWinCFI, bool IsFirst,
2809 const TargetRegisterInfo *TRI) {
2810 // If we are generating register pairs for a Windows function that requires
2811 // EH support, then pair consecutive registers only. There are no unwind
2812 // opcodes for saves/restores of non-consecutive register pairs.
2813 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2814 // save_lrpair.
2815 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2816
2817 if (Reg2 == AArch64::FP)
2818 return true;
2819 if (!NeedsWinCFI)
2820 return false;
2821 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2822 return false;
2823 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2824 // opcode. If this is the first register pair, it would end up with a
2825 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2826 // if LR is paired with something other than the first register.
2827 // The save_lrpair opcode requires the first register to be an odd-numbered one.
2828 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2829 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2830 return false;
2831 return true;
2832}
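// Illustration of the rules above (hypothetical pairs): under WinCFI
// {x22, x23} pairs fine (consecutive encodings), {x19, x21} does not, and
// {x21, lr} is allowed via save_lrpair as long as it is not the first pair
// (there is no pre-decrementing save_lrpair_x form); {x20, lr} is rejected
// because the first register of save_lrpair must be odd-numbered.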
2833
2834/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2835/// WindowsCFI requires that only consecutive registers can be paired.
2836/// LR and FP need to be allocated together when the frame needs to save
2837/// the frame-record. This means any other register pairing with LR is invalid.
2838static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2839 bool UsesWinAAPCS, bool NeedsWinCFI,
2840 bool NeedsFrameRecord, bool IsFirst,
2841 const TargetRegisterInfo *TRI) {
2842 if (UsesWinAAPCS)
2843 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2844 TRI);
2845
2846 // If we need to store the frame record, don't pair any register
2847 // with LR other than FP.
2848 if (NeedsFrameRecord)
2849 return Reg2 == AArch64::LR;
2850
2851 return false;
2852}
2853
2854namespace {
2855
2856struct RegPairInfo {
2857 unsigned Reg1 = AArch64::NoRegister;
2858 unsigned Reg2 = AArch64::NoRegister;
2859 int FrameIdx;
2860 int Offset;
2861 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2862
2863 RegPairInfo() = default;
2864
2865 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2866
2867 unsigned getScale() const {
2868 switch (Type) {
2869 case PPR:
2870 return 2;
2871 case GPR:
2872 case FPR64:
2873 case VG:
2874 return 8;
2875 case ZPR:
2876 case FPR128:
2877 return 16;
2878 }
2879 llvm_unreachable("Unsupported type");
2880 }
2881
2882 bool isScalable() const { return Type == PPR || Type == ZPR; }
2883};
2884
2885} // end anonymous namespace
2886
2887unsigned findFreePredicateReg(BitVector &SavedRegs) {
2888 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2889 if (SavedRegs.test(PReg)) {
2890 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2891 return PNReg;
2892 }
2893 }
2894 return AArch64::NoRegister;
2895}
2896
2900 bool NeedsFrameRecord) {
2901
2902 if (CSI.empty())
2903 return;
2904
2905 bool IsWindows = isTargetWindows(MF);
2906 bool NeedsWinCFI = needsWinCFI(MF);
2908 MachineFrameInfo &MFI = MF.getFrameInfo();
2910 unsigned Count = CSI.size();
2911 (void)CC;
2912 // MachO's compact unwind format relies on all registers being stored in
2913 // pairs.
2916 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2917 "Odd number of callee-saved regs to spill!");
2918 int ByteOffset = AFI->getCalleeSavedStackSize();
2919 int StackFillDir = -1;
2920 int RegInc = 1;
2921 unsigned FirstReg = 0;
2922 if (NeedsWinCFI) {
2923 // For WinCFI, fill the stack from the bottom up.
2924 ByteOffset = 0;
2925 StackFillDir = 1;
2926 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2927 // backwards, to pair up registers starting from lower numbered registers.
2928 RegInc = -1;
2929 FirstReg = Count - 1;
2930 }
2931 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2932 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2933 Register LastReg = 0;
2934
2935 // When iterating backwards, the loop condition relies on unsigned wraparound.
2936 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2937 RegPairInfo RPI;
2938 RPI.Reg1 = CSI[i].getReg();
2939
2940 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
2941 RPI.Type = RegPairInfo::GPR;
2942 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
2943 RPI.Type = RegPairInfo::FPR64;
2944 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
2945 RPI.Type = RegPairInfo::FPR128;
2946 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
2947 RPI.Type = RegPairInfo::ZPR;
2948 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
2949 RPI.Type = RegPairInfo::PPR;
2950 else if (RPI.Reg1 == AArch64::VG)
2951 RPI.Type = RegPairInfo::VG;
2952 else
2953 llvm_unreachable("Unsupported register class.");
2954
2955 // Add the stack hazard size as we transition from GPR->FPR CSRs.
2956 if (AFI->hasStackHazardSlotIndex() &&
2957 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2959 ByteOffset += StackFillDir * StackHazardSize;
2960 LastReg = RPI.Reg1;
2961
2962 // Add the next reg to the pair if it is in the same register class.
2963 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
2964 Register NextReg = CSI[i + RegInc].getReg();
2965 bool IsFirst = i == FirstReg;
2966 switch (RPI.Type) {
2967 case RegPairInfo::GPR:
2968 if (AArch64::GPR64RegClass.contains(NextReg) &&
2969 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2970 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2971 TRI))
2972 RPI.Reg2 = NextReg;
2973 break;
2974 case RegPairInfo::FPR64:
2975 if (AArch64::FPR64RegClass.contains(NextReg) &&
2976 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2977 IsFirst, TRI))
2978 RPI.Reg2 = NextReg;
2979 break;
2980 case RegPairInfo::FPR128:
2981 if (AArch64::FPR128RegClass.contains(NextReg))
2982 RPI.Reg2 = NextReg;
2983 break;
2984 case RegPairInfo::PPR:
2985 break;
2986 case RegPairInfo::ZPR:
2987 if (AFI->getPredicateRegForFillSpill() != 0)
2988 if (((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1))
2989 RPI.Reg2 = NextReg;
2990 break;
2991 case RegPairInfo::VG:
2992 break;
2993 }
2994 }
2995
2996 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2997 // list to come in sorted by frame index so that we can issue the store
2998 // pair instructions directly. Assert if we see anything otherwise.
2999 //
3000 // The order of the registers in the list is controlled by
3001 // getCalleeSavedRegs(), so they will always be in-order, as well.
3002 assert((!RPI.isPaired() ||
3003 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
3004 "Out of order callee saved regs!");
3005
3006 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
3007 RPI.Reg1 == AArch64::LR) &&
3008 "FrameRecord must be allocated together with LR");
3009
3010 // Windows AAPCS has FP and LR reversed.
3011 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
3012 RPI.Reg2 == AArch64::LR) &&
3013 "FrameRecord must be allocated together with LR");
3014
3015 // MachO's compact unwind format relies on all registers being stored in
3016 // adjacent register pairs.
3020 (RPI.isPaired() &&
3021 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3022 RPI.Reg1 + 1 == RPI.Reg2))) &&
3023 "Callee-save registers not saved as adjacent register pair!");
3024
3025 RPI.FrameIdx = CSI[i].getFrameIdx();
3026 if (NeedsWinCFI &&
3027 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
3028 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
3029 int Scale = RPI.getScale();
3030
3031 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3032 assert(OffsetPre % Scale == 0);
3033
3034 if (RPI.isScalable())
3035 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3036 else
3037 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3038
3039 // Swift's async context is directly before FP, so allocate an extra
3040 // 8 bytes for it.
3041 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3042 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3043 (IsWindows && RPI.Reg2 == AArch64::LR)))
3044 ByteOffset += StackFillDir * 8;
3045
3046 // Round up size of non-pair to pair size if we need to pad the
3047 // callee-save area to ensure 16-byte alignment.
3048 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
3049 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
3050 ByteOffset % 16 != 0) {
3051 ByteOffset += 8 * StackFillDir;
3052 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
3053 // A stack frame with a gap looks like this, bottom up:
3054 // d9, d8. x21, gap, x20, x19.
3055 // Set extra alignment on the x21 object to create the gap above it.
3056 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
3057 NeedGapToAlignStack = false;
3058 }
3059
3060 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3061 assert(OffsetPost % Scale == 0);
3062 // If filling top down (default), we want the offset after incrementing it.
3063 // If filling bottom up (WinCFI) we need the original offset.
3064 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
3065
3066 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
3067 // Swift context can directly precede FP.
3068 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3069 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3070 (IsWindows && RPI.Reg2 == AArch64::LR)))
3071 Offset += 8;
3072 RPI.Offset = Offset / Scale;
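// Illustration (Darwin-style layout, hypothetical offsets): with a Swift
// async context the frame-record slot is 24 bytes, laid out low-to-high as
//   [ async context ][ saved fp ][ saved lr ]
// The extra adjustments of 8 above place the {fp, lr} store 8 bytes into
// that slot, so the context ends up directly below the saved fp that the
// frame pointer will point at.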
3073
3074 assert((!RPI.isPaired() ||
3075 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
3076 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
3077 "Offset out of bounds for LDP/STP immediate");
3078
3079 // Save the offset to the frame record so that the FP register can point to the
3080 // innermost frame record (spilled FP and LR registers).
3081 if (NeedsFrameRecord &&
3082 ((!IsWindows && RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3083 (IsWindows && RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR)))
3085
3086 RegPairs.push_back(RPI);
3087 if (RPI.isPaired())
3088 i += RegInc;
3089 }
3090 if (NeedsWinCFI) {
3091 // If we need an alignment gap in the stack, align the topmost stack
3092 // object. A stack frame with a gap looks like this, bottom up:
3093 // x19, d8. d9, gap.
3094 // Set extra alignment on the topmost stack object (the first element in
3095 // CSI, which goes top down), to create the gap above it.
3096 if (AFI->hasCalleeSaveStackFreeSpace())
3097 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
3098 // We iterated bottom up over the registers; flip RegPairs back to top
3099 // down order.
3100 std::reverse(RegPairs.begin(), RegPairs.end());
3101 }
3102}
3103
3107 MachineFunction &MF = *MBB.getParent();
3110 bool NeedsWinCFI = needsWinCFI(MF);
3111 DebugLoc DL;
3113
3114 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3115
3117 // Refresh the reserved regs in case there are any potential changes since the
3118 // last freeze.
3119 MRI.freezeReservedRegs();
3120
3121 if (homogeneousPrologEpilog(MF)) {
3122 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
3124
3125 for (auto &RPI : RegPairs) {
3126 MIB.addReg(RPI.Reg1);
3127 MIB.addReg(RPI.Reg2);
3128
3129 // Update register live in.
3130 if (!MRI.isReserved(RPI.Reg1))
3131 MBB.addLiveIn(RPI.Reg1);
3132 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
3133 MBB.addLiveIn(RPI.Reg2);
3134 }
3135 return true;
3136 }
3137 bool PTrueCreated = false;
3138 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
3139 unsigned Reg1 = RPI.Reg1;
3140 unsigned Reg2 = RPI.Reg2;
3141 unsigned StrOpc;
3142
3143 // Issue sequence of spills for cs regs. The first spill may be converted
3144 // to a pre-decrement store later by emitPrologue if the callee-save stack
3145 // area allocation can't be combined with the local stack area allocation.
3146 // For example:
3147 // stp x22, x21, [sp, #0] // addImm(+0)
3148 // stp x20, x19, [sp, #16] // addImm(+2)
3149 // stp fp, lr, [sp, #32] // addImm(+4)
3150 // Rationale: This sequence saves uop updates compared to a sequence of
3151 // pre-increment spills like stp xi,xj,[sp,#-16]!
3152 // Note: Similar rationale and sequence for restores in epilog.
3153 unsigned Size;
3154 Align Alignment;
3155 switch (RPI.Type) {
3156 case RegPairInfo::GPR:
3157 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
3158 Size = 8;
3159 Alignment = Align(8);
3160 break;
3161 case RegPairInfo::FPR64:
3162 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
3163 Size = 8;
3164 Alignment = Align(8);
3165 break;
3166 case RegPairInfo::FPR128:
3167 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
3168 Size = 16;
3169 Alignment = Align(16);
3170 break;
3171 case RegPairInfo::ZPR:
3172 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
3173 Size = 16;
3174 Alignment = Align(16);
3175 break;
3176 case RegPairInfo::PPR:
3177 StrOpc = AArch64::STR_PXI;
3178 Size = 2;
3179 Alignment = Align(2);
3180 break;
3181 case RegPairInfo::VG:
3182 StrOpc = AArch64::STRXui;
3183 Size = 8;
3184 Alignment = Align(8);
3185 break;
3186 }
3187
3188 unsigned X0Scratch = AArch64::NoRegister;
3189 if (Reg1 == AArch64::VG) {
3190 // Find an available register to store the value of VG to.
3192 assert(Reg1 != AArch64::NoRegister);
3193 SMEAttrs Attrs(MF.getFunction());
3194
3195 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
3196 AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
3197 // For locally-streaming functions, we need to store both the streaming
3198 // & non-streaming VG. Spill the streaming value first.
3199 BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
3200 .addImm(1)
3202 BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
3203 .addReg(Reg1)
3204 .addImm(3)
3205 .addImm(63)
3207
3208 AFI->setStreamingVGIdx(RPI.FrameIdx);
3209 } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
3210 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
3211 .addImm(31)
3212 .addImm(1)
3214 AFI->setVGIdx(RPI.FrameIdx);
3215 } else {
3217 if (llvm::any_of(
3218 MBB.liveins(),
3219 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
3220 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
3221 AArch64::X0, LiveIn.PhysReg);
3222 }))
3223 X0Scratch = Reg1;
3224
3225 if (X0Scratch != AArch64::NoRegister)
3226 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), Reg1)
3227 .addReg(AArch64::XZR)
3228 .addReg(AArch64::X0, RegState::Undef)
3229 .addReg(AArch64::X0, RegState::Implicit)
3231
3232 const uint32_t *RegMask = TRI->getCallPreservedMask(
3233 MF,
3235 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
3236 .addExternalSymbol("__arm_get_current_vg")
3237 .addRegMask(RegMask)
3238 .addReg(AArch64::X0, RegState::ImplicitDefine)
3240 Reg1 = AArch64::X0;
3241 AFI->setVGIdx(RPI.FrameIdx);
3242 }
3243 }
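// Illustration (hypothetical register, 512-bit streaming vectors): for a
// locally-streaming function "rdsvl x9, #1" yields 64 (the streaming VL in
// bytes) and the UBFM (lsr #3) turns it into the streaming VG of 8, while
// plain SVE targets read the current VG directly with "cntd x9, all, mul #1";
// without SVE the value comes back in x0 from __arm_get_current_vg, which is
// why x0 may need to be parked in X0Scratch around the call.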
3244
3245 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
3246 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3247 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3248 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3249 dbgs() << ")\n");
3250
3251 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
3252 "Windows unwdinding requires a consecutive (FP,LR) pair");
3253 // Windows unwind codes require consecutive registers if registers are
3254 // paired. Make the switch here, so that the code below will save (x,x+1)
3255 // and not (x+1,x).
3256 unsigned FrameIdxReg1 = RPI.FrameIdx;
3257 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3258 if (NeedsWinCFI && RPI.isPaired()) {
3259 std::swap(Reg1, Reg2);
3260 std::swap(FrameIdxReg1, FrameIdxReg2);
3261 }
3262
3263 if (RPI.isPaired() && RPI.isScalable()) {
3264 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3267 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3268 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3269 "Expects SVE2.1 or SME2 target and a predicate register");
3270#ifdef EXPENSIVE_CHECKS
3271 auto IsPPR = [](const RegPairInfo &c) {
3272 return c.Reg1 == RegPairInfo::PPR;
3273 };
3274 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3275 auto IsZPR = [](const RegPairInfo &c) {
3276 return c.Type == RegPairInfo::ZPR;
3277 };
3278 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3279 assert(!(PPRBegin < ZPRBegin) &&
3280 "Expected callee save predicate to be handled first");
3281#endif
3282 if (!PTrueCreated) {
3283 PTrueCreated = true;
3284 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3286 }
3287 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3288 if (!MRI.isReserved(Reg1))
3289 MBB.addLiveIn(Reg1);
3290 if (!MRI.isReserved(Reg2))
3291 MBB.addLiveIn(Reg2);
3292 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
3294 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3295 MachineMemOperand::MOStore, Size, Alignment));
3296 MIB.addReg(PnReg);
3297 MIB.addReg(AArch64::SP)
3298 .addImm(RPI.Offset) // [sp, #offset*scale],
3299 // where factor*scale is implicit
3302 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3303 MachineMemOperand::MOStore, Size, Alignment));
3304 if (NeedsWinCFI)
3306 } else { // The code when the pair of ZReg is not present
3307 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3308 if (!MRI.isReserved(Reg1))
3309 MBB.addLiveIn(Reg1);
3310 if (RPI.isPaired()) {
3311 if (!MRI.isReserved(Reg2))
3312 MBB.addLiveIn(Reg2);
3313 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
3314 MIB.addMemOperand(MF.getMachineMemOperand(
3315 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3316 MachineMemOperand::MOStore, Size, Alignment));
3317 }
3318 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
3319 .addReg(AArch64::SP)
3320 .addImm(RPI.Offset) // [sp, #offset*scale],
3321 // where factor*scale is implicit
3322 .setMIFlag(MachineInstr::FrameSetup);
3323 MIB.addMemOperand(MF.getMachineMemOperand(
3324 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3325 MachineMemOperand::MOStore, Size, Alignment));
3326 if (NeedsWinCFI)
3327 InsertSEH(MIB, TII, MachineInstr::FrameSetup);
3328 }
3329 // Update the StackIDs of the SVE stack slots.
3330 MachineFrameInfo &MFI = MF.getFrameInfo();
3331 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
3332 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
3333 if (RPI.isPaired())
3334 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
3335 }
3336
3337 if (X0Scratch != AArch64::NoRegister)
3338 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
3339 .addReg(AArch64::XZR)
3340 .addReg(X0Scratch, RegState::Undef)
3341 .addReg(X0Scratch, RegState::Implicit)
3342 .setMIFlag(MachineInstr::FrameSetup);
3343 }
3344 return true;
3345}
3346
3347 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
3348 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
3349 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
3350 MachineFunction &MF = *MBB.getParent();
3351 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3352 DebugLoc DL;
3353 SmallVector<RegPairInfo, 8> RegPairs;
3354 bool NeedsWinCFI = needsWinCFI(MF);
3355
3356 if (MBBI != MBB.end())
3357 DL = MBBI->getDebugLoc();
3358
3359 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3360 if (homogeneousPrologEpilog(MF, &MBB)) {
3361 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3362 .setMIFlag(MachineInstr::FrameDestroy);
3363 for (auto &RPI : RegPairs) {
3364 MIB.addReg(RPI.Reg1, RegState::Define);
3365 MIB.addReg(RPI.Reg2, RegState::Define);
3366 }
3367 return true;
3368 }
3369
3370 // For performance reasons, restore SVE registers in increasing order.
3371 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3372 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3373 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3374 std::reverse(PPRBegin, PPREnd);
3375 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3376 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3377 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3378 std::reverse(ZPRBegin, ZPREnd);
3379
3380 bool PTrueCreated = false;
3381 for (const RegPairInfo &RPI : RegPairs) {
3382 unsigned Reg1 = RPI.Reg1;
3383 unsigned Reg2 = RPI.Reg2;
3384
3385 // Issue sequence of restores for cs regs. The last restore may be converted
3386 // to a post-increment load later by emitEpilogue if the callee-save stack
3387 // area allocation can't be combined with the local stack area allocation.
3388 // For example:
3389 // ldp fp, lr, [sp, #32] // addImm(+4)
3390 // ldp x20, x19, [sp, #16] // addImm(+2)
3391 // ldp x22, x21, [sp, #0] // addImm(+0)
3392 // Note: see comment in spillCalleeSavedRegisters()
3393 unsigned LdrOpc;
3394 unsigned Size;
3395 Align Alignment;
3396 switch (RPI.Type) {
3397 case RegPairInfo::GPR:
3398 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3399 Size = 8;
3400 Alignment = Align(8);
3401 break;
3402 case RegPairInfo::FPR64:
3403 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3404 Size = 8;
3405 Alignment = Align(8);
3406 break;
3407 case RegPairInfo::FPR128:
3408 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3409 Size = 16;
3410 Alignment = Align(16);
3411 break;
3412 case RegPairInfo::ZPR:
3413 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3414 Size = 16;
3415 Alignment = Align(16);
3416 break;
3417 case RegPairInfo::PPR:
3418 LdrOpc = AArch64::LDR_PXI;
3419 Size = 2;
3420 Alignment = Align(2);
3421 break;
3422 case RegPairInfo::VG:
3423 continue;
3424 }
3425 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3426 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3427 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3428 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3429 dbgs() << ")\n");
3430
3431 // Windows unwind codes require consecutive registers if registers are
3432 // paired. Make the switch here, so that the code below will restore (x,x+1)
3433 // and not (x+1,x).
3434 unsigned FrameIdxReg1 = RPI.FrameIdx;
3435 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3436 if (NeedsWinCFI && RPI.isPaired()) {
3437 std::swap(Reg1, Reg2);
3438 std::swap(FrameIdxReg1, FrameIdxReg2);
3439 }
3440
3442 if (RPI.isPaired() && RPI.isScalable()) {
3443 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3445 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3446 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3447 "Expects SVE2.1 or SME2 target and a predicate register");
3448#ifdef EXPENSIVE_CHECKS
3449 assert(!(PPRBegin < ZPRBegin) &&
3450 "Expected callee save predicate to be handled first");
3451#endif
3452 if (!PTrueCreated) {
3453 PTrueCreated = true;
3454 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3456 }
3457 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3458 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3459 getDefRegState(true));
3461 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3462 MachineMemOperand::MOLoad, Size, Alignment));
3463 MIB.addReg(PnReg);
3464 MIB.addReg(AArch64::SP)
3465 .addImm(RPI.Offset) // [sp, #offset*scale]
3466 // where factor*scale is implicit
3469 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3470 MachineMemOperand::MOLoad, Size, Alignment));
3471 if (NeedsWinCFI)
3473 } else {
3474 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3475 if (RPI.isPaired()) {
3476 MIB.addReg(Reg2, getDefRegState(true));
3478 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3479 MachineMemOperand::MOLoad, Size, Alignment));
3480 }
3481 MIB.addReg(Reg1, getDefRegState(true));
3482 MIB.addReg(AArch64::SP)
3483 .addImm(RPI.Offset) // [sp, #offset*scale]
3484 // where factor*scale is implicit
3487 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3488 MachineMemOperand::MOLoad, Size, Alignment));
3489 if (NeedsWinCFI)
3491 }
3492 }
3493 return true;
3494}
3495
3496// Return the FrameID for a Load/Store instruction by looking at the MMO.
3497static std::optional<int> getLdStFrameID(const MachineInstr &MI,
3498 const MachineFrameInfo &MFI) {
3499 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3500 return std::nullopt;
3501
3502 MachineMemOperand *MMO = *MI.memoperands_begin();
3503 auto *PSV =
3504 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
3505 if (PSV)
3506 return std::optional<int>(PSV->getFrameIndex());
3507
3508 if (MMO->getValue()) {
3509 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
3510 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
3511 FI++)
3512 if (MFI.getObjectAllocation(FI) == Al)
3513 return FI;
3514 }
3515 }
3516
3517 return std::nullopt;
3518}
3519
3520// Check if a Hazard slot is needed for the current function, and if so create
3521// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
3522// which can be used to determine if any hazard padding is needed.
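// Hedged illustration (values assumed, not taken from this file): in a
// streaming function that spills d8 or loads/stores a Q-register stack slot,
// the slot created below reserves StackHazardSize bytes so later frame layout
// can keep GPR and FPR/SVE accesses in separate regions of the frame.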
3523void AArch64FrameLowering::determineStackHazardSlot(
3524 MachineFunction &MF, BitVector &SavedRegs) const {
3525 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
3526 MF.getInfo<AArch64FunctionInfo>()->hasStackHazardSlotIndex())
3527 return;
3528
3529 // Stack hazards are only needed in streaming functions.
3530 SMEAttrs Attrs(MF.getFunction());
3531 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
3532 return;
3533
3534 MachineFrameInfo &MFI = MF.getFrameInfo();
3535
3536 // Add a hazard slot if there are any CSR FPR registers, or any FP-only
3537 // stack objects.
3538 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
3539 return AArch64::FPR64RegClass.contains(Reg) ||
3540 AArch64::FPR128RegClass.contains(Reg) ||
3541 AArch64::ZPRRegClass.contains(Reg) ||
3542 AArch64::PPRRegClass.contains(Reg);
3543 });
3544 bool HasFPRStackObjects = false;
3545 if (!HasFPRCSRs) {
3546 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
3547 for (auto &MBB : MF) {
3548 for (auto &MI : MBB) {
3549 std::optional<int> FI = getLdStFrameID(MI, MFI);
3550 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3551 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3552 AArch64InstrInfo::isFpOrNEON(MI))
3553 FrameObjects[*FI] |= 2;
3554 else
3555 FrameObjects[*FI] |= 1;
3556 }
3557 }
3558 }
3559 HasFPRStackObjects =
3560 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
3561 }
3562
3563 if (HasFPRCSRs || HasFPRStackObjects) {
3564 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
3565 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
3566 << StackHazardSize << "\n");
3567 MF.getInfo<AArch64FunctionInfo>()->setStackHazardSlotIndex(ID);
3568 }
3569}
3570
3571 void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
3572 BitVector &SavedRegs,
3573 RegScavenger *RS) const {
3574 // All calls are tail calls in GHC calling conv, and functions have no
3575 // prologue/epilogue.
3576 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
3577 return;
3578
3579 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
3580 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3581 MF.getSubtarget().getRegisterInfo());
3582 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3583 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3584 unsigned UnspilledCSGPR = AArch64::NoRegister;
3585 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3586
3587 MachineFrameInfo &MFI = MF.getFrameInfo();
3588 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3589
3590 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3591 ? RegInfo->getBaseRegister()
3592 : (unsigned)AArch64::NoRegister;
3593
3594 unsigned ExtraCSSpill = 0;
3595 bool HasUnpairedGPR64 = false;
3596 bool HasPairZReg = false;
3597 // Figure out which callee-saved registers to save/restore.
3598 for (unsigned i = 0; CSRegs[i]; ++i) {
3599 const unsigned Reg = CSRegs[i];
3600
3601 // Add the base pointer register to SavedRegs if it is callee-save.
3602 if (Reg == BasePointerReg)
3603 SavedRegs.set(Reg);
3604
3605 bool RegUsed = SavedRegs.test(Reg);
3606 unsigned PairedReg = AArch64::NoRegister;
3607 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3608 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3609 AArch64::FPR128RegClass.contains(Reg)) {
3610 // Compensate for odd numbers of GP CSRs.
3611 // For now, all the known cases of odd number of CSRs are of GPRs.
3612 if (HasUnpairedGPR64)
3613 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3614 else
3615 PairedReg = CSRegs[i ^ 1];
3616 }
3617
3618 // If the function needs to save all the GP registers (SavedRegs) and
3619 // there is an odd number of GP CSRs (CSRegs), PairedReg could end up in
3620 // a different register class from Reg, which would lead to an FPR
3621 // (usually D8) accidentally being marked as saved.
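// For example (illustrative): the i^1 pairing for an unmatched GPR can land
// on an FPR (usually d8); the check below clears that pairing so the FPR is
// not spuriously added to SavedRegs.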
3622 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3623 PairedReg = AArch64::NoRegister;
3624 HasUnpairedGPR64 = true;
3625 }
3626 assert(PairedReg == AArch64::NoRegister ||
3627 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3628 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3629 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3630
3631 if (!RegUsed) {
3632 if (AArch64::GPR64RegClass.contains(Reg) &&
3633 !RegInfo->isReservedReg(MF, Reg)) {
3634 UnspilledCSGPR = Reg;
3635 UnspilledCSGPRPaired = PairedReg;
3636 }
3637 continue;
3638 }
3639
3640 // MachO's compact unwind format relies on all registers being stored in
3641 // pairs.
3642 // FIXME: the usual format is actually better if unwinding isn't needed.
3643 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3644 !SavedRegs.test(PairedReg)) {
3645 SavedRegs.set(PairedReg);
3646 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3647 !RegInfo->isReservedReg(MF, PairedReg))
3648 ExtraCSSpill = PairedReg;
3649 }
3650 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
3651 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3652 SavedRegs.test(CSRegs[i ^ 1]));
3653 }
3654
3655 if (HasPairZReg && (Subtarget.hasSVE2p1() || Subtarget.hasSME2())) {
3657 // Find a suitable predicate register for the multi-vector spill/fill
3658 // instructions.
3659 unsigned PnReg = findFreePredicateReg(SavedRegs);
3660 if (PnReg != AArch64::NoRegister)
3661 AFI->setPredicateRegForFillSpill(PnReg);
3662 // If no free callee-save register has been found, assign one.
3663 if (!AFI->getPredicateRegForFillSpill() &&
3664 MF.getFunction().getCallingConv() ==
3665 CallingConv::AArch64_SVE_VectorCall) {
3666 SavedRegs.set(AArch64::P8);
3667 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3668 }
3669
3670 assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) &&
3671 "Predicate cannot be a reserved register");
3672 }
3673
3674 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3675 !Subtarget.isTargetWindows()) {
3676 // For Windows calling convention on a non-windows OS, where X18 is treated
3677 // as reserved, back up X18 when entering non-windows code (marked with the
3678 // Windows calling convention) and restore when returning regardless of
3679 // whether the individual function uses it - it might call other functions
3680 // that clobber it.
3681 SavedRegs.set(AArch64::X18);
3682 }
3683
3684 // Calculates the callee saved stack size.
3685 unsigned CSStackSize = 0;
3686 unsigned SVECSStackSize = 0;
3688 const MachineRegisterInfo &MRI = MF.getRegInfo();
3689 for (unsigned Reg : SavedRegs.set_bits()) {
3690 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3691 if (AArch64::PPRRegClass.contains(Reg) ||
3692 AArch64::ZPRRegClass.contains(Reg))
3693 SVECSStackSize += RegSize;
3694 else
3695 CSStackSize += RegSize;
3696 }
3697
3698 // Increase the callee-saved stack size if the function has streaming mode
3699 // changes, as we will need to spill the value of the VG register.
3700 // For locally streaming functions, we spill both the streaming and
3701 // non-streaming VG value.
3702 const Function &F = MF.getFunction();
3703 SMEAttrs Attrs(F);
3704 if (AFI->hasStreamingModeChanges()) {
3705 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3706 CSStackSize += 16;
3707 else
3708 CSStackSize += 8;
3709 }
3710
3711 // Determine if a Hazard slot should be used, and increase the CSStackSize by
3712 // StackHazardSize if so.
3713 determineStackHazardSlot(MF, SavedRegs);
3714 if (AFI->hasStackHazardSlotIndex())
3715 CSStackSize += StackHazardSize;
3716
3717 // Save number of saved regs, so we can easily update CSStackSize later.
3718 unsigned NumSavedRegs = SavedRegs.count();
3719
3720 // The frame record needs to be created by saving the appropriate registers
3721 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3722 if (hasFP(MF) ||
3723 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3724 SavedRegs.set(AArch64::FP);
3725 SavedRegs.set(AArch64::LR);
3726 }
3727
3728 LLVM_DEBUG({
3729 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3730 for (unsigned Reg : SavedRegs.set_bits())
3731 dbgs() << ' ' << printReg(Reg, RegInfo);
3732 dbgs() << "\n";
3733 });
3734
3735 // If any callee-saved registers are used, the frame cannot be eliminated.
3736 int64_t SVEStackSize =
3737 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3738 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3739
3740 // The CSR spill slots have not been allocated yet, so estimateStackSize
3741 // won't include them.
3742 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3743
3744 // We may address some of the stack above the canonical frame address, either
3745 // for our own arguments or during a call. Include that in calculating whether
3746 // we have complicated addressing concerns.
3747 int64_t CalleeStackUsed = 0;
3748 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3749 int64_t FixedOff = MFI.getObjectOffset(I);
3750 if (FixedOff > CalleeStackUsed)
3751 CalleeStackUsed = FixedOff;
3752 }
3753
3754 // Conservatively always assume BigStack when there are SVE spills.
3755 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3756 CalleeStackUsed) > EstimatedStackSizeLimit;
3757 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3758 AFI->setHasStackFrame(true);
3759
3760 // Estimate if we might need to scavenge a register at some point in order
3761 // to materialize a stack offset. If so, either spill one additional
3762 // callee-saved register or reserve a special spill slot to facilitate
3763 // register scavenging. If we already spilled an extra callee-saved register
3764 // above to keep the number of spills even, we don't need to do anything else
3765 // here.
3766 if (BigStack) {
3767 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3768 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3769 << " to get a scratch register.\n");
3770 SavedRegs.set(UnspilledCSGPR);
3771 ExtraCSSpill = UnspilledCSGPR;
3772
3773 // MachO's compact unwind format relies on all registers being stored in
3774 // pairs, so if we need to spill one extra for BigStack, then we need to
3775 // store the pair.
3776 if (producePairRegisters(MF)) {
3777 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3778 // Failed to make a pair for compact unwind format, revert spilling.
3779 if (produceCompactUnwindFrame(MF)) {
3780 SavedRegs.reset(UnspilledCSGPR);
3781 ExtraCSSpill = AArch64::NoRegister;
3782 }
3783 } else
3784 SavedRegs.set(UnspilledCSGPRPaired);
3785 }
3786 }
3787
3788 // If we didn't find an extra callee-saved register to spill, create
3789 // an emergency spill slot.
3790 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3792 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3793 unsigned Size = TRI->getSpillSize(RC);
3794 Align Alignment = TRI->getSpillAlign(RC);
3795 int FI = MFI.CreateStackObject(Size, Alignment, false);
3797 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3798 << " as the emergency spill slot.\n");
3799 }
3800 }
3801
3802 // Adding the size of additional 64bit GPR saves.
3803 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3804
3805 // A Swift asynchronous context extends the frame record with a pointer
3806 // directly before FP.
3807 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3808 CSStackSize += 8;
3809
3810 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3811 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3812 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
3813
3814 assert((!MFI.isCalleeSavedInfoValid() ||
3815 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3816 "Should not invalidate callee saved info");
3817
3818 // Round up to register pair alignment to avoid additional SP adjustment
3819 // instructions.
3820 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3821 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3822 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3823}
3824
3825 bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3826 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3827 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3828 unsigned &MaxCSFrameIndex) const {
3829 bool NeedsWinCFI = needsWinCFI(MF);
3830 // To match the canonical windows frame layout, reverse the list of
3831 // callee saved registers to get them laid out by PrologEpilogInserter
3832 // in the right order. (PrologEpilogInserter allocates stack objects top
3833 // down. Windows canonical prologs store higher numbered registers at
3834 // the top, thus have the CSI array start from the highest registers.)
3835 if (NeedsWinCFI)
3836 std::reverse(CSI.begin(), CSI.end());
3837
3838 if (CSI.empty())
3839 return true; // Early exit if no callee saved registers are modified!
3840
3841 // Now that we know which registers need to be saved and restored, allocate
3842 // stack slots for them.
3843 MachineFrameInfo &MFI = MF.getFrameInfo();
3844 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3845
3846 bool UsesWinAAPCS = isTargetWindows(MF);
3847 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3848 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3849 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3850 if ((unsigned)FrameIdx < MinCSFrameIndex)
3851 MinCSFrameIndex = FrameIdx;
3852 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3853 MaxCSFrameIndex = FrameIdx;
3854 }
3855
3856 // Insert VG into the list of CSRs, immediately before LR if saved.
3857 if (AFI->hasStreamingModeChanges()) {
3858 std::vector<CalleeSavedInfo> VGSaves;
3859 SMEAttrs Attrs(MF.getFunction());
3860
3861 auto VGInfo = CalleeSavedInfo(AArch64::VG);
3862 VGInfo.setRestored(false);
3863 VGSaves.push_back(VGInfo);
3864
3865 // Add VG again if the function is locally-streaming, as we will spill two
3866 // values.
3867 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3868 VGSaves.push_back(VGInfo);
3869
3870 bool InsertBeforeLR = false;
3871
3872 for (unsigned I = 0; I < CSI.size(); I++)
3873 if (CSI[I].getReg() == AArch64::LR) {
3874 InsertBeforeLR = true;
3875 CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
3876 break;
3877 }
3878
3879 if (!InsertBeforeLR)
3880 CSI.insert(CSI.end(), VGSaves.begin(), VGSaves.end());
3881 }
3882
3883 Register LastReg = 0;
3884 int HazardSlotIndex = std::numeric_limits<int>::max();
3885 for (auto &CS : CSI) {
3886 Register Reg = CS.getReg();
3887 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3888
3889 // Create a hazard slot as we switch between GPR and FPR CSRs.
3890 if (AFI->hasStackHazardSlotIndex() &&
3891 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3892 AArch64InstrInfo::isFpOrNEON(Reg)) {
3893 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
3894 "Unexpected register order for hazard slot");
3895 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3896 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3897 << "\n");
3898 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3899 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3900 MinCSFrameIndex = HazardSlotIndex;
3901 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3902 MaxCSFrameIndex = HazardSlotIndex;
3903 }
3904
3905 unsigned Size = RegInfo->getSpillSize(*RC);
3906 Align Alignment(RegInfo->getSpillAlign(*RC));
3907 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3908 CS.setFrameIdx(FrameIdx);
3909
3910 if ((unsigned)FrameIdx < MinCSFrameIndex)
3911 MinCSFrameIndex = FrameIdx;
3912 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3913 MaxCSFrameIndex = FrameIdx;
3914
3915 // Grab 8 bytes below FP for the extended asynchronous frame info.
3916 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3917 Reg == AArch64::FP) {
3918 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3919 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3920 if ((unsigned)FrameIdx < MinCSFrameIndex)
3921 MinCSFrameIndex = FrameIdx;
3922 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3923 MaxCSFrameIndex = FrameIdx;
3924 }
3925 LastReg = Reg;
3926 }
3927
3928 // Add hazard slot in the case where no FPR CSRs are present.
3929 if (AFI->hasStackHazardSlotIndex() &&
3930 HazardSlotIndex == std::numeric_limits<int>::max()) {
3931 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3932 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3933 << "\n");
3934 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3935 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3936 MinCSFrameIndex = HazardSlotIndex;
3937 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3938 MaxCSFrameIndex = HazardSlotIndex;
3939 }
3940
3941 return true;
3942}
3943
3944 bool AArch64FrameLowering::enableStackSlotScavenging(
3945 const MachineFunction &MF) const {
3946 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3947 // If the function has streaming-mode changes, don't scavenge a
3948 // spill slot in the callee-save area, as that might require an
3949 // 'addvl' in the streaming-mode-changing call-sequence when the
3950 // function doesn't use a FP.
3951 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
3952 return false;
3953 // Don't allow stack slot scavenging with hazard slots, in case it moves objects
3954 // into the wrong place.
3955 if (AFI->hasStackHazardSlotIndex())
3956 return false;
3957 return AFI->hasCalleeSaveStackFreeSpace();
3958}
3959
3960 /// Returns true if there are any SVE callee saves.
3961 static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3962 int &Min, int &Max) {
3963 Min = std::numeric_limits<int>::max();
3964 Max = std::numeric_limits<int>::min();
3965
3966 if (!MFI.isCalleeSavedInfoValid())
3967 return false;
3968
3969 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3970 for (auto &CS : CSI) {
3971 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3972 AArch64::PPRRegClass.contains(CS.getReg())) {
3973 assert((Max == std::numeric_limits<int>::min() ||
3974 Max + 1 == CS.getFrameIdx()) &&
3975 "SVE CalleeSaves are not consecutive");
3976
3977 Min = std::min(Min, CS.getFrameIdx());
3978 Max = std::max(Max, CS.getFrameIdx());
3979 }
3980 }
3981 return Min != std::numeric_limits<int>::max();
3982}
3983
3984// Process all the SVE stack objects and determine offsets for each
3985// object. If AssignOffsets is true, the offsets get assigned.
3986// Fills in the first and last callee-saved frame indices into
3987// Min/MaxCSFrameIndex, respectively.
3988// Returns the size of the stack.
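// Worked example (illustrative sizes, not from this file): with two 16-byte
// SVE callee-save slots and one 32-byte SVE local, the callee saves are
// assigned offsets -16 and -32, the running offset is aligned to 16 bytes
// (still 32), the local is assigned -64, and the function returns 64
// (in scalable bytes).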
3989 static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3990 int &MinCSFrameIndex,
3991 int &MaxCSFrameIndex,
3992 bool AssignOffsets) {
3993#ifndef NDEBUG
3994 // First process all fixed stack objects.
3995 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3997 "SVE vectors should never be passed on the stack by value, only by "
3998 "reference.");
3999#endif
4000
4001 auto Assign = [&MFI](int FI, int64_t Offset) {
4002 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
4003 MFI.setObjectOffset(FI, Offset);
4004 };
4005
4006 int64_t Offset = 0;
4007
4008 // Then process all callee saved slots.
4009 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
4010 // Assign offsets to the callee save slots.
4011 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
4012 Offset += MFI.getObjectSize(I);
4014 if (AssignOffsets)
4015 Assign(I, -Offset);
4016 }
4017 }
4018
4019 // Ensure that the Callee-save area is aligned to 16 bytes.
4020 Offset = alignTo(Offset, Align(16U));
4021
4022 // Create a buffer of SVE objects to allocate and sort it.
4023 SmallVector<int, 8> ObjectsToAllocate;
4024 // If we have a stack protector, and we've previously decided that we have SVE
4025 // objects on the stack and thus need it to go in the SVE stack area, then it
4026 // needs to go first.
4027 int StackProtectorFI = -1;
4028 if (MFI.hasStackProtectorIndex()) {
4029 StackProtectorFI = MFI.getStackProtectorIndex();
4030 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
4031 ObjectsToAllocate.push_back(StackProtectorFI);
4032 }
4033 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
4034 unsigned StackID = MFI.getStackID(I);
4035 if (StackID != TargetStackID::ScalableVector)
4036 continue;
4037 if (I == StackProtectorFI)
4038 continue;
4039 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
4040 continue;
4041 if (MFI.isDeadObjectIndex(I))
4042 continue;
4043
4044 ObjectsToAllocate.push_back(I);
4045 }
4046
4047 // Allocate all SVE locals and spills
4048 for (unsigned FI : ObjectsToAllocate) {
4049 Align Alignment = MFI.getObjectAlign(FI);
4050 // FIXME: Given that the length of SVE vectors is not necessarily a power of
4051 // two, we'd need to align every object dynamically at runtime if the
4052 // alignment is larger than 16. This is not yet supported.
4053 if (Alignment > Align(16))
4055 "Alignment of scalable vectors > 16 bytes is not yet supported");
4056
4057 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
4058 if (AssignOffsets)
4059 Assign(FI, -Offset);
4060 }
4061
4062 return Offset;
4063}
4064
4065int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
4066 MachineFrameInfo &MFI) const {
4067 int MinCSFrameIndex, MaxCSFrameIndex;
4068 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
4069}
4070
4071int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
4072 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
4073 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
4074 true);
4075}
4076
4077 void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
4078 MachineFunction &MF, RegScavenger *RS) const {
4079 MachineFrameInfo &MFI = MF.getFrameInfo();
4080
4082 "Upwards growing stack unsupported");
4083
4084 int MinCSFrameIndex, MaxCSFrameIndex;
4085 int64_t SVEStackSize =
4086 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
4087
4089 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
4090 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
4091
4092 // If this function isn't doing Win64-style C++ EH, we don't need to do
4093 // anything.
4094 if (!MF.hasEHFunclets())
4095 return;
4097 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
4098
4099 MachineBasicBlock &MBB = MF.front();
4100 auto MBBI = MBB.begin();
4101 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
4102 ++MBBI;
4103
4104 // Create an UnwindHelp object.
4105 // The UnwindHelp object is allocated at the start of the fixed object area
4106 int64_t FixedObject =
4107 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
4108 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
4109 /*SPOffset*/ -FixedObject,
4110 /*IsImmutable=*/false);
4111 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
4112
4113 // We need to store -2 into the UnwindHelp object at the start of the
4114 // function.
4115 DebugLoc DL;
4117 RS->backward(MBBI);
4118 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
4119 assert(DstReg && "There must be a free register after frame setup");
4120 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
4121 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
4122 .addReg(DstReg, getKillRegState(true))
4123 .addFrameIndex(UnwindHelpFI)
4124 .addImm(0);
4125}
4126
4127namespace {
4128struct TagStoreInstr {
4130 int64_t Offset, Size;
4131 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
4132 : MI(MI), Offset(Offset), Size(Size) {}
4133};
4134
4135class TagStoreEdit {
4136 MachineFunction *MF;
4139 // Tag store instructions that are being replaced.
4141 // Combined memref arguments of the above instructions.
4143
4144 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
4145 // FrameRegOffset + Size) with the address tag of SP.
4146 Register FrameReg;
4147 StackOffset FrameRegOffset;
4148 int64_t Size;
4149 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
4150 // end.
4151 std::optional<int64_t> FrameRegUpdate;
4152 // MIFlags for any FrameReg updating instructions.
4153 unsigned FrameRegUpdateFlags;
4154
4155 // Use zeroing instruction variants.
4156 bool ZeroData;
4157 DebugLoc DL;
4158
4159 void emitUnrolled(MachineBasicBlock::iterator InsertI);
4160 void emitLoop(MachineBasicBlock::iterator InsertI);
4161
4162public:
4163 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4164 : MBB(MBB), ZeroData(ZeroData) {
4165 MF = MBB->getParent();
4166 MRI = &MF->getRegInfo();
4167 }
4168 // Add an instruction to be replaced. Instructions must be added in the
4169 // ascending order of Offset, and have to be adjacent.
4170 void addInstruction(TagStoreInstr I) {
4171 assert((TagStores.empty() ||
4172 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4173 "Non-adjacent tag store instructions.");
4174 TagStores.push_back(I);
4175 }
4176 void clear() { TagStores.clear(); }
4177 // Emit equivalent code at the given location, and erase the current set of
4178 // instructions. May skip if the replacement is not profitable. May invalidate
4179 // the input iterator and replace it with a valid one.
4180 void emitCode(MachineBasicBlock::iterator &InsertI,
4181 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4182};
4183
4184void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4185 const AArch64InstrInfo *TII =
4186 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4187
4188 const int64_t kMinOffset = -256 * 16;
4189 const int64_t kMaxOffset = 255 * 16;
4190
4191 Register BaseReg = FrameReg;
4192 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4193 if (BaseRegOffsetBytes < kMinOffset ||
4194 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4195 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
4196 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4197 // is required for the offset of ST2G.
4198 BaseRegOffsetBytes % 16 != 0) {
4199 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4200 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4201 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4202 BaseReg = ScratchReg;
4203 BaseRegOffsetBytes = 0;
4204 }
4205
4206 MachineInstr *LastI = nullptr;
4207 while (Size) {
4208 int64_t InstrSize = (Size > 16) ? 32 : 16;
4209 unsigned Opcode =
4210 InstrSize == 16
4211 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4212 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4213 assert(BaseRegOffsetBytes % 16 == 0);
4214 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4215 .addReg(AArch64::SP)
4216 .addReg(BaseReg)
4217 .addImm(BaseRegOffsetBytes / 16)
4218 .setMemRefs(CombinedMemRefs);
4219 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4220 // final SP adjustment in the epilogue.
4221 if (BaseRegOffsetBytes == 0)
4222 LastI = I;
4223 BaseRegOffsetBytes += InstrSize;
4224 Size -= InstrSize;
4225 }
4226
4227 if (LastI)
4228 MBB->splice(InsertI, MBB, LastI);
4229}
4230
4231void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4232 const AArch64InstrInfo *TII =
4233 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4234
4235 Register BaseReg = FrameRegUpdate
4236 ? FrameReg
4237 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4238 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4239
4240 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4241
4242 int64_t LoopSize = Size;
4243 // If the loop size is not a multiple of 32, split off one 16-byte store at
4244 // the end to fold BaseReg update into.
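// For instance (illustrative numbers): Size = 80 with a pending base-register
// update gives LoopSize = 64 for the STGloop, leaving a single 16-byte
// post-indexed store below to apply the remaining update.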
4245 if (FrameRegUpdate && *FrameRegUpdate)
4246 LoopSize -= LoopSize % 32;
4247 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4248 TII->get(ZeroData ? AArch64::STZGloop_wback
4249 : AArch64::STGloop_wback))
4250 .addDef(SizeReg)
4251 .addDef(BaseReg)
4252 .addImm(LoopSize)
4253 .addReg(BaseReg)
4254 .setMemRefs(CombinedMemRefs);
4255 if (FrameRegUpdate)
4256 LoopI->setFlags(FrameRegUpdateFlags);
4257
4258 int64_t ExtraBaseRegUpdate =
4259 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4260 if (LoopSize < Size) {
4261 assert(FrameRegUpdate);
4262 assert(Size - LoopSize == 16);
4263 // Tag 16 more bytes at BaseReg and update BaseReg.
4264 BuildMI(*MBB, InsertI, DL,
4265 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4266 .addDef(BaseReg)
4267 .addReg(BaseReg)
4268 .addReg(BaseReg)
4269 .addImm(1 + ExtraBaseRegUpdate / 16)
4270 .setMemRefs(CombinedMemRefs)
4271 .setMIFlags(FrameRegUpdateFlags);
4272 } else if (ExtraBaseRegUpdate) {
4273 // Update BaseReg.
4274 BuildMI(
4275 *MBB, InsertI, DL,
4276 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4277 .addDef(BaseReg)
4278 .addReg(BaseReg)
4279 .addImm(std::abs(ExtraBaseRegUpdate))
4280 .addImm(0)
4281 .setMIFlags(FrameRegUpdateFlags);
4282 }
4283}
4284
4285// Check if *II is a register update that can be merged into STGloop that ends
4286// at (Reg + Size). RemainingOffset is the required adjustment to Reg after the
4287// end of the loop.
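// Illustrative case (assumed operands): if the loop ends at Reg + 96 and the
// next instruction is "add Reg, Reg, #112", the residual adjustment is 16,
// which is 16-byte aligned and encodable, so the update can be folded and
// *TotalOffset is set to 112.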
4288bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4289 int64_t Size, int64_t *TotalOffset) {
4290 MachineInstr &MI = *II;
4291 if ((MI.getOpcode() == AArch64::ADDXri ||
4292 MI.getOpcode() == AArch64::SUBXri) &&
4293 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4294 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4295 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4296 if (MI.getOpcode() == AArch64::SUBXri)
4297 Offset = -Offset;
4298 int64_t AbsPostOffset = std::abs(Offset - Size);
4299 const int64_t kMaxOffset =
4300 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
4301 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
4302 *TotalOffset = Offset;
4303 return true;
4304 }
4305 }
4306 return false;
4307}
4308
4309void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
4311 MemRefs.clear();
4312 for (auto &TS : TSE) {
4313 MachineInstr *MI = TS.MI;
4314 // An instruction without memory operands may access anything. Be
4315 // conservative and return an empty list.
4316 if (MI->memoperands_empty()) {
4317 MemRefs.clear();
4318 return;
4319 }
4320 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
4321 }
4322}
4323
4324void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
4325 const AArch64FrameLowering *TFI,
4326 bool TryMergeSPUpdate) {
4327 if (TagStores.empty())
4328 return;
4329 TagStoreInstr &FirstTagStore = TagStores[0];
4330 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
4331 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
4332 DL = TagStores[0].MI->getDebugLoc();
4333
4334 Register Reg;
4335 FrameRegOffset = TFI->resolveFrameOffsetReference(
4336 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
4337 /*PreferFP=*/false, /*ForSimm=*/true);
4338 FrameReg = Reg;
4339 FrameRegUpdate = std::nullopt;
4340
4341 mergeMemRefs(TagStores, CombinedMemRefs);
4342
4343 LLVM_DEBUG({
4344 dbgs() << "Replacing adjacent STG instructions:\n";
4345 for (const auto &Instr : TagStores) {
4346 dbgs() << " " << *Instr.MI;
4347 }
4348 });
4349
4350 // Size threshold where a loop becomes shorter than a linear sequence of
4351 // tagging instructions.
4352 const int kSetTagLoopThreshold = 176;
4353 if (Size < kSetTagLoopThreshold) {
4354 if (TagStores.size() < 2)
4355 return;
4356 emitUnrolled(InsertI);
4357 } else {
4358 MachineInstr *UpdateInstr = nullptr;
4359 int64_t TotalOffset = 0;
4360 if (TryMergeSPUpdate) {
4361 // See if we can merge base register update into the STGloop.
4362 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
4363 // but STGloop is way too unusual for that, and also it only
4364 // realistically happens in function epilogue. Also, STGloop is expanded
4365 // before that pass.
4366 if (InsertI != MBB->end() &&
4367 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4368 &TotalOffset)) {
4369 UpdateInstr = &*InsertI++;
4370 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4371 << *UpdateInstr);
4372 }
4373 }
4374
4375 if (!UpdateInstr && TagStores.size() < 2)
4376 return;
4377
4378 if (UpdateInstr) {
4379 FrameRegUpdate = TotalOffset;
4380 FrameRegUpdateFlags = UpdateInstr->getFlags();
4381 }
4382 emitLoop(InsertI);
4383 if (UpdateInstr)
4384 UpdateInstr->eraseFromParent();
4385 }
4386
4387 for (auto &TS : TagStores)
4388 TS.MI->eraseFromParent();
4389}
4390
4391bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4392 int64_t &Size, bool &ZeroData) {
4393 MachineFunction &MF = *MI.getParent()->getParent();
4394 const MachineFrameInfo &MFI = MF.getFrameInfo();
4395
4396 unsigned Opcode = MI.getOpcode();
4397 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4398 Opcode == AArch64::STZ2Gi);
4399
4400 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4401 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4402 return false;
4403 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4404 return false;
4405 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4406 Size = MI.getOperand(2).getImm();
4407 return true;
4408 }
4409
4410 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4411 Size = 16;
4412 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4413 Size = 32;
4414 else
4415 return false;
4416
4417 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4418 return false;
4419
4420 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4421 16 * MI.getOperand(2).getImm();
4422 return true;
4423}
4424
4425// Detect a run of memory tagging instructions for adjacent stack frame slots,
4426// and replace them with a shorter instruction sequence:
4427// * replace STG + STG with ST2G
4428// * replace STGloop + STGloop with STGloop
4429// This code needs to run when stack slot offsets are already known, but before
4430// FrameIndex operands in STG instructions are eliminated.
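// A sketch of the rewrite (offsets assumed for illustration):
//   stg  sp, [fp, #-32]      ; tag the granule at offset -32
//   stg  sp, [fp, #-16]      ; tag the adjacent granule at offset -16
// can become
//   st2g sp, [fp, #-32]      ; tag both granules with one instruction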
4431 MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
4432 const AArch64FrameLowering *TFI,
4433 RegScavenger *RS) {
4434 bool FirstZeroData;
4435 int64_t Size, Offset;
4436 MachineInstr &MI = *II;
4437 MachineBasicBlock *MBB = MI.getParent();
4439 if (&MI == &MBB->instr_back())
4440 return II;
4441 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4442 return II;
4443
4445 Instrs.emplace_back(&MI, Offset, Size);
4446
4447 constexpr int kScanLimit = 10;
4448 int Count = 0;
4450 NextI != E && Count < kScanLimit; ++NextI) {
4451 MachineInstr &MI = *NextI;
4452 bool ZeroData;
4453 int64_t Size, Offset;
4454 // Collect instructions that update memory tags with a FrameIndex operand
4455 // and (when applicable) constant size, and whose output registers are dead
4456 // (the latter is almost always the case in practice). Since these
4457 // instructions effectively have no inputs or outputs, we are free to skip
4458 // any non-aliasing instructions in between without tracking used registers.
4459 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4460 if (ZeroData != FirstZeroData)
4461 break;
4462 Instrs.emplace_back(&MI, Offset, Size);
4463 continue;
4464 }
4465
4466 // Only count non-transient, non-tagging instructions toward the scan
4467 // limit.
4468 if (!MI.isTransient())
4469 ++Count;
4470
4471 // Just in case, stop before the epilogue code starts.
4472 if (MI.getFlag(MachineInstr::FrameSetup) ||
4474 break;
4475
4476 // Reject anything that may alias the collected instructions.
4477 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
4478 break;
4479 }
4480
4481 // New code will be inserted after the last tagging instruction we've found.
4482 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4483
4484 // All the gathered stack tag instructions are merged and placed after the
4485 // last tag store in the list. Before inserting, check whether the NZCV
4486 // flag is live at that point; otherwise it might get clobbered if any STG
4487 // loops are emitted.
4488
4489 // FIXME: Bailing out of the merge like this is conservative: the liveness
4490 // check is performed even when the merged sequence contains no STG loops,
4491 // in which case it is not needed.
4493 LiveRegs.addLiveOuts(*MBB);
4494 for (auto I = MBB->rbegin();; ++I) {
4495 MachineInstr &MI = *I;
4496 if (MI == InsertI)
4497 break;
4498 LiveRegs.stepBackward(*I);
4499 }
4500 InsertI++;
4501 if (LiveRegs.contains(AArch64::NZCV))
4502 return InsertI;
4503
4504 llvm::stable_sort(Instrs,
4505 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4506 return Left.Offset < Right.Offset;
4507 });
4508
4509 // Make sure that we don't have any overlapping stores.
4510 int64_t CurOffset = Instrs[0].Offset;
4511 for (auto &Instr : Instrs) {
4512 if (CurOffset > Instr.Offset)
4513 return NextI;
4514 CurOffset = Instr.Offset + Instr.Size;
4515 }
4516
4517 // Find contiguous runs of tagged memory and emit shorter instruction
4518 // sequences for them when possible.
4519 TagStoreEdit TSE(MBB, FirstZeroData);
4520 std::optional<int64_t> EndOffset;
4521 for (auto &Instr : Instrs) {
4522 if (EndOffset && *EndOffset != Instr.Offset) {
4523 // Found a gap.
4524 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4525 TSE.clear();
4526 }
4527
4528 TSE.addInstruction(Instr);
4529 EndOffset = Instr.Offset + Instr.Size;
4530 }
4531
4532 const MachineFunction *MF = MBB->getParent();
4533 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4534 TSE.emitCode(
4535 InsertI, TFI, /*TryMergeSPUpdate = */
4537
4538 return InsertI;
4539}
4540} // namespace
4541
4542 static MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II,
4543 const AArch64FrameLowering *TFI) {
4544 MachineInstr &MI = *II;
4545 MachineBasicBlock *MBB = MI.getParent();
4546 MachineFunction *MF = MBB->getParent();
4547
4548 if (MI.getOpcode() != AArch64::VGSavePseudo &&
4549 MI.getOpcode() != AArch64::VGRestorePseudo)
4550 return II;
4551
4552 SMEAttrs FuncAttrs(MF->getFunction());
4553 bool LocallyStreaming =
4554 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
4557 const AArch64InstrInfo *TII =
4558 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4559
4560 int64_t VGFrameIdx =
4561 LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
4562 assert(VGFrameIdx != std::numeric_limits<int>::max() &&
4563 "Expected FrameIdx for VG");
4564
4565 unsigned CFIIndex;
4566 if (MI.getOpcode() == AArch64::VGSavePseudo) {
4567 const MachineFrameInfo &MFI = MF->getFrameInfo();
4568 int64_t Offset =
4569 MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
4571 nullptr, TRI->getDwarfRegNum(AArch64::VG, true), Offset));
4572 } else
4574 nullptr, TRI->getDwarfRegNum(AArch64::VG, true)));
4575
4576 MachineInstr *UnwindInst = BuildMI(*MBB, II, II->getDebugLoc(),
4577 TII->get(TargetOpcode::CFI_INSTRUCTION))
4578 .addCFIIndex(CFIIndex);
4579
4580 MI.eraseFromParent();
4581 return UnwindInst->getIterator();
4582}
4583
4584 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
4585 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4586 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4587 for (auto &BB : MF)
4588 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
4589 if (AFI->hasStreamingModeChanges())
4590 II = emitVGSaveRestore(II, this);
4592 II = tryMergeAdjacentSTG(II, this, RS);
4593 }
4594}
4595
4596/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
4597/// before the update. This is easily retrieved as it is exactly the offset
4598/// that is set in processFunctionBeforeFrameFinalized.
4599 StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
4600 const MachineFunction &MF, int FI, Register &FrameReg,
4601 bool IgnoreSPUpdates) const {
4602 const MachineFrameInfo &MFI = MF.getFrameInfo();
4603 if (IgnoreSPUpdates) {
4604 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
4605 << MFI.getObjectOffset(FI) << "\n");
4606 FrameReg = AArch64::SP;
4607 return StackOffset::getFixed(MFI.getObjectOffset(FI));
4608 }
4609
4610 // Go to common code if we cannot provide sp + offset.
4611 if (MFI.hasVarSizedObjects() ||
4614 return getFrameIndexReference(MF, FI, FrameReg);
4615
4616 FrameReg = AArch64::SP;
4617 return getStackOffset(MF, MFI.getObjectOffset(FI));
4618}
4619
4620/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
4621/// the parent's frame pointer
4622 unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
4623 const MachineFunction &MF) const {
4624 return 0;
4625}
4626
4627/// Funclets only need to account for space for the callee saved registers,
4628/// as the locals are accounted for in the parent's stack frame.
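// Worked example (illustrative numbers, assuming the usual 16-byte stack
// alignment): with 80 bytes of pushed CSRs and a maximum call frame of
// 24 bytes, the funclet frame size below is alignTo(80 + 24, 16) = 112 bytes.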
4629 unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
4630 const MachineFunction &MF) const {
4631 // This is the size of the pushed CSRs.
4632 unsigned CSSize =
4633 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
4634 // This is the amount of stack a funclet needs to allocate.
4635 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
4636 getStackAlign());
4637}
4638
4639namespace {
4640struct FrameObject {
4641 bool IsValid = false;
4642 // Index of the object in MFI.
4643 int ObjectIndex = 0;
4644 // Group ID this object belongs to.
4645 int GroupIndex = -1;
4646 // This object should be placed first (closest to SP).
4647 bool ObjectFirst = false;
4648 // This object's group (which always contains the object with
4649 // ObjectFirst==true) should be placed first.
4650 bool GroupFirst = false;
4651
4652 // Used to distinguish between FP and GPR accesses. The values are decided so
4653 // that they sort FPR < Hazard < GPR and they can be or'd together.
4654 unsigned Accesses = 0;
4655 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
4656};
4657
4658class GroupBuilder {
4659 SmallVector<int, 8> CurrentMembers;
4660 int NextGroupIndex = 0;
4661 std::vector<FrameObject> &Objects;
4662
4663public:
4664 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
4665 void AddMember(int Index) { CurrentMembers.push_back(Index); }
4666 void EndCurrentGroup() {
4667 if (CurrentMembers.size() > 1) {
4668 // Create a new group with the current member list. This might remove them
4669 // from their pre-existing groups. That's OK, dealing with overlapping
4670 // groups is too hard and unlikely to make a difference.
4671 LLVM_DEBUG(dbgs() << "group:");
4672 for (int Index : CurrentMembers) {
4673 Objects[Index].GroupIndex = NextGroupIndex;
4674 LLVM_DEBUG(dbgs() << " " << Index);
4675 }
4676 LLVM_DEBUG(dbgs() << "\n");
4677 NextGroupIndex++;
4678 }
4679 CurrentMembers.clear();
4680 }
4681};
4682
4683bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
4684 // Objects at a lower index are closer to FP; objects at a higher index are
4685 // closer to SP.
4686 //
4687 // For consistency in our comparison, all invalid objects are placed
4688 // at the end. This also allows us to stop walking when we hit the
4689 // first invalid item after it's all sorted.
4690 //
4691 // If we want to include a stack hazard region, order FPR accesses < the
4692 // hazard object < GPRs accesses in order to create a separation between the
4693 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
4694 //
4695 // Otherwise the "first" object goes first (closest to SP), followed by the
4696 // members of the "first" group.
4697 //
4698 // The rest are sorted by the group index to keep the groups together.
4699 // Higher numbered groups are more likely to be around longer (i.e. untagged
4700 // in the function epilogue and not at some earlier point). Place them closer
4701 // to SP.
4702 //
4703 // If all else equal, sort by the object index to keep the objects in the
4704 // original order.
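// Hedged illustration of the resulting order (object kinds assumed): locals
// accessed only through FP/SVE operations (AccessFPR) sort toward the FP end
// of the local area, the hazard slot (AccessHazard) sits between them, and
// GPR-accessed locals (AccessGPR) sort toward SP; within equal access kinds,
// groups stay together and the object index breaks ties.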
4705 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
4706 A.GroupIndex, A.ObjectIndex) <
4707 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
4708 B.GroupIndex, B.ObjectIndex);
4709}
4710} // namespace
4711
4712 void AArch64FrameLowering::orderFrameObjects(
4713 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4714 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4715 return;
4716
4718 const MachineFrameInfo &MFI = MF.getFrameInfo();
4719 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4720 for (auto &Obj : ObjectsToAllocate) {
4721 FrameObjects[Obj].IsValid = true;
4722 FrameObjects[Obj].ObjectIndex = Obj;
4723 }
4724
4725 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
4726 // the same time.
4727 GroupBuilder GB(FrameObjects);
4728 for (auto &MBB : MF) {
4729 for (auto &MI : MBB) {
4730 if (MI.isDebugInstr())
4731 continue;
4732
4733 if (AFI.hasStackHazardSlotIndex()) {
4734 std::optional<int> FI = getLdStFrameID(MI, MFI);
4735 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
4736 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
4737 AArch64InstrInfo::isFpOrNEON(MI))
4738 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
4739 else
4740 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
4741 }
4742 }
4743
4744 int OpIndex;
4745 switch (MI.getOpcode()) {
4746 case AArch64::STGloop:
4747 case AArch64::STZGloop:
4748 OpIndex = 3;
4749 break;
4750 case AArch64::STGi:
4751 case AArch64::STZGi:
4752 case AArch64::ST2Gi:
4753 case AArch64::STZ2Gi:
4754 OpIndex = 1;
4755 break;
4756 default:
4757 OpIndex = -1;
4758 }
4759
4760 int TaggedFI = -1;
4761 if (OpIndex >= 0) {
4762 const MachineOperand &MO = MI.getOperand(OpIndex);
4763 if (MO.isFI()) {
4764 int FI = MO.getIndex();
4765 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4766 FrameObjects[FI].IsValid)
4767 TaggedFI = FI;
4768 }
4769 }
4770
4771 // If this is a stack tagging instruction for a slot that is not part of a
4772 // group yet, either start a new group or add it to the current one.
4773 if (TaggedFI >= 0)
4774 GB.AddMember(TaggedFI);
4775 else
4776 GB.EndCurrentGroup();
4777 }
4778 // Groups should never span multiple basic blocks.
4779 GB.EndCurrentGroup();
4780 }
4781
4782 if (AFI.hasStackHazardSlotIndex()) {
4783 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
4784 FrameObject::AccessHazard;
4785 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
4786 for (auto &Obj : FrameObjects)
4787 if (!Obj.Accesses ||
4788 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
4789 Obj.Accesses = FrameObject::AccessGPR;
4790 }
4791
4792 // If the function's tagged base pointer is pinned to a stack slot, we want to
4793 // put that slot first when possible. This will likely place it at SP + 0,
4794 // and save one instruction when generating the base pointer because IRG does
4795 // not allow an immediate offset.
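// Hedged example: if the tagged-base-pointer slot lands at SP + 0, the base
// tag can be produced directly as "irg xN, sp" (register assumed); at any
// other offset an extra address computation would be needed first, since IRG
// takes no immediate offset.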
4796 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4797 if (TBPI) {
4798 FrameObjects[*TBPI].ObjectFirst = true;
4799 FrameObjects[*TBPI].GroupFirst = true;
4800 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4801 if (FirstGroupIndex >= 0)
4802 for (FrameObject &Object : FrameObjects)
4803 if (Object.GroupIndex == FirstGroupIndex)
4804 Object.GroupFirst = true;
4805 }
4806
4807 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4808
4809 int i = 0;
4810 for (auto &Obj : FrameObjects) {
4811 // All invalid items are sorted at the end, so it's safe to stop.
4812 if (!Obj.IsValid)
4813 break;
4814 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4815 }
4816
4817 LLVM_DEBUG({
4818 dbgs() << "Final frame order:\n";
4819 for (auto &Obj : FrameObjects) {
4820 if (!Obj.IsValid)
4821 break;
4822 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4823 if (Obj.ObjectFirst)
4824 dbgs() << ", first";
4825 if (Obj.GroupFirst)
4826 dbgs() << ", group-first";
4827 dbgs() << "\n";
4828 }
4829 });
4830}
4831
4832/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4833/// least every ProbeSize bytes. Returns an iterator of the first instruction
4834/// after the loop. The difference between SP and TargetReg must be an exact
4835/// multiple of ProbeSize.
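/// The emitted loop looks roughly like this (sketch only; the exact operands
/// and condition code follow the BuildMI calls below):
///   LoopMBB:
///     sub  sp, sp, #ProbeSize
///     str  xzr, [sp]
///     subs xzr, sp, TargetReg
///     b.ne LoopMBB        ; assumed: loop while SP != TargetReg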
4836 MachineBasicBlock::iterator
4837 AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4838 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4839 Register TargetReg) const {
4841 MachineFunction &MF = *MBB.getParent();
4842 const AArch64InstrInfo *TII =
4843 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4845
4846 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4848 MF.insert(MBBInsertPoint, LoopMBB);
4850 MF.insert(MBBInsertPoint, ExitMBB);
4851
4852 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4853 // in SUB).
4854 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4855 StackOffset::getFixed(-ProbeSize), TII,
4857 // STR XZR, [SP]
4858 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4859 .addReg(AArch64::XZR)
4860 .addReg(AArch64::SP)
4861 .addImm(0)
4863 // CMP SP, TargetReg
4864 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4865 AArch64::XZR)
4866 .addReg(AArch64::SP)
4867 .addReg(TargetReg)
4870 // B.CC Loop
4871 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4873 .addMBB(LoopMBB)
4875
4876 LoopMBB->addSuccessor(ExitMBB);
4877 LoopMBB->addSuccessor(LoopMBB);
4878 // Synthesize the exit MBB.
4879 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4881 MBB.addSuccessor(LoopMBB);
4882 // Update liveins.
4883 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4884
4885 return ExitMBB->begin();
4886}
4887
4888void AArch64FrameLowering::inlineStackProbeFixed(
4889 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4890 StackOffset CFAOffset) const {
4892 MachineFunction &MF = *MBB->getParent();
4893 const AArch64InstrInfo *TII =
4894 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4896 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4897 bool HasFP = hasFP(MF);
4898
4899 DebugLoc DL;
4900 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
4901 int64_t NumBlocks = FrameSize / ProbeSize;
4902 int64_t ResidualSize = FrameSize % ProbeSize;
4903
4904 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
4905 << NumBlocks << " blocks of " << ProbeSize
4906 << " bytes, plus " << ResidualSize << " bytes\n");
4907
4908 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
4909 // ordinary loop.
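// Worked example (assuming a 4096-byte probe size): FrameSize = 10000 gives
// NumBlocks = 2 and ResidualSize = 1808; the two full blocks are emitted
// unrolled (or as a loop if NumBlocks exceeds the unroll limit), followed by
// the residual SUB and, if the residual exceeds the largest allowed unprobed
// tail, a final probing store.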
4910 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
4911 for (int i = 0; i < NumBlocks; ++i) {
4912 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
4913 // encodable in a SUB).
4914 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4915 StackOffset::getFixed(-ProbeSize), TII,
4916 MachineInstr::FrameSetup, false, false, nullptr,
4917 EmitAsyncCFI && !HasFP, CFAOffset);
4918 CFAOffset += StackOffset::getFixed(ProbeSize);
4919 // STR XZR, [SP]
4920 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4921 .addReg(AArch64::XZR)
4922 .addReg(AArch64::SP)
4923 .addImm(0)
4924 .setMIFlags(MachineInstr::FrameSetup);
4925 }
4926 } else if (NumBlocks != 0) {
4927 // SUB ScratchReg, SP, #(ProbeSize * NumBlocks) (or equivalent if that offset
4928 // is not encodable in SUB). ScratchReg may temporarily become the CFA register.
4929 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
4930 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
4931 MachineInstr::FrameSetup, false, false, nullptr,
4932 EmitAsyncCFI && !HasFP, CFAOffset);
4933 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
4934 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
4935 MBB = MBBI->getParent();
4936 if (EmitAsyncCFI && !HasFP) {
4937 // Set the CFA register back to SP.
4938 const AArch64RegisterInfo &RegInfo =
4939 *MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
4940 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
4941 unsigned CFIIndex =
4942 MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
4943 BuildMI(*MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
4944 .addCFIIndex(CFIIndex)
4945 .setMIFlags(MachineInstr::FrameSetup);
4946 }
4947 }
4948
4949 if (ResidualSize != 0) {
4950 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
4951 // in SUB).
4952 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4953 StackOffset::getFixed(-ResidualSize), TII,
4954 MachineInstr::FrameSetup, false, false, nullptr,
4955 EmitAsyncCFI && !HasFP, CFAOffset);
4956 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
4957 // STR XZR, [SP]
4958 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4959 .addReg(AArch64::XZR)
4960 .addReg(AArch64::SP)
4961 .addImm(0)
4962 .setMIFlags(MachineInstr::FrameSetup);
4963 }
4964 }
4965}
4966
4967void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
4968 MachineBasicBlock &MBB) const {
4969 // Get the instructions that need to be replaced. We emit at most two of
4970 // these. Remember them so that we do not have to traverse the block while
4971 // potentially creating more blocks.
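// In rough terms: PROBED_STACKALLOC marks a fixed-size allocation and is
// expanded via inlineStackProbeFixed above, while PROBED_STACKALLOC_VAR marks a
// variable-size allocation and is expanded through TII->probedStackAlloc.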
4972 SmallVector<MachineInstr *, 4> ToReplace;
4973 for (MachineInstr &MI : MBB)
4974 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
4975 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
4976 ToReplace.push_back(&MI);
4977
4978 for (MachineInstr *MI : ToReplace) {
4979 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
4980 Register ScratchReg = MI->getOperand(0).getReg();
4981 int64_t FrameSize = MI->getOperand(1).getImm();
4982 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
4983 MI->getOperand(3).getImm());
4984 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
4985 CFAOffset);
4986 } else {
4987 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
4988 "Stack probe pseudo-instruction expected");
4989 const AArch64InstrInfo *TII =
4990 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
4991 Register TargetReg = MI->getOperand(0).getReg();
4992 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
4993 }
4994 MI->eraseFromParent();
4995 }
4996}